Zipf’s Law for Colombian City Sizes in 2025

June 10, 2025

(Spanish version: La ley de Zipf para las ciudades de Colombia en 2025)

Zipf’s Law for city sizes says that city populations decrease with rank according to a simple power law: $$ P(r) = \frac{C}{r^{\alpha}} $$ Here, $P(r)$ is the population of the $r$-th largest city in a given region, and $C$ and $\alpha$ are fitted constants.

More concretely, this says that the population of the largest city (rank $r=1$) is approximated by $P(1)=C/1^\alpha=C$, the population of the second largest city (rank $r=2$) is approximated by $P(2)=C/2^\alpha$, the population of the third by $P(3)=C/3^\alpha$, and so on. Data from many countries suggests that $\alpha \approx 1$, but in practice, this number can vary depending on the country or region.
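To make the formula concrete, here is a minimal Python sketch that evaluates $P(r) = C/r^{\alpha}$. The constants are illustrative assumptions, not fitted values: a hypothetical largest city of 1,000,000 people and $\alpha = 1$. In that classic case, the $r$-th city has about $1/r$ of the largest city's population.

```python
def zipf_population(rank: int, c: float, alpha: float) -> float:
    """Population predicted by Zipf's law, P(r) = C / r**alpha."""
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    return c / rank ** alpha


# Hypothetical example (not real data): largest city of 1,000,000, alpha = 1
C_EXAMPLE = 1_000_000
ALPHA_EXAMPLE = 1.0

for r in range(1, 5):
    # prints: 1 1000000, 2 500000, 3 333333, 4 250000 (one pair per line)
    print(r, round(zipf_population(r, C_EXAMPLE, ALPHA_EXAMPLE)))
```

With $\alpha$ below 1, populations fall off more slowly with rank; with $\alpha$ above 1, they fall off faster.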

I went looking for a recent plot showing how this works for Colombia and was surprised not to find one online. So, I decided to make one myself! (Always fun to find an excuse to create a nice TikZ figure and learn something new in the process.)

It turns out the power law is a pretty good fit for Colombia’s cities.

(Figure created with TikZ. Source code here.)

Besides the good fit, the plot shows that Bogotá is an outlier: it sits well above the fitted curve. This is a well-documented phenomenon: capital cities often attract people in ways other cities do not.

The population numbers come from the official government estimates for 2025, which I got from Wikipedia. You can find the full table there or in the source code for the figure above.

Log-log plot

A log-log plot is a better way to check the power-law fit, because it turns power functions such as $P(r)$ into straight lines, and our brains are pretty good at spotting lines. See below.

(Figure created with TikZ. Source code here.)
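Taking logarithms of both sides of $P(r) = C/r^{\alpha}$ shows why the log-log plot is a straight line. The power law becomes a linear relation between $\log P(r)$ and $\log r$, with slope $-\alpha$ and intercept $\log C$:

$$ \log P(r) = \log C - \alpha \log r $$

So a fitted slope of about $-0.92$ on the log-log plot corresponds to $\alpha \approx 0.92$. The choice of logarithm base does not matter for the slope. Changing the base rescales both sides by the same constant, so only the intercept's numerical value changes.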

To get this fit equation, I actually started by taking logs: I did a linear regression of $\log P(r)$ against $\log r$ (see the Python code at the end). The fit has $R^2 = 0.96$. Very high!
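The $R^2$ here is measured on the log scale, since that is where the regression is done. Let $y_i = \log P(r_i)$ be the observed log populations, $\hat{y}_i = \log C - \alpha \log r_i$ the fitted values, and $\bar{y}$ the mean of the $y_i$. Then:

$$ R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} $$

An $R^2$ of 0.96 means the straight line explains about 96% of the variation in log population. It does not directly say how far off the fit is in numbers of people. The percentage errors in the next section address that.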

Percentage errors

Sometimes, though, the actual percentage errors are more convincing than an $R^2$ value or a graph. In the table, each error is measured relative to the actual population, as $(\text{actual} - \text{fit})/\text{actual}$. The errors are surprisingly low. Bogotá is excluded because we expect it to be special, and Soacha is excluded because it borders Bogotá. Every other city is within 20% of the fit.

City	Rank	Actual Population	Fit Population	Percentage Error (%)
Bogotá	1	7,937,898	5,544,989	30%
Medellín	2	2,634,570	2,924,988	-11%
Cali	3	2,285,099	2,012,034	12%
Barranquilla	4	1,342,818	1,542,935	-15%
Cartagena	5	1,065,881	1,255,809	-18%
Soacha	6	828,947	1,061,350	-28%
Cúcuta	7	815,891	920,626	-13%
Soledad	8	686,339	813,900	-19%
Bucaramanga	9	623,881	730,079	-17%
Villavicencio	10	593,273	662,440	-12%
Valledupar	11	575,225	606,669	-5%
Bello	12	570,329	559,863	2%
Santa Marta	13	566,650	520,002	8%
Ibagué	14	546,003	485,631	11%
Montería	15	531,424	455,678	14%
Pereira	16	482,824	429,333	11%
Manizales	17	459,262	405,975	12%
Pasto	18	415,937	385,117	7%
Neiva	19	388,229	366,375	6%
Palmira	20	359,888	349,438	3%
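The Percentage Error column can be reproduced from the Actual and Fit columns. For Bogotá, for example, $(7{,}937{,}898 - 5{,}544{,}989)/7{,}937{,}898 \approx 30\%$. Below is a minimal Python sketch of that calculation, checked against the first three rows of the table. The function name `percentage_errors` is hypothetical and not part of the original code.

```python
def percentage_errors(actual: list[float], fitted: list[float]) -> list[float]:
    """Signed error of the fit, as a percentage of the actual population.

    Positive: the city is larger than the power law predicts.
    Negative: the city is smaller than the power law predicts.
    """
    if len(actual) != len(fitted):
        raise ValueError("actual and fitted must have the same length")
    return [100 * (a - f) / a for a, f in zip(actual, fitted)]


# First three rows of the table above: Bogotá, Medellín, Cali
table_actual_pops = [7_937_898, 2_634_570, 2_285_099]
table_fit_pops = [5_544_989, 2_924_988, 2_012_034]

for err in percentage_errors(table_actual_pops, table_fit_pops):
    print(f"{err:.0f}%")  # prints 30%, -11%, 12% (one per line)
```

Dividing by the fitted value instead would give a different, larger number for Bogotá (about 43%). The table's figures match division by the actual population.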

What this tells us

It is pretty amazing that, just from knowing that Cartagena is the 5th largest city in Colombia, we can predict that it should have roughly 1.3 million people (the 2025 estimate is about 1.07 million). Likewise, Ibagué, the 14th largest, should have around 490 thousand (the estimate is about 546 thousand). All predicted from a simple formula: $$ P(r)=\frac{5544989}{r^{0.92}} $$

$P(r):$ Population in 2025 of the $r$-th largest city in Colombia.
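For Cartagena ($r = 5$) and Ibagué ($r = 14$), the formula gives:

$$ P(5) = \frac{5544989}{5^{0.92}} \approx \frac{5544989}{4.40} \approx 1.26 \times 10^{6}, \qquad P(14) = \frac{5544989}{14^{0.92}} \approx \frac{5544989}{11.34} \approx 4.89 \times 10^{5} $$

These differ slightly from the table's values (1,255,809 and 485,631). Presumably the table uses the unrounded exponent from the fit rather than 0.92.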

Python code

Here is the Python code I used to find this power-law fit (and make some quick plots). It prints out the fit function and $R^2$. To try it for another country, just swap in new data!

# setup
import numpy as np
from sklearn.metrics import r2_score


colombia_cities = [
    ("Bogotá",        7937898),
    ("Medellín",      2634570),
    ("Cali",          2285099),
    ("Barranquilla",  1342818),
    ("Cartagena",     1065881),
    ("Soacha",        828947),
    ("Cúcuta",        815891),
    ("Soledad",       686339),
    ("Bucaramanga",   623881),
    ("Villavicencio", 593273),
    ("Valledupar",    575225),
    ("Bello",         570329),
    ("Santa Marta",   566650),
    ("Ibagué",        546003),
    ("Montería",      531424),
    ("Pereira",       482824),
    ("Manizales",     459262),
    ("Pasto",         415937),
    ("Neiva",         388229),
    ("Palmira",       359888)
]

ranks = np.arange(1, len(colombia_cities) + 1)  # ranks 1, 2, ..., N for any number of cities
city_names = [name for name, pop in colombia_cities]
city_pops = np.array([pop for name, pop in colombia_cities])

# Power law fit: log(Pop) = log(a) + b*log(rank)
log_ranks = np.log(ranks)
log_pops = np.log(city_pops)
b, log_a = np.polyfit(log_ranks, log_pops, 1)
a = np.exp(log_a)
fit_pops = a * ranks ** b

# Fit quality
r2 = r2_score(log_pops, log_a + b * log_ranks)

print(f"Power law fit: Population = {a:.0f} × Rank^{b:.2f} (R² = {r2:.2f})")
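If you would rather not install NumPy and scikit-learn just for the fit, the same least-squares regression on the logs can be done in plain Python. It uses the closed-form formulas for simple linear regression. This is a minimal sketch, not the code used for the figures, and the function name `fit_power_law` is hypothetical. For the same data, it should give the same $a$, $b$, and $R^2$ as the code above, up to floating-point rounding.

```python
import math


def fit_power_law(ranks, pops):
    """Fit pops ≈ a * ranks**b by least squares on log(pops) vs log(ranks).

    Returns (a, b, r2), where r2 is measured on the log scale.
    """
    xs = [math.log(r) for r in ranks]
    ys = [math.log(p) for p in pops]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Closed-form ordinary least squares for y = log_a + b * x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    log_a = y_mean - b * x_mean

    # R² = 1 - SS_res / SS_tot, on the log scale
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot

    return math.exp(log_a), b, r2


# 2025 populations from the list above, in rank order
populations = [7937898, 2634570, 2285099, 1342818, 1065881, 828947, 815891,
               686339, 623881, 593273, 575225, 570329, 566650, 546003,
               531424, 482824, 459262, 415937, 388229, 359888]
rank_list = range(1, len(populations) + 1)

a, b, r2 = fit_power_law(rank_list, populations)
print(f"Power law fit: Population = {a:.0f} × Rank^{b:.2f} (R² = {r2:.2f})")
```

The slope and intercept formulas are the standard least-squares solution for one predictor. `np.polyfit(..., 1)` solves the same problem numerically.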

And here is some quick code to create the plots (not as pretty as the TikZ figures above, but fast to generate):

import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Linear
ax1.scatter(ranks, city_pops, s=60)
ax1.plot(ranks, fit_pops, '--r', label=f"Fit: $y={a:.0f}x^{{{b:.2f}}}$")
ax1.set_xlabel("Rank")
ax1.set_ylabel("Population")
ax1.set_title("Colombia City Populations: Linear Scale")
ax1.set_xticks(ranks)
for i, name in enumerate(city_names):
    ax1.annotate(name, (ranks[i], city_pops[i]), textcoords="offset points", xytext=(0, 7), ha='center', fontsize=9)
ax1.legend()

# Log-log
ax2.scatter(ranks, city_pops, s=60)
ax2.plot(ranks, fit_pops, '--r', label=f"Fit: $y={a:.0f}x^{{{b:.2f}}}$")
ax2.set_xscale("log")
ax2.set_yscale("log")
ax2.set_xlabel("Rank (log)")
ax2.set_ylabel("Population (log)")
ax2.set_title("Colombia City Populations: Log-Log")
for i, name in enumerate(city_names):
    ax2.annotate(name, (ranks[i], city_pops[i]), textcoords="offset points", xytext=(0, 7), ha='center', fontsize=9)
ax2.legend()

plt.tight_layout()
plt.show()

More info

Much has been written about Zipf’s Law, both for languages and for city populations. If interested, consider looking at:

Wikipedia article on Zipf’s Law
An OECD Regional Development Working Paper with lots of interesting graphs and discussion, including how to define a city. See its references to dig deeper!
