Zipf’s Law for Colombian City Sizes in 2025


(Español: La ley de Zipf para las ciudades de Colombia en 2025)

Zipf’s Law for populations says that city populations fall off with rank according to a simple power law:

$$ P(r) = \frac{C}{r^{\alpha}} $$

Here, $P(r)$ is the population of the $r$-th largest city in a given region, and $C$ and $\alpha$ are fitted constants.

More concretely, this says that the population of the largest city (rank $r=1$) is approximated by $P(1)=C/1^\alpha=C$, the population of the second largest city (rank $r=2$) is approximated by $P(2)=C/2^\alpha$, the population of the third by $P(3)=C/3^\alpha$, and so on. Data from many countries suggests that $\alpha \approx 1$, but in practice, this number can vary depending on the country or region.
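To make the formula concrete, here is a minimal sketch of the idealized case $\alpha = 1$. The function name `zipf_population` and the value of `C` are made up for illustration; they are not fitted to any real data.

```python
def zipf_population(rank, C, alpha=1.0):
    """Population predicted by Zipf's law for the city of a given rank."""
    return C / rank ** alpha


# Hypothetical largest-city population, chosen only for illustration.
C = 1_000_000

for rank in range(1, 6):
    print(rank, round(zipf_population(rank, C)))
```

With $\alpha = 1$, the second city gets half the population of the first, the third gets a third, and so on.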

I went looking for a recent plot showing how this works for Colombia and was surprised not to find one online. So, I decided to make one myself! (Always fun to find an excuse to create a nice TikZ figure and learn something new in the process.)

It turns out the power law is a pretty good fit for Colombia’s cities.

(Figure created with TikZ. Source code here.)

One thing that is noticeable in the plot, besides the good fit, is that Bogotá seems to be an outlier – it is noticeably above the fitted curve. This is a well-documented phenomenon: capital cities often attract people in ways other cities do not.

The population numbers come from the official government estimates for 2025, which I got from Wikipedia. You can find the full table there or in the source code for the figure above.

Log-log plot

The power law fit is better checked in a log-log plot. This is because these plots straighten out power functions (such as $P(r)$) into straight lines, and our brains are pretty good at spotting lines. See below.

(Figure created with TikZ. Source code here.)
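The reason a power law shows up as a straight line on log-log axes is a one-line calculation. Taking logarithms of both sides of $P(r) = C/r^{\alpha}$ gives

$$ \log P(r) = \log C - \alpha \log r $$

So, as a function of $\log r$, the quantity $\log P(r)$ is a line with slope $-\alpha$ and intercept $\log C$.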

To get this fit equation, I started by taking logs: I did a linear regression of $\log P(r)$ against $\log r$ (see the Python code at the end). The fit has $R^2=0.96$, computed on the log values. Very high!

Percentage errors

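But sometimes, seeing the actual percentage errors is more convincing than just quoting $R^2$ or looking at the graphs. Here the percentage error is measured relative to the actual population, $(\text{actual} - \text{fit})/\text{actual}$, so a positive value means the city is larger than the fit predicts. The errors are surprisingly low, all under 20% in absolute value (ignoring Bogotá, which we expect to be special, and Soacha, which is really a neighboring city to Bogotá).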

| City | Rank | Actual population | Fit population | Percentage error |
|---|---|---|---|---|
| Bogotá | 1 | 7,937,898 | 5,544,989 | 30% |
| Medellín | 2 | 2,634,570 | 2,924,988 | -11% |
| Cali | 3 | 2,285,099 | 2,012,034 | 12% |
| Barranquilla | 4 | 1,342,818 | 1,542,935 | -15% |
| Cartagena | 5 | 1,065,881 | 1,255,809 | -18% |
| Soacha | 6 | 828,947 | 1,061,350 | -28% |
| Cúcuta | 7 | 815,891 | 920,626 | -13% |
| Soledad | 8 | 686,339 | 813,900 | -19% |
| Bucaramanga | 9 | 623,881 | 730,079 | -17% |
| Villavicencio | 10 | 593,273 | 662,440 | -12% |
| Valledupar | 11 | 575,225 | 606,669 | -5% |
| Bello | 12 | 570,329 | 559,863 | 2% |
| Santa Marta | 13 | 566,650 | 520,002 | 8% |
| Ibagué | 14 | 546,003 | 485,631 | 11% |
| Montería | 15 | 531,424 | 455,678 | 14% |
| Pereira | 16 | 482,824 | 429,333 | 11% |
| Manizales | 17 | 459,262 | 405,975 | 12% |
| Pasto | 18 | 415,937 | 385,117 | 7% |
| Neiva | 19 | 388,229 | 366,375 | 6% |
| Palmira | 20 | 359,888 | 349,438 | 3% |
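The last column can be reproduced from the two population columns. Below is a small sketch using three rows copied from the table; the helper name `percentage_error` is my own, and the formula (error relative to the actual population) is the one that matches the table's numbers.

```python
def percentage_error(actual, fit):
    """Signed error of the fit, as a percentage of the actual population."""
    return 100 * (actual - fit) / actual


# (city, actual population, fit population), copied from the table above
rows = [
    ("Bogotá", 7937898, 5544989),
    ("Medellín", 2634570, 2924988),
    ("Cartagena", 1065881, 1255809),
]

for city, actual, fit in rows:
    print(f"{city}: {percentage_error(actual, fit):.0f}%")
```

A positive error means the city is larger than the fit predicts (as with Bogotá), and a negative one means it is smaller.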

What this tells us

It is pretty amazing to know that Cartagena should have a population of around 1 million people just because it is the 5th largest city in Colombia, and that Ibagué should have around 500 thousand people, just because it is the 14th largest city in Colombia. All predicted from a simple formula: $$ P(r)=\frac{5544989}{r^{0.92}} $$

$P(r):$ Population in 2025 of the $r$-th largest city in Colombia.

Python code

Here is the Python code I used to find this power-law fit (and make some quick plots). It prints out the fit function and $R^2$. To try it for another country, just swap in new data!

# setup
import numpy as np
from sklearn.metrics import r2_score


colombia_cities = [
    ("Bogotá",        7937898),
    ("Medellín",      2634570),
    ("Cali",          2285099),
    ("Barranquilla",  1342818),
    ("Cartagena",      1065881),
    ("Soacha",         828947),
    ("Cúcuta",         815891),
    ("Soledad",        686339),
    ("Bucaramanga",    623881),
    ("Villavicencio",  593273),
    ("Valledupar",     575225),
    ("Bello",          570329),
    ("Santa Marta",    566650),
    ("Ibagué",         546003),
    ("Montería",       531424),
    ("Pereira",        482824),
    ("Manizales",      459262),
    ("Pasto",          415937),
    ("Neiva",          388229),
    ("Palmira",        359888)
]

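# Ranks 1..N (the list above must be sorted from largest to smallest).
# Using len() keeps this working when the city list is swapped out.
ranks = np.arange(1, len(colombia_cities) + 1)
city_names = [name for name, pop in colombia_cities]
city_pops = np.array([pop for name, pop in colombia_cities])

# Power law fit: log(Pop) = log(a) + b*log(rank)
# (b comes out negative; the Zipf exponent is alpha = -b, and a plays the role of C)
log_ranks = np.log(ranks)
log_pops = np.log(city_pops)
b, log_a = np.polyfit(log_ranks, log_pops, 1)
a = np.exp(log_a)
fit_pops = a * ranks ** b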

# Fit quality
r2 = r2_score(log_pops, log_a + b * log_ranks)

print(f"Power law fit: Population = {a:.0f} × Rank^{b:.2f} (R² = {r2:.2f})")
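If you plan to try several countries, the same steps could be wrapped in a small helper. This is only a sketch of one way to do it: the function name `fit_zipf` is my own, it computes $R^2$ directly with NumPy instead of scikit-learn, and the sanity check uses made-up data that follows Zipf's law exactly.

```python
import numpy as np


def fit_zipf(populations):
    """Fit P(r) = C / r**alpha by least squares in log-log space.

    Returns (C, alpha, r2), where r2 is computed on the log populations.
    """
    # Sort from largest to smallest so that position i corresponds to rank i + 1.
    pops = np.sort(np.asarray(populations, dtype=float))[::-1]
    ranks = np.arange(1, len(pops) + 1)

    log_r, log_p = np.log(ranks), np.log(pops)
    slope, intercept = np.polyfit(log_r, log_p, 1)

    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    residuals = log_p - (intercept + slope * log_r)
    r2 = 1 - np.sum(residuals ** 2) / np.sum((log_p - log_p.mean()) ** 2)

    return np.exp(intercept), -slope, r2


# Sanity check on synthetic data (made-up numbers): P(r) = 1000 / r
C, alpha, r2 = fit_zipf([1000 / r for r in range(1, 11)])
print(f"C = {C:.0f}, alpha = {alpha:.2f}, R² = {r2:.2f}")
```

Because the helper sorts its input, the populations can be passed in any order.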

And here is some quick code to create the plots (not as pretty as the TikZ figures above, but fast to generate):

import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Linear
ax1.scatter(ranks, city_pops, s=60)
ax1.plot(ranks, fit_pops, '--r', label=f"Fit: $y={a:.0f}x^{{{b:.2f}}}$")
ax1.set_xlabel("Rank")
ax1.set_ylabel("Population")
ax1.set_title("Colombia City Populations: Linear Scale")
ax1.set_xticks(ranks)
for i, name in enumerate(city_names):
    ax1.annotate(name, (ranks[i], city_pops[i]), textcoords="offset points", xytext=(0, 7), ha='center', fontsize=9)
ax1.legend()

# Log-log
ax2.scatter(ranks, city_pops, s=60)
ax2.plot(ranks, fit_pops, '--r', label=f"Fit: $y={a:.0f}x^{{{b:.2f}}}$")
ax2.set_xscale("log")
ax2.set_yscale("log")
ax2.set_xlabel("Rank (log)")
ax2.set_ylabel("Population (log)")
ax2.set_title("Colombia City Populations: Log-Log")
for i, name in enumerate(city_names):
    ax2.annotate(name, (ranks[i], city_pops[i]), textcoords="offset points", xytext=(0, 7), ha='center', fontsize=9)
ax2.legend()

plt.tight_layout()
plt.show()

More info

Much has been written about Zipf’s Law, both for languages and for city populations. If interested, consider looking at:
