Zipf’s Law for Colombian City Sizes in 2025
Tags: Zipf's law
(Español: La ley de Zipf para las ciudades de Colombia en 2025)
Zipf’s Law for populations says that city populations decrease with rank according to a simple power law: $$ P(r) = \frac{C}{r^{\alpha}} $$ Here, $P(r)$ is the population of the $r$-th largest city in a given region, and $C$ and $\alpha$ are positive constants fitted to the data.
More concretely, this says that the population of the largest city (rank $r=1$) is approximated by $P(1)=C/1^\alpha=C$, the population of the second largest city (rank $r=2$) is approximated by $P(2)=C/2^\alpha$, the population of the third by $P(3)=C/3^\alpha$, and so on. Data from many countries suggests that $\alpha \approx 1$, but in practice, this number can vary depending on the country or region.
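To make the formula concrete, here is a minimal sketch that evaluates it for the first few ranks. The function name `zipf_population` and the value $C = 1{,}000{,}000$ are made up for illustration; $\alpha = 1$ is the classic Zipf exponent mentioned above.

```python
def zipf_population(rank, c, alpha):
    """Population predicted by P(r) = C / r**alpha for a given rank."""
    return c / rank ** alpha

# Hypothetical constants, chosen only to illustrate the pattern.
C = 1_000_000
ALPHA = 1.0

for r in range(1, 5):
    print(r, round(zipf_population(r, C, ALPHA)))
```

With $\alpha = 1$, the second city has half the population of the largest, the third has one third, and so on.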
I went looking for a recent plot showing how this works for Colombia and was surprised not to find one online. So, I decided to make one myself! (Always fun to find an excuse to create a nice TikZ figure and learn something new in the process.)
It turns out the power law is a pretty good fit for Colombia’s cities.

(Figure created with TikZ. Source code here.)
Besides the good fit, one thing stands out in the plot: Bogotá is an outlier, sitting well above the fitted curve. This is a well-documented phenomenon: capital cities often attract people in ways other cities do not.
The population numbers come from the official government estimates for 2025, which I got from Wikipedia. You can find the full table there or in the source code for the figure above.
Log-log plot
A power law fit is easier to check in a log-log plot, because these plots turn power functions (such as $P(r)$) into straight lines, and our brains are pretty good at spotting lines. See below.

(Figure created with TikZ. Source code here.)
To get the fit equation, I started by taking logs: I ran a linear regression of $\log P(r)$ against $\log r$ (see the Python code at the end). The fit has $R^2 = 0.96$, computed on the log values. Very high!
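The reason a linear regression works here is that taking logarithms of the power law turns it into the equation of a line. As a short worked derivation, using the symbols $C$ and $\alpha$ from the formula at the top:

$$ \log P(r) = \log\left(\frac{C}{r^{\alpha}}\right) = \log C - \alpha \log r $$

So in log-log coordinates the slope of the fitted line is $-\alpha$ and the intercept is $\log C$; exponentiating the intercept recovers $C$.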
Percentage errors
But sometimes, seeing the actual percentage errors is more convincing than just quoting $R^2$ or looking at the graphs. The errors are surprisingly low: every city is within 20% of the fit, except Bogotá, which we expect to be special, and Soacha, which is really a neighboring city to Bogotá.
City | Rank | Actual Population | Fit Population | Percentage Error |
---|---|---|---|---|
Bogotá | 1 | 7,937,898 | 5,544,989 | 30% |
Medellín | 2 | 2,634,570 | 2,924,988 | -11% |
Cali | 3 | 2,285,099 | 2,012,034 | 12% |
Barranquilla | 4 | 1,342,818 | 1,542,935 | -15% |
Cartagena | 5 | 1,065,881 | 1,255,809 | -18% |
Soacha | 6 | 828,947 | 1,061,350 | -28% |
Cúcuta | 7 | 815,891 | 920,626 | -13% |
Soledad | 8 | 686,339 | 813,900 | -19% |
Bucaramanga | 9 | 623,881 | 730,079 | -17% |
Villavicencio | 10 | 593,273 | 662,440 | -12% |
Valledupar | 11 | 575,225 | 606,669 | -5% |
Bello | 12 | 570,329 | 559,863 | 2% |
Santa Marta | 13 | 566,650 | 520,002 | 8% |
Ibagué | 14 | 546,003 | 485,631 | 11% |
Montería | 15 | 531,424 | 455,678 | 14% |
Pereira | 16 | 482,824 | 429,333 | 11% |
Manizales | 17 | 459,262 | 405,975 | 12% |
Pasto | 18 | 415,937 | 385,117 | 7% |
Neiva | 19 | 388,229 | 366,375 | 6% |
Palmira | 20 | 359,888 | 349,438 | 3% |
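The percentage errors in the table can be reproduced with a few lines. This sketch assumes the error is measured relative to the actual population, that is, (actual − fit) / actual. The post does not state the definition, but this one matches the rounded values in the table. The name `percentage_error` is just for illustration, and only a few rows are copied here.

```python
def percentage_error(actual, fit):
    """Signed error of the fit, as a percentage of the actual population.

    Assumed definition: 100 * (actual - fit) / actual.
    """
    return 100 * (actual - fit) / actual

# (city, actual population, fit population), copied from the table above
rows = [
    ("Bogotá", 7937898, 5544989),
    ("Medellín", 2634570, 2924988),
    ("Soacha", 828947, 1061350),
    ("Palmira", 359888, 349438),
]

for city, actual, fit in rows:
    print(f"{city}: {percentage_error(actual, fit):.0f}%")
```

A positive value means the city is larger than the fit predicts, as with Bogotá; a negative value means it is smaller.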
What this tells us
It is pretty amazing that we can guess that Cartagena has a population of around 1 million people just because it is the 5th largest city in Colombia, and that Ibagué has around 500 thousand people just because it is the 14th largest. All predicted from a simple formula: $$ P(r)=\frac{5544989}{r^{0.92}} $$
Here, $P(r)$ is the population in 2025 of the $r$-th largest city in Colombia.
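As a quick check, the formula can be evaluated directly. Note that the exponent 0.92 is rounded to two decimals, so values computed this way land close to, but not exactly on, the "Fit Population" column in the table. The function name `predicted_population` is hypothetical.

```python
C = 5544989      # fitted constant from the formula above
ALPHA = 0.92     # fitted exponent, rounded to two decimals

def predicted_population(rank):
    """Evaluate P(r) = C / r**ALPHA for a 1-based rank."""
    return C / rank ** ALPHA

for city, rank in [("Cartagena", 5), ("Ibagué", 14)]:
    print(f"{city} (rank {rank}): about {predicted_population(rank):,.0f}")
```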
Python code
Here is the Python code I used to find this power-law fit (and make some quick plots). It prints out the fit function and $R^2$. To try it for another country, just swap in new data!
```python
# setup
import numpy as np
from sklearn.metrics import r2_score

# (city, 2025 population estimate), sorted from largest to smallest
colombia_cities = [
    ("Bogotá", 7937898),
    ("Medellín", 2634570),
    ("Cali", 2285099),
    ("Barranquilla", 1342818),
    ("Cartagena", 1065881),
    ("Soacha", 828947),
    ("Cúcuta", 815891),
    ("Soledad", 686339),
    ("Bucaramanga", 623881),
    ("Villavicencio", 593273),
    ("Valledupar", 575225),
    ("Bello", 570329),
    ("Santa Marta", 566650),
    ("Ibagué", 546003),
    ("Montería", 531424),
    ("Pereira", 482824),
    ("Manizales", 459262),
    ("Pasto", 415937),
    ("Neiva", 388229),
    ("Palmira", 359888),
]

# Ranks 1..N, so the code keeps working if the list length changes
ranks = np.arange(1, len(colombia_cities) + 1)
city_names = [name for name, pop in colombia_cities]
city_pops = np.array([pop for name, pop in colombia_cities])

# Power law fit: log(Pop) = log(a) + b*log(rank)
# The slope b comes out negative; it equals -alpha in the formula P(r) = C / r^alpha.
log_ranks = np.log(ranks)
log_pops = np.log(city_pops)
b, log_a = np.polyfit(log_ranks, log_pops, 1)
a = np.exp(log_a)
fit_pops = a * ranks ** b

# Fit quality (R² is computed on the log values)
r2 = r2_score(log_pops, log_a + b * log_ranks)
print(f"Power law fit: Population = {a:.0f} × Rank^{b:.2f} (R² = {r2:.2f})")
```
And here is some quick code to create the plots (not as pretty as the TikZ figures above, but fast to generate):
```python
import matplotlib.pyplot as plt

# Uses ranks, city_pops, fit_pops, city_names, a and b from the fit code above
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Linear
ax1.scatter(ranks, city_pops, s=60)
ax1.plot(ranks, fit_pops, '--r', label=f"Fit: $y={a:.0f}x^{{{b:.2f}}}$")
ax1.set_xlabel("Rank")
ax1.set_ylabel("Population")
ax1.set_title("Colombia City Populations: Linear Scale")
ax1.set_xticks(ranks)
for i, name in enumerate(city_names):
    ax1.annotate(name, (ranks[i], city_pops[i]), textcoords="offset points",
                 xytext=(0, 7), ha='center', fontsize=9)
ax1.legend()

# Log-log
ax2.scatter(ranks, city_pops, s=60)
ax2.plot(ranks, fit_pops, '--r', label=f"Fit: $y={a:.0f}x^{{{b:.2f}}}$")
ax2.set_xscale("log")
ax2.set_yscale("log")
ax2.set_xlabel("Rank (log)")
ax2.set_ylabel("Population (log)")
ax2.set_title("Colombia City Populations: Log-Log")
for i, name in enumerate(city_names):
    ax2.annotate(name, (ranks[i], city_pops[i]), textcoords="offset points",
                 xytext=(0, 7), ha='center', fontsize=9)
ax2.legend()

plt.tight_layout()
plt.show()
```
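For trying the fit on another country, it can help to wrap the regression in a small function. The sketch below is one possible way to do that, not the code used for this post: the name `fit_zipf` is hypothetical, and it uses the closed-form least-squares formulas with only the standard library instead of `np.polyfit`. It assumes at least two cities, all with positive populations.

```python
import math

def fit_zipf(populations):
    """Fit P(r) = C / r**alpha by least squares on log P versus log r.

    `populations` may be in any order; it is sorted from largest to
    smallest before ranks are assigned. Returns (C, alpha, r_squared),
    where r_squared is computed in log space.
    """
    pops = sorted(populations, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(pops) + 1)]
    ys = [math.log(p) for p in pops]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form slope and intercept of the least-squares line
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination on the log values
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return math.exp(intercept), -slope, r_squared

# Made-up data that follows an exact power law with C = 1,000,000 and alpha = 1
example = [1_000_000 / r for r in range(1, 11)]
c, alpha, r2 = fit_zipf(example)
print(f"C = {c:.0f}, alpha = {alpha:.2f}, R^2 = {r2:.2f}")
```

Returning $\alpha$ as a positive number (the negated slope) keeps the result in the same form as the formula $P(r) = C / r^{\alpha}$.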
More info
Much has been written about Zipf’s Law, both for languages and for city populations. If interested, consider looking at:
- Wikipedia article on Zipf’s Law
- An OECD Regional Development Working Paper with lots of interesting graphs and considerations, including aspects about the definition of a city. See references to delve deeper!