Wikipedia and the Law of Large Numbers
My claim in this post is that the Law of Large Numbers (LLN) “guarantees” that the information in Wikipedia is reliable.
LLN is a theorem in probability that states that the average of the observations of a random variable converges to its (finite) expected value as the number of observations increases. What am I talking about?
Well, let’s use the same example as Wikipedia: a die. The probability that a roll comes up 1 is 1/6, and the same holds for 2, 3, 4, 5 and 6. Since the probabilities are equal for each number, the “expected” value of a roll is 3.5: (1+2+3+4+5+6)/6.
So the LLN says that the more I throw the die, the closer the average result (averaging each new observation with all the previous ones) gets to 3.5. This is certainly true, and this result is one of the most important theorems for empirical economics.
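To see this convergence in numbers, here is a minimal simulation sketch. It assumes a fair six-sided die modeled with Python's standard `random` module; the function name `running_average_of_die` and the fixed seed are my own choices for illustration.

```python
import random


def running_average_of_die(n_rolls, seed=0):
    """Roll a fair die n_rolls times and return the running average after each roll."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0
    averages = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)   # each face has probability 1/6
        averages.append(total / i)   # average of all rolls so far
    return averages


averages = running_average_of_die(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(averages[n - 1], 3))
```

The early averages can wander quite far from 3.5, while the later ones should sit very close to it.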
So, what does this have to do with Wikipedia? Well, Wikipedia is a free online open source encyclopedia. Everybody (with access to the internet) can change the content of a post. So, of course, one would think that anybody in southern Ohio can write something very inaccurate about Abraham Lincoln. Correct. However, when the editing tool is open to everybody, another person with the right and accurate information will edit the post again, fill the gaps and correct the information if need be.
When the number of users that can do this goes to infinity, the content of each Wikipedia post converges to its “true value”, that is, the real and accurate information. In equilibrium, Wikipedia is a very accurate source, given the large number of people who are constantly editing and correcting any mistakes.
The power of all these shared tools is exactly that: the Law of Large Numbers.







My name is Dany Bahar. I am currently an MPA/ID student at Harvard Kennedy School of Government (class of 2010), and an alumnus of the MA in Economics program at the Hebrew University of Jerusalem...
Hmmm… Not exactly.
First, note that (the simple) LLN says that the average of a sequence of independent identically distributed (iid) random variables will converge to the true mean of the random variable – if that mean exists.
There are versions with weaker assumptions. One drops the requirement that the sequence be identically distributed, as long as it is independent. Another says that you don’t need strict independence: a correlation between the draws that is weak “enough” is sufficient.
People’s opinions, however, tend to be very much correlated. Bubbles in the stock markets, religions and other institutions make people think similar things, and when one person says something, others tend to listen and act accordingly.
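Here is a rough sketch of why strong correlation matters. It is a toy model I am assuming for illustration, not a description of real opinions: each opinion mixes one shock shared by everybody with the person's own independent noise, and the true value is 0. The names `average_opinion`, `spread_of_averages` and `shared_weight` are hypothetical.

```python
import random


def average_opinion(n_people, shared_weight, seed):
    """Average of n_people opinions that all contain the same shared shock."""
    rng = random.Random(seed)
    shared = rng.gauss(0, 1)  # one common influence everybody is exposed to
    total = 0.0
    for _ in range(n_people):
        own = rng.gauss(0, 1)  # independent personal noise
        total += shared_weight * shared + (1 - shared_weight) * own
    return total / n_people


def spread_of_averages(n_people, shared_weight, n_trials=200):
    """Standard deviation of the crowd average across many simulated crowds."""
    avgs = [average_opinion(n_people, shared_weight, seed) for seed in range(n_trials)]
    mean = sum(avgs) / len(avgs)
    return (sum((a - mean) ** 2 for a in avgs) / len(avgs)) ** 0.5


# shared_weight=0.0: fully independent opinions, so the average settles near 0.
independent_spread = spread_of_averages(2_000, shared_weight=0.0)
# shared_weight=0.5: every pair of opinions is correlated, so it does not settle.
correlated_spread = spread_of_averages(2_000, shared_weight=0.5)
print(independent_spread, correlated_spread)
```

With independent opinions the crowd average should barely move from one simulated crowd to the next. With the shared shock, adding more people does not help: the average inherits the shock, which never averages out.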
Second, even if you had independence, and say that what stays in Wikipedia is some kind of an “average” (it’s not, but let’s say that you can prove something like this), LLN says that the average will converge to the mean of this imagined distribution of opinions. It doesn’t say that the mean of this distribution is the objective truth.
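A small sketch of this second point, under assumptions I am making up for illustration: opinions are independent draws from a normal distribution whose mean is off from the truth by a fixed amount. `TRUE_VALUE`, `SHARED_BIAS`, `NOISE_SD` and `average_of_opinions` are hypothetical names and numbers.

```python
import random

TRUE_VALUE = 100.0   # hypothetical "objective truth"
SHARED_BIAS = 10.0   # hypothetical systematic error in everybody's opinion
NOISE_SD = 20.0      # hypothetical spread of individual opinions


def average_of_opinions(n_people, seed=0):
    """Average of n_people independent opinions centered on TRUE_VALUE + SHARED_BIAS."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_people):
        total += rng.gauss(TRUE_VALUE + SHARED_BIAS, NOISE_SD)
    return total / n_people


for n in (10, 1_000, 100_000):
    print(n, round(average_of_opinions(n), 2))

large_sample_average = average_of_opinions(100_000)
```

In this setup the LLN works exactly as advertised: the average settles down as the crowd grows. It just settles near the mean of the opinions (110 here), not near the true value (100).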
But I do understand the resemblance between the concepts. I think there’s a book (probably more than one) about how the average of people’s estimates of some quantity tends to get close to the true value, but I haven’t read it, so I don’t know what it bases its theory on.
I think this would be the case if every Wikipedia user had the same propensity to edit posts, which might not be the case… so you won’t necessarily approach the true value if the users who edit are biased one way or another.