Benford's law refers to probability distributions that seem to govern the significant digits in real data sets. The law is named for the American physicist and engineer Frank Benford, although it was actually discovered earlier by the astronomer and mathematician Simon Newcomb.
To understand Benford's law, we need some preliminaries. Recall that a positive real number $x$ can be written uniquely in the form $x = y \cdot 10^n$ (sometimes called scientific notation), where $y \in [\frac{1}{10}, 1)$ is the mantissa and $n \in \mathbb{Z}$ is the exponent (both of these terms are base 10, of course). Note that $\log(x) = \log(y) + n$,
where the logarithm function is the base 10 common logarithm instead of the usual base $e$ natural logarithm. In the old days BC (before calculators), one would compute the logarithm of a number by looking up the logarithm of the mantissa in a table of logarithms, and then adding the exponent. Of course, these remarks apply to any base $b > 1$, not just base 10. Just replace 10 with $b$ and the common logarithm with the base $b$ logarithm.
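The decomposition into mantissa and exponent can be sketched in a few lines of Python. This is only an illustration: the function name `mantissa_exponent` is hypothetical, and the code assumes the convention that the mantissa lies in the interval $[1/b, 1)$.

```python
import math

def mantissa_exponent(x, b=10):
    """Return (y, n) with x = y * b**n and 1/b <= y < 1 (assumed convention)."""
    if x <= 0 or b <= 1:
        raise ValueError("need x > 0 and b > 1")
    # log_b(x) = log_b(y) + n with -1 <= log_b(y) < 0, so n = floor(log_b(x)) + 1.
    n = math.floor(math.log(x, b)) + 1
    y = x / b**n
    # Floating-point rounding in the logarithm can push y just outside [1/b, 1).
    if y >= 1:
        y, n = y / b, n + 1
    elif y < 1 / b:
        y, n = y * b, n - 1
    return y, n

y, n = mantissa_exponent(123.0)
print(y, n)
# The logarithm splits as described in the text: log(x) = log(y) + n.
print(math.log10(123.0), math.log10(y) + n)
```

The boundary correction matters for exact powers of the base, where the computed logarithm may land slightly below an integer.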
Suppose now that $X$ is a number selected at random from a certain data set of positive numbers. Based on empirical evidence from a number of different types of data, Newcomb, and later Benford, noticed that the mantissa $Y$ of $X$ seemed to have distribution function $y \mapsto 1 + \log(y)$ for $\frac{1}{10} \le y < 1$. We will generalize this to an arbitrary base $b > 1$. Thus, let $F(y) = 1 + \log_b(y)$ for $\frac{1}{b} \le y < 1$.
Show that $F$ satisfies the mathematical properties of a distribution function for a continuous distribution on $[\frac{1}{b}, 1)$.
Note that the corresponding probability density function is $f(y) = \frac{1}{y \ln(b)}$ for $\frac{1}{b} \le y < 1$.
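As a numerical sanity check, the mantissa distribution can be simulated by the inverse transform method. The sketch below assumes the distribution function $F(y) = 1 + \log_b(y)$ on $[1/b, 1)$, whose inverse is $p \mapsto b^{p - 1}$; the function names are hypothetical and the seed is arbitrary.

```python
import math
import random

def mantissa_cdf(y, b=10):
    """Assumed distribution function F(y) = 1 + log_b(y) for 1/b <= y < 1."""
    return 1 + math.log(y, b)

def mantissa_pdf(y, b=10):
    """Derivative of F: f(y) = 1 / (y ln b)."""
    return 1 / (y * math.log(b))

def sample_mantissa(rng, b=10):
    """Inverse transform: if U is uniform on [0, 1), then b**(U - 1) has CDF F."""
    return b ** (rng.random() - 1)

rng = random.Random(0)
samples = [sample_mantissa(rng) for _ in range(100_000)]

# Compare the empirical fraction of samples at or below 0.5 with F(0.5).
empirical = sum(s <= 0.5 for s in samples) / len(samples)
print(empirical, mantissa_cdf(0.5))
```

The two printed numbers should agree to about two decimal places for a sample of this size.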
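Show that the mean and variance of $Y$ are $\mathbb{E}(Y) = \frac{b - 1}{b \ln(b)}$ and $\operatorname{var}(Y) = \frac{b - 1}{b^2 \ln(b)} \left[\frac{b + 1}{2} - \frac{b - 1}{\ln(b)}\right]$.
For the standard base 10 decimal case, compute the mean and variance of $Y$.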
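Assume now that the base is a positive integer $b \in \{2, 3, \ldots\}$, which of course is the case in standard number systems. Suppose that the sequence of digits of our mantissa $Y$ (in base $b$) is $(N_1, N_2, \ldots)$, so that $Y = \sum_{k=1}^\infty \frac{N_k}{b^k}$.
Thus, our leading digit $N_1$ takes values in $\{1, 2, \ldots, b - 1\}$, while each of the other significant digits takes values in $\{0, 1, \ldots, b - 1\}$. Our goal is to compute the joint probability density function of the first $k$ digits. But let's start, appropriately enough, with the first digit law, the discrete probability density function of the leading digit:
Show that $\mathbb{P}(N_1 = n) = \log_b\left(1 + \frac{1}{n}\right)$ for $n \in \{1, 2, \ldots, b - 1\}$. Hint: Note that $N_1 = n$ if and only if $\frac{n}{b} \le Y < \frac{n + 1}{b}$.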
Consider the standard base 10 decimal case. Explicitly compute the values of the probability density function of $N_1$.
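The first digit law is easy to tabulate. The following sketch assumes the formula $\log_b(1 + 1/n)$ for the probability that the leading digit equals $n$; the function name `first_digit_pmf` is hypothetical.

```python
import math

def first_digit_pmf(b=10):
    """Probability that the leading digit equals n, for n = 1, ..., b - 1."""
    return {n: math.log(1 + 1 / n, b) for n in range(1, b)}

pmf = first_digit_pmf()
for n, p in pmf.items():
    print(n, round(p, 4))

# The probabilities telescope: the sum is log_b(b) = 1.
print(sum(pmf.values()))
```

In base 10 the digit 1 leads about 30% of the time, and the probabilities decrease steadily through the digit 9.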
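Now, to compute the joint probability density function of the first $k$ significant digits, some additional notation will help. If $n_1 \in \{1, 2, \ldots, b - 1\}$ and $n_j \in \{0, 1, \ldots, b - 1\}$ for $j \in \{2, 3, \ldots, k\}$, let $[n_1 n_2 \cdots n_k]_b = \sum_{j=1}^k n_j b^{k - j}$.
Of course, this is just the base $b$ version of what we do in our standard base 10 system: we represent integers as strings of digits between 0 and 9 (except that the first digit cannot be 0). Here is a base 5 example: $[324]_5 = 3 \cdot 5^2 + 2 \cdot 5^1 + 4 \cdot 5^0 = 89$.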
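The value of a digit string in base $b$ can be computed with Horner's rule. The helper below is a hypothetical illustration, not part of the original text; it is checked against Python's built-in `int(text, base)` for the base 5 digits 3, 2, 4.

```python
def digits_value(digits, b):
    """Integer represented by a digit sequence in base b (leading digit nonzero)."""
    if not digits or digits[0] == 0:
        raise ValueError("leading digit must be nonzero")
    value = 0
    for d in digits:
        if not 0 <= d < b:
            raise ValueError("each digit must lie in 0..b-1")
        value = value * b + d  # Horner's rule: shift left one place, add digit
    return value

print(digits_value([3, 2, 4], 5))   # 3*25 + 2*5 + 4
print(int("324", 5))                # built-in conversion, for comparison
```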
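Show that $\mathbb{P}(N_1 = n_1, N_2 = n_2, \ldots, N_k = n_k) = \log_b\left(1 + \frac{1}{[n_1 n_2 \cdots n_k]_b}\right)$.
Hint: Note that $\{N_1 = n_1, N_2 = n_2, \ldots, N_k = n_k\} = \{l \le Y < u\}$ where $l = [n_1 n_2 \cdots n_k]_b / b^k$ and $u = ([n_1 n_2 \cdots n_k]_b + 1) / b^k$. Now use the distribution function of $Y$ and properties of logarithms.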
In the standard base 10 decimal case, explicitly compute the values of the joint probability density function of $(N_1, N_2)$.
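The joint distribution of the first two digits can be tabulated directly. This sketch assumes that the joint probability of a digit string is $\log_b(1 + 1/m)$, where $m$ is the integer that the string represents in base $b$; the function name `joint_pmf` is hypothetical.

```python
import math

def joint_pmf(digits, b=10):
    """Probability that the leading significant digits equal the given sequence."""
    m = 0
    for d in digits:
        m = m * b + d  # integer value of the digit string in base b
    return math.log(1 + 1 / m, b)

# Table for the first two digits in base 10: 9 * 10 = 90 entries.
table = {(n1, n2): joint_pmf((n1, n2))
         for n1 in range(1, 10) for n2 in range(10)}

print(round(table[(1, 0)], 5))   # digits "10"
print(round(table[(9, 9)], 5))   # digits "99"
print(sum(table.values()))       # telescopes to log10(100 / 10) = 1
```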
Of course, the probability density function of a given digit can be obtained by summing the joint probability density over the unwanted digits in the usual way. However, except for the first digit, these functions do not reduce to simple expressions.
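Show that $\mathbb{P}(N_2 = n) = \sum_{k=1}^{b-1} \log_b\left(1 + \frac{1}{k b + n}\right)$ for $n \in \{0, 1, \ldots, b - 1\}$.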
Consider the standard base 10 decimal case. Explicitly compute the values of the probability density function of $N_2$.
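The distribution of the second digit can be obtained by summing the joint probabilities over the first digit. The sketch below assumes the joint probability $\log_b(1 + 1/(k b + n))$ for first digit $k$ and second digit $n$; the function name `second_digit_pmf` is hypothetical.

```python
import math

def second_digit_pmf(b=10):
    """Marginal distribution of the second digit: sum over the first digit k."""
    return {n: sum(math.log(1 + 1 / (k * b + n), b) for k in range(1, b))
            for n in range(b)}

pmf2 = second_digit_pmf()
for n, p in pmf2.items():
    print(n, round(p, 4))
```

The values decrease from roughly 0.12 for the digit 0 to roughly 0.085 for the digit 9, a much narrower range than for the leading digit.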
Comparing Exercise 6 and Exercise 10, note that the distribution of $N_2$ is flatter than the distribution of $N_1$. In general, it turns out that the distribution of $N_k$ converges to the uniform distribution on $\{0, 1, \ldots, b - 1\}$ as $k \to \infty$. Interestingly, the digits are dependent.
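The flattening can be seen numerically by computing the distribution of the digit in position $k$ and measuring its largest deviation from the uniform probability $1/b$. This is a brute-force sketch with hypothetical function names: it sums the joint probability $\log_b(1 + 1/m)$ over all integers $m$ with $k$ digits in base $b$ and a given last digit, so it is only practical for small $k$.

```python
import math

def kth_digit_pmf(k, b=10):
    """Marginal distribution of the digit in position k >= 2."""
    if k < 2:
        raise ValueError("this sketch handles k >= 2 only")
    # Prefixes are the (k - 1)-digit integers in base b with nonzero leading digit.
    prefixes = range(b ** (k - 2), b ** (k - 1))
    return {n: sum(math.log(1 + 1 / (m * b + n), b) for m in prefixes)
            for n in range(b)}

def max_deviation_from_uniform(k, b=10):
    """Largest absolute difference between a digit probability and 1/b."""
    return max(abs(p - 1 / b) for p in kth_digit_pmf(k, b).values())

deviations = [max_deviation_from_uniform(k) for k in (2, 3, 4)]
for k, dev in zip((2, 3, 4), deviations):
    print(k, dev)
```

In base 10 the deviation shrinks by roughly a factor of ten with each additional position, consistent with the stated convergence.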
Use the results of Exercise 6, Exercise 8, and Exercise 10 to show that $N_1$ and $N_2$ are dependent in the standard base 10 decimal case.
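Dependence only requires one pair of digit values for which the joint probability differs from the product of the marginal probabilities. The sketch below checks the pair with first digit 1 and second digit 0 in base 10, assuming the joint probability $\log_{10}(1 + 1/(10 n_1 + n_2))$; the variable names are hypothetical.

```python
import math

# Marginal probability that the first digit is 1.
p_first_1 = math.log10(1 + 1 / 1)

# Marginal probability that the second digit is 0: sum over first digits 1..9.
p_second_0 = sum(math.log10(1 + 1 / (10 * k)) for k in range(1, 10))

# Joint probability that the first two digits are "10".
p_joint = math.log10(1 + 1 / 10)

print(p_joint, p_first_1 * p_second_0)
```

The two printed values differ in the third decimal place, so the product rule fails for this pair.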
Aside from the empirical evidence noted by Newcomb and Benford (and many others since), why does Benford's law work? For a theoretical explanation, see the article "A Statistical Derivation of the Significant-Digit Law" by Ted Hill.