Also called the “law of first digits,” Benford’s Law says that the first digits of numbers are not evenly distributed for naturally occurring numerical data. “The law maintains that the numeral 1 will be the leading digit in a genuine data set of numbers 30.1% of the time; the numeral 2 will be the leading digit 17.6% of the time; and each subsequent numeral, 3 through 9, will be the leading digit with decreasing frequency.” Source.
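The percentages quoted above come from a short formula: the expected share of numbers with leading digit d is log10(1 + 1/d). Here is a minimal Python sketch that prints the full set of expected frequencies. The function name `benford_probability` is my own choice for illustration and does not come from any library.

```python
import math


def benford_probability(d: int) -> float:
    """Expected share of numbers whose leading digit is d (1 through 9)."""
    return math.log10(1 + 1 / d)


# Print the expected frequency for each possible leading digit.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")
```

The nine shares add up to 100%, because the logarithms telescope to log10(10) = 1.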
Here’s a chart illustrating the relative occurrence of first digits:
Benford’s Law applies to datasets that contain a large number of values (hundreds or more) and that are socially or naturally generated rather than purely random or arbitrarily assigned. “Practically any group of data obtained carrying out ‘measurements’ satisfied the law, provided the numbers were not arbitrarily assigned and without restrictions (telephone numbers, identity cards or passport numbers, dates, etc), and neither random uniform nor normal distributions (lottery, weight and/or height of adult people, etc).” Source.
Examples of where Benford’s Law applies include:
- electricity bills
- street addresses
- stock prices
- house prices
- population numbers
- death rates
- lengths of rivers
- molecular and atomic weights
- the half-lives of radioactive atoms
- cost data
- powers and square roots of whole numbers
- budget and financial data of corporations including income statements and balance sheets
- inventory listings
- timesheet data
- expense reports
- income tax returns
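One item on the list, powers of whole numbers, is easy to check without any outside data. The sketch below tallies the leading digits of the first 1,000 powers of 2 and prints them next to the Benford prediction. The helper names (`leading_digit`, `first_digit_shares`) and the choice of 1,000 powers are my own assumptions for illustration.

```python
import math
from collections import Counter


def leading_digit(n: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])


def first_digit_shares(values):
    """Map each digit 1..9 to the share of values that start with it."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


# 2^1 through 2^1000; Python integers have arbitrary precision, so this is exact.
powers_of_two = [2 ** k for k in range(1, 1001)]
shares = first_digit_shares(powers_of_two)

for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(f"{d}: observed {shares[d]:.3f}, Benford {expected:.3f}")
```

The same two helpers would work on any list of positive integers, such as a column of invoice amounts rounded to whole dollars.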
Here’s a chart comparing a few different datasets with what Benford’s Law would predict:
Why does Benford’s Law occur? For some datasets, the explanation is that lower numbers must occur before higher numbers: a quantity that grows has to pass through values beginning with 1 before it reaches values beginning with 2, and so on. Also, sets of numbers that don’t follow Benford’s Law on their own, such as randomly generated numbers, produce results that do follow Benford’s Law when multiplied together. I found the chart below sort of mind-blowing (and here’s an IFOD that hits on the Fibonacci Series):
However, there isn’t a universal explanation of why Benford’s Law occurs. It is intriguing that the pattern can be explained for some datasets but not for others, and mathematicians continue to look for a universal explanation.
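The multiplication effect mentioned earlier is easy to try out. The rough simulation below draws random numbers uniformly between 1 and 10, whose first digits are spread evenly, and then multiplies ten such draws together. The seed, the sample size, and the number of factors are arbitrary choices of mine, and the helper names are hypothetical.

```python
import math
import random


def leading_digit(x: float) -> int:
    """First significant digit of a positive float, read from scientific notation."""
    return int(f"{x:.16e}"[0])


def share_of_ones(values) -> float:
    """Share of values whose leading digit is 1."""
    values = list(values)
    return sum(1 for v in values if leading_digit(v) == 1) / len(values)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
SAMPLES = 20_000        # how many numbers to generate for each experiment
FACTORS = 10            # how many random draws are multiplied together

# Single draws: each leading digit should appear about 1/9 of the time.
singles = [rng.uniform(1, 10) for _ in range(SAMPLES)]

# Products of several draws: leading digits should drift toward Benford.
products = [
    math.prod(rng.uniform(1, 10) for _ in range(FACTORS))
    for _ in range(SAMPLES)
]

print(f"single draws starting with 1: {share_of_ones(singles):.3f}")
print(f"products starting with 1:     {share_of_ones(products):.3f}")
print(f"Benford prediction for 1:     {math.log10(2):.3f}")
```

This only demonstrates the effect for one kind of random input. It is not a proof, and it says nothing about datasets that arise in other ways.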
Finally, why does this matter? Data scientists and forensic accountants use Benford’s Law to test data quality and to detect fraud. It is useful in debugging computer programs, uncovering voting anomalies, and finding tax cheats.
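To give a flavor of how such a screen might work, here is a hedged sketch of one common approach: a chi-square goodness-of-fit test that compares observed first-digit counts with the Benford prediction. I don't know which tests any particular auditor uses, so treat this as an illustration only. The function name `benford_chi_square` is hypothetical, both datasets are synthetic, and 15.507 is the standard 5% critical value for 8 degrees of freedom.

```python
import math
from collections import Counter

# Chi-square critical value at the 5% level with 8 degrees of freedom
# (nine digit categories minus one).
CRITICAL_VALUE_5_PERCENT = 15.507


def leading_digit(n: int) -> int:
    """Return the first decimal digit of a nonzero integer."""
    return int(str(abs(n))[0])


def benford_chi_square(values) -> float:
    """Chi-square statistic of observed first digits against Benford's Law."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    total = len(digits)
    statistic = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic


# Synthetic examples: powers of 2 track Benford closely, while the integers
# 100..999 have evenly spread first digits and do not.
datasets = {
    "powers of 2": [2 ** k for k in range(1, 1001)],
    "integers 100-999": list(range(100, 1000)),
}

for name, data in datasets.items():
    stat = benford_chi_square(data)
    if stat > CRITICAL_VALUE_5_PERCENT:
        verdict = "worth a closer look"
    else:
        verdict = "consistent with Benford"
    print(f"{name}: chi-square = {stat:.2f} ({verdict})")
```

A high score only flags a dataset for review. Plenty of legitimate data, such as prices clustered just under a round number, will fail a check like this for innocent reasons.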
Fascinating, indeed. Also good to remember when you appear as a contestant on The Price Is Right and are asked to guess the prices of items.