Because the set of numbers above was generated at random, the percentages of numbers starting with each digit weren't precisely equal to one another. But you can see how the odds pointed in that direction: among the whole numbers from 1 to 99, there is an equal count of numbers starting with each of the digits. So when we chart the resulting percentages, the chart is basically "a flat line" where each bar is about the same height, since each digit's share of the generated numbers is roughly the same:
[Bar chart: percentage of the randomly generated numbers beginning with each first digit, 1 through 9]
The bar chart you see above has nine bars, representing the percentage of numbers in the set that began with the digit 1 (11.4%), through 9 (11.2%). While there was a little variation due to
random chance, each bar is at roughly the same height, meaning that the set had close-to-equal percentages of numbers that began with each of the nine digits.
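The claim that the whole numbers from 1 to 99 split evenly by first digit is easy to check. Here is a minimal Python sketch of that count; the helper name `first_digit` is just an illustrative choice, not something taken from the site's software.

```python
from collections import Counter

def first_digit(n: int) -> int:
    """Return the leading digit of a positive whole number."""
    return int(str(n)[0])

# Count how many of the whole numbers 1..99 start with each digit.
counts = Counter(first_digit(n) for n in range(1, 100))

for digit in range(1, 10):
    share = counts[digit] / 99 * 100
    print(f"{digit}: {counts[digit]} numbers ({share:.1f}%)")
```

Every digit ends up with the same count, which is why a random draw from this range produces bars of roughly equal height.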
And if you were to ask the average person about the prevalence of numbers "out there in the world" beginning with each digit, they might assume it follows a similar pattern to the above, i.e. equal odds. (And if that average person wanted to commit fraud by, say, coming up with a bunch of phony invoices, they might choose fake invoice amounts that are also spread evenly across the digits in this way.) But is the average person correct in thinking that numbers naturally arrange themselves equally by first digit?
Enter Frank Benford. We'll now need to transport ourselves back in history, to a time before computers, before even pocket calculators, all the way back to the days when, if you needed a convenient tool to help you perform multiplication, you might go to your public library and borrow its copy of a physical book of logarithmic tables.
In these books of logarithmic tables, the pages were arranged in numerical order: the 1's were among the beginning pages, the following pages held the 2's, and so on through to the 9's at the tail end of the book.
Picture Frank Benford going to the library and pulling out a really old, very well-used copy of one of these books of logarithmic tables. He can't help but notice something unexpected: towards the front of the book, the paper is much more smudged, crinkled and visibly worn than towards the end of the book, where the paper is comparatively pristine.
Benford pondered this observation, and asked himself the question:
Maybe there are more numbers "out there in the world" that start with 1 or 2, than there are with 8
or 9? Could that be?!? Maybe the distribution of numbers in the world
isn't equal (like our chart earlier) but rather forms a
different pattern?
And he set out to compile some statistics, to check. First he set about gathering various naturally-occurring sets of numbers, such as molecular weights of different chemicals;
baseball statistics; census data; bank account balances;
the revenues of all listed companies on the stock market... And next, for each set he
organized the numbers within that set into the nine buckets, depending on whether the first digit of each number was 1 through 9.
And no matter where he looked (sizes of rivers; population counts; number of deaths; areas of counties) he kept encountering the same distribution pattern, over and over and over again, and it always looked something like this:
[Bar chart: typical percentage of numbers beginning with each first digit, 1 through 9, with the bars sloping downward from 1 to 9]
There were more ones in the 1's bucket than twos in the 2's bucket, more twos than threes, more threes than fours, more fours than fives, more fives than sixes, more sixes than sevens, more sevens than eights, and
more eights than nines!
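The article describes the shape of this pattern without giving the numbers behind it. The standard mathematical statement of Benford's Law, which is not spelled out in the text here, is that the expected share of numbers with first digit d is log10(1 + 1/d). A short sketch that prints those expected percentages (the function name is an illustrative choice):

```python
import math

def benford_share(digit: int) -> float:
    """Expected fraction of numbers whose first digit is `digit` (1 to 9)."""
    return math.log10(1 + 1 / digit)

for d in range(1, 10):
    print(f"{d}: {benford_share(d) * 100:.1f}%")
```

The shares run from roughly 30.1% for the digit 1 down to roughly 4.6% for the digit 9, and together they add up to 100%.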
This result may seem puzzling and counter-intuitive when you hear it for the first time. For an illustrative example of why it actually makes sense, let's think about the stock market data set that Benford included in his studies: the revenues that year for each and every public company on the stock market.
Consider that, based on the numbers at the time of this writing, year-over-year revenue growth among S&P 500 companies, measured as of 9/30/2023 and looking back to 2001, averages approximately 4.3% per year, with a median of approximately 5.4%, over that span of more than two decades. Let's keep this "typical growth rate" in our heads for context.
Now imagine "typical company X" currently has annual revenue of 800K (putting them in the 8's first-digit bucket), and aspires to graduate up to the 9's bucket by growing their revenue to 900K. The percentage change in revenue that it will take to accomplish this goal is +12.5%. And then, from 900K, to graduate from the 9's bucket up to the next digit (circling back to the 1's bucket), the required revenue growth is even less: an increase of +11.1% brings the company up to 1 million.
Ah, but now that the company has reached 1 million, think about how long they're going to stay in that 1's first-digit bucket... they have to grow through 1.1 million, 1.2 million — they're
not going to reach the 2's first-digit until their revenue has
doubled! A 100% increase in revenue is required to go from 1 to 2 million! Think of how long they'll need to stay in the 1's bucket, compared to the 9's!
Once they finally achieve 100% revenue growth and hit 2 million, the next first-digit bucket (which they'll reach at 3 million of revenue), is a closer hurdle: now, they require 50% growth in revenue to graduate from the 2's up to the 3's.
And once they reach 3 million, the next digit (4) requires 33.3% revenue growth; from 4 to 5 requires 25% revenue growth; 5 to 6 requires 20% revenue growth... See how the percentage change
requirement to jump to the next first-digit bucket keeps shrinking as revenue grows? Then one day they're at 9 million, and require just 11.1% revenue
growth to reach 10 million...
But once that 10 million mark has been reached, they're back in the 1's first-digit bucket again, and once more they'll need 100% revenue growth (to 20 million) before they're back in the 2's! Think about how comparatively little time they just spent in the 9's, whereas now they'll be in the 1's for the entire time that they pass 11 million, 12 million ... 18 million, 19 million, until finally they're back in the 2's with 20 million. But from 20 million, the hurdle to reach 30 million is closer: 50% growth is required (just like it took 50% growth to get from 2 million to 3 million).
Having walked through the digits in this way, you can see that the pattern repeats over and over as the revenue number grows: a full 100% growth (a doubling) is needed to go from the 1's to the 2's, 50% from the 2's to the 3's,
33.3% from the 3's to the 4's, 25% from the 4's to the 5's, 20% from the 5's to the 6's, 16.67% from the 6's to the 7's, 14.29% from the 7's to the 8's, 12.5% from the 8's to the 9's, and 11.1% from the 9's back around to the 1's again.
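The percentages in the walk-through above all come from one piece of arithmetic: moving from first digit d to the next bucket means multiplying revenue by (d + 1) / d. A small sketch that reproduces the list (the function name is an illustrative choice):

```python
def growth_to_next_bucket(digit: int) -> float:
    """Fractional growth needed to leave the given first-digit bucket.

    Going from digit d to d + 1 (or from 9 to 10, which wraps around to
    the 1's) means multiplying revenue by (d + 1) / d.
    """
    return (digit + 1) / digit - 1

for d in range(1, 10):
    next_bucket = d + 1 if d < 9 else 1
    pct = growth_to_next_bucket(d) * 100
    print(f"{d}'s -> {next_bucket}'s: {pct:.2f}% growth required")
```

The same figures apply at every scale, whether the company is moving from 2 million to 3 million or from 20 million to 30 million.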
Actual company revenue growth rates can of course vary, but if "typical company X" hypothetically were to achieve perfectly steady year-over-year revenue growth throughout their journey from a tiny company to a behemoth, you can see how they'd spend much more time with their revenue in the 1's first-digit bucket than in the 2's; the next-highest amount of time would be spent in the 2's, followed by the 3's, and so on, until the smallest amount of time would be spent with revenue in the 9's first-digit bucket.
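To put rough numbers on that steady-growth thought experiment, here is a sketch that assumes constant compound growth at the approximately 5.4% median rate quoted earlier. Under that assumption, the time spent in each bucket is the logarithm of the required growth multiple divided by the logarithm of the yearly growth multiple. The function name and the choice of rate are illustrative.

```python
import math

def years_in_bucket(digit: int, annual_growth: float) -> float:
    """Years a steadily compounding figure spends with the given first digit.

    Revenue must be multiplied by (digit + 1) / digit to leave the bucket,
    and each year multiplies it by (1 + annual_growth).
    """
    return math.log((digit + 1) / digit) / math.log(1 + annual_growth)

growth = 0.054  # assumption: the approx. 5.4% median growth rate from the text
years = {d: years_in_bucket(d, growth) for d in range(1, 10)}
total = sum(years.values())  # years needed to grow tenfold, e.g. 1 to 10 million

for d in range(1, 10):
    print(f"{d}'s: {years[d]:5.1f} years ({years[d] / total * 100:.1f}% of the time)")
```

Whatever steady growth rate is plugged in, the share of time spent in each bucket comes out the same: log10(1 + 1/d), which is about 30% for the 1's.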
So looking across all publicly traded companies (from young start-ups to large behemoths and every company in between), when you take all of their annual revenues as a data set, you can envision how this is precisely the pattern you should expect to see in the data: more 1's than 2's, more 2's than 3's, etc. This relationship between the numbers, this downward-sloping curve, is what was described in the Proceedings of the American Philosophical Society in 1938 as "The Law of Anomalous Numbers."
But of course, most people today just refer to it as "Benford's Law."
And it turns out that the discrepancy between the set of numbers an average fraudster might create (i.e. someone who picks their numbers without regard to the natural curve, such as at random) and the normal Benford's Law curve of a "natural" data set is highly useful in rooting out suspicious reports.
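As a rough illustration of that discrepancy, the sketch below compares two made-up data sets. The "fabricated" amounts are picked evenly between 100 and 999, the way a naive fraudster might. The "natural-like" amounts are spread evenly on a logarithmic scale across several orders of magnitude, which is only an assumed stand-in for real-world data. All names, ranges and the seed are illustrative assumptions.

```python
import random
from collections import Counter

def first_digit(n: int) -> int:
    """Return the leading digit of a positive whole number."""
    return int(str(n)[0])

def digit_shares(values: list[int]) -> dict[int, float]:
    """Fraction of values starting with each digit 1 through 9."""
    counts = Counter(first_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical fraudster: invoice amounts picked evenly from 100 to 999.
fabricated = [rng.randint(100, 999) for _ in range(10_000)]

# Assumed stand-in for natural data: amounts from 100 up to 1,000,000,
# spread evenly on a logarithmic scale.
natural = [int(10 ** rng.uniform(2, 6)) for _ in range(10_000)]

fab_shares = digit_shares(fabricated)
nat_shares = digit_shares(natural)

print("digit  fabricated  natural-like")
for d in range(1, 10):
    print(f"{d:>5}  {fab_shares[d] * 100:9.1f}%  {nat_shares[d] * 100:11.1f}%")
```

The fabricated column should come out close to flat, while the natural-like column should slope downward from the 1's to the 9's.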
And that brings us to the reason we created this website. Although Benford had gathered the revenue figures from every company as a data set, what we were inspired to do with this website was to drill into each public company, have our code study its individual quarterly 10Q and annual 10K filings, and extract the numbers within each filing. A filing typically includes various revenue line items, various expense line items, balance sheet line items, share counts, and so on (typically we see around 1,000 numbers and sometimes as many as 15,000+).
We created software to perform this task and then, for each filing, to arrange the extracted numbers from that filing into a chart, showing the percentage of 1's first-digit numbers, 2's, 3's, etc.
And as our system began to process filing after filing, we quickly found that the numbers
within each filing typically tend to follow the same familiar Benford's Law distribution pattern.
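The extract-and-tabulate step can be sketched in a few lines. This is not the site's actual software, whose details are not given here; it is a minimal illustration that assumes the filing is available as plain text and that a simple regular expression is good enough to find the figures. A real extractor would also need to decide how to treat dates, years, page numbers and similar non-financial numbers.

```python
import re
from collections import Counter

# Matches figures such as 1,234 or 56.7 or 0.045 (commas as thousands separators).
NUMBER_PATTERN = re.compile(r"\d[\d,]*(?:\.\d+)?")

def first_digits(text: str) -> list[int]:
    """Leading nonzero digit of every number found in the text."""
    digits = []
    for match in NUMBER_PATTERN.findall(text):
        for ch in match:
            if ch in "123456789":
                digits.append(int(ch))
                break  # numbers made up only of zeros are skipped
    return digits

def first_digit_percentages(text: str) -> dict[int, float]:
    """Percentage of numbers in the text that start with each digit 1 to 9."""
    counts = Counter(first_digits(text))
    total = sum(counts.values())
    return {d: (100 * counts[d] / total if total else 0.0) for d in range(1, 10)}

# Made-up snippet standing in for the text of a filing.
sample = "Revenue 1,250 Cost of sales 870 Net income 0.042 Shares 1,900,000 Cash 230"
print(first_digit_percentages(sample))
```

The resulting dictionary holds one percentage per first digit, which is the data a chart like the ones on this page would plot.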
How useful is this in spotting potential fraud? Our first thought was to run some old Enron filings through the software; for instance here's the software's chart of their
filing from 5/15/1998, which seems to have a suspicious count of first-digit-4's in it:
[Bar chart: percentage of numbers beginning with each first digit, 1 through 9, in the Enron filing from 5/15/1998]
As for modern-day accounting frauds among current public companies, they might be very difficult to spot. For one thing, any "professional criminal" is probably already going to know about Benford's Law, and any "fake numbers" would be carefully chosen to conform to it. Another difficulty in spotting fraud using Benford's Law is that a fraudster's selective manipulation of one or two very important numbers could go completely unnoticed amidst a filing with thousands of numbers in it.
However: we have to imagine that many cases of accounting fraud might begin with an innocent mistake by non-criminals, who then make very poor choices about how to proceed once they realize the mistake.
Consider for example the following quote from
this SEC filing which relates to an accounting scandal that ended in one of the largest class action lawsuit settlements in recent history:
"The Audit Committee based its conclusion on the preliminary findings of its investigation into concerns regarding accounting practices and other matters that first were reported to the Audit Committee on September 7, 2014. The Audit Committee promptly initiated an investigation, which is being conducted with the assistance of independent counsel and forensic experts ... the Audit Committee believes that the Company incorrectly included certain amounts related to its non-controlling interests in the calculation of adjusted funds from operations (“AFFO”), a non-U.S. GAAP financial measure, for the three months ended March 31, 2014 and, as a result, overstated AFFO for this period.
The Audit Committee believes that this error was identified but intentionally not corrected, and other AFFO and financial statement errors were intentionally made, resulting in an overstatement of AFFO and an understatement of the Company’s net loss for the three and six months ended June 30, 2014."
The above quote suggests that the entire scandal began with an error; that once this error was identified, it was intentionally left uncorrected; and that, apparently to compensate for the error, additional "errors" were intentionally introduced (that, at least, is our read of the quote). Anything intentionally introduced is, by definition, "non-natural" and thus might include numbers chosen by a person according to their particular thought process, and perhaps knowledge of Benford's Law was not part of that thought process.
We ran the above company's
2014 10K filing through our software, and here's the resulting chart:
[Bar chart: percentage of numbers beginning with each first digit, 1 through 9, in the company's 2014 10K filing]
That 7's column in particular raises an eyebrow — seeing that, one might have the thought,
"that seems quite a bit too high and I should investigate further", wouldn't you agree? Now it turns out
that the 10K filing in question includes this line:
"The number of outstanding shares of the registrant’s common stock on May 7, 2014 was 769,995,602 shares." Their
share count (a number that would tend to get repeated quite often throughout a filing) began with the digit 7, which could definitely help explain an above-expected result in the 7's column.
But the main point is this: for our own investment research we have found that having a handy way to see how the numbers within a given company's 10Q and 10K filings compare with the expected Benford's Law distribution is a profoundly useful research tool.
And we wanted to share our tool with the investing public. It can be of great use in flagging things that you would like to investigate more closely during your research. For example, while the chart above might cause one to want to research why the 7's column came in so high, the chart below, from the very same company's final 10K filing before it was acquired by a larger competitor, looked like this:
[Bar chart: percentage of numbers beginning with each first digit, 1 through 9, in the same company's final 10K filing before its acquisition]
Seeing the above, one might think, "okay,
that looks about normal."
Once someone has learned about Benford's Law, it is natural to be curious about how the filings of stocks you own look in comparison to the normal Benford curve. Perhaps everything looks totally normal, which is great. But suppose the chart for one of your stocks looked like the one below...
Wouldn't you want to know?
[Bar chart: percentage of numbers beginning with each first digit, 1 through 9]
So now, having read this far, you will understand exactly what our website is all about, and how we are applying Benford's Law to stocks: the software we commissioned gathers and analyses 10Q and 10K filings, outputs charts like the ones you've seen above, and then also applies an in-house "scoring algorithm" (learn more about our scoring algorithm) in order to give an indication of how normal or abnormal the distribution pattern looks versus the expectation based on Benford's Law.
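The in-house scoring algorithm itself is not described on this page, so the sketch below is not that algorithm. It only illustrates one simple, commonly used way to put a number on "how far from Benford": the mean absolute deviation between observed first-digit shares and the expected shares log10(1 + 1/d). The function name and the sample counts are invented for illustration.

```python
import math

# Expected Benford share for each first digit 1 through 9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def deviation_score(counts: dict[int, int]) -> float:
    """Mean absolute deviation, in percentage points, between observed
    first-digit shares and the Benford expectation. 0 is a perfect match.
    """
    total = sum(counts.get(d, 0) for d in range(1, 10))
    if total == 0:
        raise ValueError("no first digits to score")
    gaps = (abs(counts.get(d, 0) / total - BENFORD[d]) for d in range(1, 10))
    return sum(gaps) / 9 * 100

# Hypothetical first-digit counts, invented for illustration.
typical = {1: 301, 2: 176, 3: 125, 4: 97, 5: 79, 6: 67, 7: 58, 8: 51, 9: 46}
heavy_sevens = {1: 250, 2: 150, 3: 110, 4: 85, 5: 70, 6: 60, 7: 190, 8: 45, 9: 40}

print(f"typical:      {deviation_score(typical):.2f}")
print(f"heavy sevens: {deviation_score(heavy_sevens):.2f}")
```

A low score means the filing's first digits sit close to the Benford curve; a higher score, as with the invented set that has too many 7's, marks a filing worth a closer look.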
To explore our website further, you can jump to the top and enter any company name or ticker symbol into the search box in the main menu, or browse through the latest filings our system has processed. Click through on any company for the full details. Thanks for visiting, and we hope our site becomes one of your go-to tools for stock market research.