how the tax office uses math to catch ppl faking invoices...? (°ロ°)
it's called Benford's Law (aka the First-Digit Law).
human intuition is trash. if i ask u to make up a list of random prices, u'll probably try to spread the starting digits evenly (a bit of 1, a bit of 5, a bit of 9...).
but irl (in natural and financial data that spans several orders of magnitude), that doesn't happen.
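wanna see it for urself? powers of two grow by a constant factor, so they cover tons of orders of magnitude, and they're a classic example of "natural" data. this little sketch counts the first digit of 2^0 through 2^999 and puts each digit next to the even 11.1% split a human would guess. (`BenfordDemo` and `powersOfTwoFirstDigits` are just names made up for this demo.)

```java
import java.math.BigInteger;

public class BenfordDemo {
    // Counts the first digit (array index 1..9) of 2^0, 2^1, ..., 2^(n-1).
    static int[] powersOfTwoFirstDigits(int n) {
        int[] counts = new int[10];
        BigInteger value = BigInteger.ONE;
        for (int i = 0; i < n; i++) {
            counts[value.toString().charAt(0) - '0']++;
            value = value.shiftLeft(1); // multiply by 2, BigInteger never overflows
        }
        return counts;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] counts = powersOfTwoFirstDigits(n);
        for (int d = 1; d <= 9; d++) {
            // compare with a "fair" human-style guess where every digit gets 1/9
            System.out.printf("Digit %d: %5.1f%%  (even split would be %.1f%%)%n",
                    d, counts[d] * 100.0 / n, 100.0 / 9);
        }
    }
}
```

exactly 301 of those 1000 powers start with 1 (every digit length gets exactly one power of two starting with 1, and 2^999 has 301 digits), way more than the even split predicts.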
benford's law says the digit 1 shows up as the first digit about 30.1% of the time. the digit 9? only 4.6%.
the curve looks kinda like this~
| First digit | Expected frequency |
|---|---|
| 1 | 30.1% |
| 2 | 17.6% |
| 3 | 12.5% |
| 4 | 9.7% |
| 5 | 7.9% |
| 6 | 6.7% |
| 7 | 5.8% |
| 8 | 5.1% |
| 9 | 4.6% |
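those percentages aren't magic numbers: they all come from one formula, P(d) = log10(1 + 1/d) for d = 1..9. here's a quick sketch (the `BenfordTable` class and `expected` method are just made-up names) that regenerates the table and checks that the nine probabilities add up to 1.

```java
public class BenfordTable {
    // Benford's first-digit probability: P(d) = log10(1 + 1/d), for d = 1..9
    static double expected(int digit) {
        if (digit < 1 || digit > 9) throw new IllegalArgumentException("digit must be 1..9");
        return Math.log10(1.0 + 1.0 / digit);
    }

    public static void main(String[] args) {
        double total = 0;
        for (int d = 1; d <= 9; d++) {
            double p = expected(d);
            total += p;
            System.out.printf("| %d | %.1f%% |%n", d, p * 100);
        }
        // the terms telescope: log10(2/1) + log10(3/2) + ... + log10(10/9) = log10(10) = 1
        System.out.printf("sum = %.6f%n", total);
    }
}
```

rounded to one decimal, it prints the same nine rows as the table above.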
how the tax office catches u
when a company or some politician tries to "make up" accounting data, they invent numbers. and since humans are terrible at generating randomness, invented numbers usually don't follow this curve.
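the auditor runs a script, plots the graph and... BAM! if the bar for digit 9 is huge and the one for 1 is tiny, it's a red flag right away. it's not proof on its own (some legit data just doesn't follow Benford), but it's a mathematical way of saying "ur probably lying, let's take a closer look". (⌐■_■)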
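"eyeballing the graph" can be turned into a number. the post doesn't say which test an auditor's script actually runs, so this is a sketch using Pearson's chi-square goodness-of-fit test, swapped in as one common choice (other measures exist too, e.g. mean absolute deviation). 9 digits means 8 degrees of freedom, and 15.507 is the standard 95% critical value for that. the class name and the "fabricated" counts are hypothetical.

```java
public class BenfordChiSquare {
    // 95th percentile of the chi-square distribution with 8 degrees of freedom (9 digits - 1)
    static final double CRITICAL_95 = 15.507;

    // observed[d] = how many values start with digit d (index 0 is ignored)
    static double chiSquare(int[] observed) {
        int n = 0;
        for (int d = 1; d <= 9; d++) n += observed[d];
        if (n == 0) throw new IllegalArgumentException("no data");
        double stat = 0;
        for (int d = 1; d <= 9; d++) {
            double expected = n * Math.log10(1.0 + 1.0 / d); // Benford expectation for digit d
            double diff = observed[d] - expected;
            stat += diff * diff / expected;
        }
        return stat;
    }

    public static void main(String[] args) {
        // hypothetical "invented" counts: almost flat, way too few 1s, too many 7s-9s
        int[] suspicious = {0, 10, 11, 11, 12, 13, 12, 11, 10, 10};
        double stat = chiSquare(suspicious);
        System.out.printf("chi-square = %.2f -> %s%n", stat,
                stat > CRITICAL_95 ? "red flag, take a closer look" : "consistent with Benford");
    }
}
```

heads up: with really big datasets chi-square flags even tiny deviations, so a high score means "go dig", not "busted".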
validating with java
since i don't like Python, i made a quick snippet in Java for anyone who wants to test this theory. it uses a hardcoded array, so swap in the numbers from whatever csv dataset u got lying around.
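```java
import java.util.Map;
import java.util.TreeMap;

public class BenfordLawChecker {
    public static void main(String[] args) {
        // imagine this is a huge list of prices or population numbers
        long[] data = {120, 1500, 300, 45, 9000, 11, 199, 250, 800};

        // TreeMap keeps the digits sorted (1..9) when we print them
        Map<Integer, Integer> counts = new TreeMap<>();
        for (long value : data) {
            if (value == 0) continue; // 0 has no first significant digit, skip it
            // drop the minus sign so negative amounts (refunds etc.) don't break the digit lookup
            String digits = Long.toString(value).replace("-", "");
            int firstDigit = digits.charAt(0) - '0';
            counts.merge(firstDigit, 1, Integer::sum);
        }

        // only count values that actually got a digit
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();

        System.out.println("Distribution found (vs. Benford):");
        counts.forEach((digit, count) -> {
            double observed = count * 100.0 / total;
            double expected = Math.log10(1 + 1.0 / digit) * 100;
            System.out.printf("Digit %d: %.2f%% (expected %.1f%%)%n", digit, observed, expected);
        });
    }
}
```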
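if u actually want to point it at a csv file, here's a rough sketch. assumptions: the class name `BenfordCsv` is made up, cells are split on plain commas (no quoted fields like "1,200.00"), and the column index is 0-based. using `BigDecimal` means decimals like 0.045 work too, since its unscaled value always starts with the first significant digit.

```java
import java.io.IOException;
import java.math.BigDecimal;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class BenfordCsv {
    // Counts first significant digits (array index 1..9) in one column of CSV lines.
    // Header rows and non-numeric cells are skipped because they fail to parse.
    static int[] countFirstDigits(List<String> lines, int column) {
        int[] counts = new int[10];
        for (String line : lines) {
            String[] cells = line.split(",");
            if (column >= cells.length) continue;
            BigDecimal value;
            try {
                value = new BigDecimal(cells[column].trim());
            } catch (NumberFormatException e) {
                continue; // header or junk cell
            }
            if (value.signum() == 0) continue; // 0 has no first significant digit
            // e.g. 0.045 is stored as 45 x 10^-3, so the unscaled value starts with 4
            String digits = value.unscaledValue().abs().toString();
            counts[digits.charAt(0) - '0']++;
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java BenfordCsv <file.csv> <column-index>");
            return;
        }
        List<String> lines = Files.readAllLines(Path.of(args[0])); // Path.of needs Java 11+
        int[] counts = countFirstDigits(lines, Integer.parseInt(args[1]));
        int total = 0;
        for (int d = 1; d <= 9; d++) total += counts[d];
        if (total == 0) {
            System.out.println("no numeric values found");
            return;
        }
        for (int d = 1; d <= 9; d++) {
            System.out.printf("Digit %d: %.2f%% (Benford: %.1f%%)%n",
                    d, counts[d] * 100.0 / total, Math.log10(1 + 1.0 / d) * 100);
        }
    }
}
```

run it like `java BenfordCsv invoices.csv 2` (file name and column are placeholders for ur own data).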