@QuasLacrimas@VirtuArete I can help you if you'd like. I have multiple processes for performing Benford's chi-squared tests using Python. If your data is available via link, let me know and I can customize a script for you and send it to you.
@QuasLacrimas@VirtuArete To speed things along, here is the code I use. First, use this to create the Benford's ratios:
from math import log10
def getBenfords():
    # Expected first-digit proportions under Benford's law, digits 1-9
    expected = [log10(1 + 1/d) for d in range(1, 10)]
    return expected
@QuasLacrimas@VirtuArete Next, use this to perform the test
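from scipy.stats import chisquare
benfords = getBenfords()
# Expected counts: total number of observations times each Benford proportion
expectedvals = [sum(actualvals)*a for a in benfords]
# Observed proportions, handy for plotting against the Benford ratios
actualpercent = [a/sum(actualvals) for a in actualvals]
chival, pval = chisquare(actualvals, f_exp=expectedvals)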
@QuasLacrimas@VirtuArete In that second one, the "actualvals" is a list (or array) of frequency counts for digits 1-9, like this [30,18,13,10,8,7,6,5,4]
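If you are starting from raw numbers rather than digit counts, something like the following could build that list. This is only a rough sketch using the standard library: the names `first_digit_counts`, `benford_expected`, and `chi_squared_statistic` are made up for illustration, and it assumes the raw values are integers (zeros are skipped because they have no leading digit 1-9). It computes the chi-squared statistic by hand; the p-value would still come from `scipy.stats.chisquare`.

```python
from collections import Counter
from math import log10

def benford_expected():
    # Expected first-digit proportions under Benford's law, digits 1-9
    return [log10(1 + 1 / d) for d in range(1, 10)]

def first_digit_counts(values):
    # Hypothetical helper: tally the leading digit of each nonzero integer
    counts = Counter(int(str(abs(int(v)))[0]) for v in values if int(v) != 0)
    return [counts.get(d, 0) for d in range(1, 10)]

def chi_squared_statistic(actualvals):
    # Sum of (observed - expected)^2 / expected over the nine digits
    total = sum(actualvals)
    expectedvals = [total * p for p in benford_expected()]
    return sum((o - e) ** 2 / e for o, e in zip(actualvals, expectedvals))

# The example counts from the thread
actualvals = [30, 18, 13, 10, 8, 7, 6, 5, 4]
print(chi_squared_statistic(actualvals))
```

With raw counts in a list called, say, `raw`, `first_digit_counts(raw)` would give the `actualvals` to pass into the test.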
I am making these tweets to explain in one place some analysis that was done last night.
1 - I was asked offline about doing Benford's on election data. I explained that this is a common and useful way to detect anomalies in data that are driven by artificial processes (e.g. fraud).
2 - My student then pointed me towards a tweet that was exploring this type of analysis (but they hadn't done Benford's). So I chimed in.
3 - However, I did not know what data they used, so I found a source for the context they referenced. I could not initially find write-ins versus non-write-ins, so I looked at candidate counts.