So what's up with the Russian election two weeks ago? Was there fraud?
Of course there was fraud. Widespread ballot stuffing was videotaped etc., but we can also prove fraud using statistics.
See these *integer peaks* in the histograms of the polling station results? 🕵️‍♂️ [1/n]
These peaks are formed by polling stations that report integer turnout percentage or United Russia percentage. E.g. 1492 ballots cast at a station with 1755 registered voters. 1492/1755 = 85.0%. Important: 1492 is not a suspicious number! It's 85.0% which is suspicious. [2/n]
We can use binomial Monte Carlo simulation to find how many polling stations with integer percentages there should be by chance. Then we can compute the number of EXCESS integer polling stations (roughly the summed heights of all INTEGER PEAKS).
Resulting excess is 1300. [3/n]
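A minimal sketch of what such a binomial Monte Carlo could look like. This is not the notebook code from the repo. The assumptions are mine: a station counts as "integer" if its percentage is within 0.05 of a whole number (so it displays as X.0%), and each station is resampled from a binomial with its own reported proportion. The function names and toy data are hypothetical.

```python
import random

def is_integer_percent(ballots, voters, tol=0.05):
    """True if 100 * ballots / voters is within tol of a whole number."""
    pct = 100.0 * ballots / voters
    return abs(pct - round(pct)) < tol

def binomial(n, p, rng):
    """Draw one Binomial(n, p) sample as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))

def excess_integer_stations(stations, n_sim=20, seed=0):
    """stations: list of (ballots, registered_voters) pairs.

    Returns (observed, expected, excess), where expected is the average
    number of integer-percentage stations over n_sim simulated elections.
    """
    rng = random.Random(seed)
    observed = sum(is_integer_percent(k, n) for k, n in stations)
    sim_counts = []
    for _ in range(n_sim):
        count = 0
        for k, n in stations:
            # Assumption: the station's reported proportion is its true rate.
            k_sim = binomial(n, k / n, rng)
            count += is_integer_percent(k_sim, n)
        sim_counts.append(count)
    expected = sum(sim_counts) / n_sim
    return observed, expected, observed - expected

# Toy data (made up): ten copies of the 1492/1755 station plus three others.
stations = [(1492, 1755)] * 10 + [(1203, 1755), (987, 1432), (640, 911)]
observed, expected, excess = excess_integer_stations(stations)
print(observed, expected, excess)
```

With a tolerance of 0.05, roughly one station in ten lands on an integer percentage by chance, so anything far above that baseline shows up as excess.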
1300 clearly fraudulent stations is a lot! But it's not as many as in recent years, especially in 2020 (constitutional referendum). [4/n]
Does it mean that there was less fraud this time? Not at all! But it seems it was less stupidly done.
Here is a 2D scatter plot of turnout vs. United Russia result. This suggests the actual result was ~30%, possibly a few % more, instead of the official 49.8%. [5/n]
Here is how this "comet" compares to the previous federal elections over the Putin era.
In terms of how many % points were added to the leader's result during counting, this election may actually have been the worst ever (but it's a close call with 2011). [6/n]
See our series of papers (with Sergey Shpilkin and @MPchenitchnikov) regarding the methodology of integer peak calculations:
Just an example of how stupidly it _was_ sometimes done. This entire 2D integer peak with 75.0% turnout and 75.0% United Russia result (back in 2011) was due to one single city: Sterlitamak (in Bashkortostan). Obviously they did not even count the ballots. [8/n]
You can find all the data (in CSV) and my analysis code (as a Python notebook) at github.com/dkobak/electio…. The data have been scraped by Sergey Shpilkin. [9/n]
Scraping the data was much more difficult this time, because it was deliberately obfuscated (see below). Of course eventually people wrote several de-obfuscators, e.g. see this very detailed write-up by Alexander Shpilkin: purl.org/cikrf/un/unfuc…. [10/10]
Update: here is my new favourite plot on this topic. I pooled the data from all 11 federal elections from 2000 to 2021 and made a scatter plot of all 1+ million polling stations together. Just look at the periodic integer pattern in the top-right (i.e. fraudulent) corner! [11/10]
How many academic papers are written with the help of ChatGPT? To answer this question, we analyzed 14 million PubMed abstracts from 2010 to 2024 and looked for excess words:
** Delving into ChatGPT usage in academic writing through excess vocabulary **
Really excited to present new work by @ritagonmar: we visualized the entire PubMed library, 21 million biomedical and life science papers, and learned a lot about --
We took all (21M) English abstracts from PubMed, used a BERT model (PubMedBERT) to transform them into 768D vectors, and then used t-SNE to visualize them in 2D.
We used the 2D map to explore the library, and confirmed each insight in 768D.
We focus on four insights. 2/n
Case study #1: Covid-19 literature.
When looking at the t-SNE map colored by publication year (yellow = newer papers), we immediately see a bright yellow cluster. A large cluster of related papers, all published in 2020-21. What could it be? 🤔
We held a reading group on Transformers (watched videos / read blog posts / studied papers by @giffmana @karpathy @ch402 @amaarora @JayAlammar @srush_nlp et al.), and now I _finally_ roughly understand what attention does.
Here is my take on it. A summary thread. 1/n
Consider BERT/GPT setting.
We have a text string, split into tokens (<=512). Each token gets a 768-dim vector. So we have a 2D matrix X of arbitrary width. We want to set up a feed-forward layer that would somehow transform X, keeping its shape.
How can this be set up? 2/n
A fully-connected layer does not work: it cannot take input of variable length (and would have too many params anyway).
Only acting on the embedding dimension would process each token separately, which is clearly not sufficient.
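To make the last point concrete, here is a toy sketch in pure Python. A 4-dim embedding stands in for 768, and the names are hypothetical. A layer that acts only on the embedding dimension handles any number of tokens and keeps the shape of X, but each output row depends only on the matching input row, so tokens never see each other.

```python
import random

def per_token_linear(X, W):
    """Apply the same d x d weight matrix W to every token vector (row) of X."""
    d_out = len(W[0])
    return [
        [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(d_out)]
        for x in X
    ]

rng = random.Random(0)
d = 4  # stand-in for the 768-dim embedding
W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]

# Works for any number of tokens and keeps the shape of X.
X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(5)]
Y = per_token_linear(X, W)

# Change only the first token: only the first output row changes.
X_changed = [[v + 1.0 for v in X[0]]] + X[1:]
Y_changed = per_token_linear(X_changed, W)
print(Y_changed[1:] == Y[1:], Y_changed[0] != Y[0])
```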
I think we have finally understood the *real* difference between t-SNE and UMAP. It involves NCE! [1/n]
In prior work, we (@jnboehm@CellTypist) showed that UMAP works like t-SNE with extra attraction. We argued that it is because UMAP relies on negative sampling, whereas t-SNE does not.
Because UMAP uses negative sampling, its effective loss function is very different from its stated loss function (cross-entropy). @jnboehm showed it via Barnes-Hut UMAP, while Sebastian and Fred did mathematical analysis in their NeurIPS 2021 paper proceedings.neurips.cc/paper/2021/has… [3/n]
My paper on Poisson underdispersion in reported Covid-19 cases & deaths is out in @signmagazine. The claim is that underdispersion is a HUGE RED FLAG and suggests misreporting.
What is "underdispersion"? Here is an example. Russia reported the following number of Covid deaths during the first week of September 2021: 792, 795, 790, 798, 799, 796, 793.
Mean: 795. Variance: 11. For Poisson random data, mean=variance. So this is *underdispersed*. /2
For comparison, during the same week the US reported 1461, 1185, 1202, 1795, 2010, 2003, 1942 deaths. Mean: 1657. Variance: 135470. So this is *overdispersed*.
Overdispersion is not surprising: day-of-week reporting fluctuations, epidemic growth, etc.
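The numbers above are easy to check. A small sketch, assuming "variance" means the sample variance (dividing by n-1), which is what reproduces the quoted values; `dispersion_index` is just a helper name I made up for variance/mean.

```python
from statistics import mean, variance

# Daily reported Covid deaths, first week of September 2021 (from the thread).
russia = [792, 795, 790, 798, 799, 796, 793]
usa = [1461, 1185, 1202, 1795, 2010, 2003, 1942]

def dispersion_index(counts):
    """Sample variance divided by mean; close to 1 for Poisson data."""
    return variance(counts) / mean(counts)

for name, counts in [("Russia", russia), ("US", usa)]:
    print(name, round(mean(counts)), round(variance(counts)),
          round(dispersion_index(counts), 2))
```

A ratio far below 1 means underdispersion, far above 1 means overdispersion.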
Chari et al. (@lpachter) have updated their preprint and doubled down on their claim that an 🐘-looking embedding, a random (!) embedding, and 2D PCA, all preserve data structure "similar or better" than t-SNE.
They literally say: "Picasso can quantitatively represent [local and global properties] similarly to, or better, than the respective t-SNE/UMAP embeddings".
In my thread below I argued it's a non-sequitur from Fig 2, due to insufficient metrics. [2/n]
I argued that they should also consider metrics like kNN recall or kNN classification accuracy, where t-SNE would fare much better than these other methods.
I thought it should be obvious from this figure (using MNIST). But now @lpachter says it's a "mirage".
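For reference, a minimal sketch of one such metric, kNN recall: for each point, the fraction of its k nearest neighbours in the original space that are still among its k nearest neighbours in the embedding, averaged over all points. This is a brute-force pure-Python version on made-up random data, not the code behind the figure. The function names are hypothetical.

```python
import random

def knn_sets(points, k):
    """For each point, the set of indices of its k nearest neighbours (Euclidean)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        result.append({j for _, j in dists[:k]})
    return result

def knn_recall(high_dim, embedding, k=5):
    """Average fraction of high-dim k nearest neighbours kept in the embedding."""
    high = knn_sets(high_dim, k)
    low = knn_sets(embedding, k)
    return sum(len(h & l) for h, l in zip(high, low)) / (k * len(high_dim))

rng = random.Random(0)
data = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(50)]
random_embedding = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(50)]

print(knn_recall(data, data))              # a perfect "embedding"
print(knn_recall(data, random_embedding))  # a random embedding
```

A random embedding keeps neighbours only by chance, so its recall should sit near k/(n-1), which is exactly the kind of difference this metric is meant to expose.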