@PausalZ@fediscience.org
Professional epidemiologist / causal inference researcher / python programmer, amateur mycologist #Python #epitwitter https://t.co/cuewGX6vWD

Jul 7, 2021, 12 tweets

Big fan of the "I forced a bot to [...] over 1000" memes. But most of those posts are fake (i.e. human-generated). That's why I decided to make a real one

So I forced a bot to read over 1000 PubMed abstracts in order to generate new abstracts

Basically, I pulled a random sample of 5000 abstracts from PubMed using the search terms: (causal inference) AND English[Language]
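In Biopython terms, that pull looks roughly like the sketch below. The function name, contact email placeholder, and single un-batched fetch are my illustration of the standard Entrez two-step (esearch for IDs, efetch for text), not the thread's actual code.

```python
QUERY = "(causal inference) AND English[Language]"

def fetch_abstracts(n=5000, email="you@example.com"):
    """Pull up to n PubMed abstracts matching QUERY as plain text."""
    # Imported here so the sketch loads even without Biopython installed
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    # Step 1: search PubMed for matching article IDs
    handle = Entrez.esearch(db="pubmed", term=QUERY, retmax=n)
    ids = Entrez.read(handle)["IdList"]
    handle.close()
    # Step 2: fetch the abstracts for those IDs
    handle = Entrez.efetch(db="pubmed", id=ids,
                           rettype="abstract", retmode="text")
    text = handle.read()
    handle.close()
    return text
```

In practice you'd also want to batch the efetch calls and respect NCBI's rate limits; this sketch skips that for brevity.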

A random sample of the returned abstracts was used to train a recurrent neural network (RNN)

Basically, a sequence of 40 characters is used to predict the next character. This process can then be repeated with the new character to generate a whole new sentence
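That sliding-window setup can be sketched in plain numpy. The toy corpus and variable names here are mine for illustration; the real training data is the PubMed abstracts.

```python
import numpy as np

# Toy stand-in for the abstract corpus; the window length matches the
# 40-character sequences described above
text = "confounding is addressed via inverse probability weighting. " * 50
seq_len = 40

# Map each distinct character to an integer index
chars = sorted(set(text))
char2idx = {c: i for i, c in enumerate(chars)}

# Slide a 40-character window over the text: each window is an input,
# and the character immediately after it is the prediction target
X, y = [], []
for i in range(len(text) - seq_len):
    X.append([char2idx[c] for c in text[i:i + seq_len]])
    y.append(char2idx[text[i + seq_len]])
X, y = np.array(X), np.array(y)  # X: (n_windows, 40), y: (n_windows,)
```

An RNN (e.g. an LSTM with a softmax over the character set) is then trained on these (window, next-character) pairs; generation just repeats the predict-then-append loop.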

So you give the machine a starting point, set a 'creativity dial', and let it go
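The 'creativity dial' is what most text-generation guides call sampling temperature. A minimal sketch (function name and constants are mine, not the thread's code): low temperature sharpens the RNN's output distribution toward the most likely next character, high temperature flattens it toward random choice.

```python
import numpy as np

def sample_next(probs, temperature=0.5, rng=None):
    """Sample a character index from the RNN's output distribution.

    Lower temperature -> conservative, predictable text;
    higher temperature -> more 'creative' (and more garbled) output.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Rescale log-probabilities by 1/temperature, then re-normalize
    logits = np.log(np.asarray(probs) + 1e-9) / temperature
    scaled = np.exp(logits)
    scaled /= scaled.sum()
    return rng.choice(len(scaled), p=scaled)
```

At temperature 0.1 the top character almost always wins; at 2.5 (the setting in the 'creativity turned up' abstract later in the thread) the choice is close to uniform.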

But enough of that nerd stuff. Here are some more abstracts

Personally, I like giving it a starting string where it has to make up the estimates

I can already see you typing, "But is it really AI?"

The answer is yes. I think the following examples demonstrate this quite clearly (by the AI discovering a new alpha-level)

It really does like making the p-value come out around 0.01 though (a very backwards / weird demonstration of publication bias, perhaps?)

Okay, but I helped out my poor RNN a little with the abstracts. I had it generate each of the sections from a starting seed. It doesn't really understand the whole structured abstract concept

I also don't know what measure 'psycionion' is, but I am intrigued to learn

Also it turns out, abstracts of causal inference are pretty predictable. Below is an abstract with the creativity dial turned up to 2.5 (the previous were 0.5)

All the code is available on GitHub. I provide a trained version of the RNN (since it has a long run-time without use of a GPU)

However, my code is structured so you could easily change the search terms and train a new version

github.com/pzivich/RNN-Ab…

I used biopython to query the abstracts from PubMed. The RNN is built with tensorflow and does character-level text generation. There are a bunch of online guides if you wanted to code a version from scratch (that's how I did it)

I will say I did run into some fitting issues initially. To prevent over-fitting, I ran 10 epochs with exponential learning-rate decay. I also added gradient clipping to prevent gradient explosions (sounds cooler than what it actually is)
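For illustration, here are stand-alone numpy versions of those two tricks. In a Keras training loop they would correspond to an ExponentialDecay learning-rate schedule and the optimizer's clipnorm/clipvalue options; that mapping is my framing, not the thread's exact setup.

```python
import numpy as np

def decayed_lr(initial_lr, decay_rate, epoch):
    """Exponential learning-rate decay: the step size shrinks each epoch."""
    return initial_lr * decay_rate ** epoch

def clip_gradient(grad, max_norm=1.0):
    """Rescale a gradient whose norm exceeds max_norm (gradient clipping).

    Bounding the update size like this is what keeps a runaway
    ('exploding') gradient from blowing up the weights.
    """
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad
```

With decay_rate below 1, later epochs take smaller steps, which helps the fit settle instead of oscillating; clipping caps any single update regardless of how steep the loss surface gets.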
