
Computational Story Lab

Oct 17, 2018 • 11 tweets • 5 min read


1/5 Op-ed in @nytimes uses Google n-gram data to claim “most religious and spiritual words have been declining in the English-speaking world since the early 20th century.”

2/5 While this statement could certainly be true, raw n-gram data cannot support the claim because the underlying corpus is non-stationary. The author is likely referring to trends like figure 5h in the original Culturomics paper, in which “God” is decreasing.

3/5 However, we’ve shown that the English n-gram data is corrupted by an increase in scientific language from textbooks and academic publications during the 20th century. The trend disappears when looking at English Fiction alone.

4/5 If Google insists on including textbooks & scientific studies, their n-gram viewer should default to display “English Fiction”, the least troublesome version, rather than “English”.

5/5 Otherwise, unsuspecting cultural scholars will continue to be misled by decreasing 20th-century relative word frequencies.
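To make the non-stationarity point concrete, here is a minimal sketch with made-up numbers (none of them come from Google Books). It assumes a word is used at a constant rate within fiction and at a constant, lower rate within scientific text, and only the mix of the two genres changes over time. The function name `pooled_frequency` and all rates and corpus sizes are hypothetical.

```python
# Illustrative only: every number below is an assumption, not Google Books data.

def pooled_frequency(fiction_total, science_total, rate_fiction, rate_science):
    """Relative frequency of a word in a corpus that pools two genres."""
    word_count = fiction_total * rate_fiction + science_total * rate_science
    return word_count / (fiction_total + science_total)

RATE_FICTION = 1e-3  # assumed constant usage rate of the word in fiction
RATE_SCIENCE = 1e-4  # assumed constant (lower) usage rate in scientific text

# (year, fiction words, science words): the science share grows over time
corpus_by_year = [
    (1900, 9_000_000, 1_000_000),
    (1950, 9_000_000, 9_000_000),
    (2000, 9_000_000, 27_000_000),
]

pooled = [
    pooled_frequency(fiction, science, RATE_FICTION, RATE_SCIENCE)
    for _, fiction, science in corpus_by_year
]

for (year, _, _), freq in zip(corpus_by_year, pooled):
    print(year, f"pooled={freq:.2e}", f"fiction_only={RATE_FICTION:.2e}")
```

In this toy setup the pooled relative frequency falls in every period even though usage within each genre never changes, which is the kind of artifact a shifting corpus composition can produce.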

6/5 Links. Op-Ed:

nytimes.com/2018/10/13/opi…

Evidence of non-stationarity of the corpus:

journals.plos.org/plosone/articl…

“English” counts for “God”:

books.google.com/ngrams/graph?c…

“English Fiction” counts for “God”:

books.google.com/ngrams/graph?c…


Here's an earlier piece by the same author, @JonathanMerritt, appearing in The Week and making the same claim as the NYT piece.

theweek.com/articles/79179…


@JonathanMerritt Our work is acknowledged but sailed past with an effective "whatever":

Language Log, as we would expect/hope, has things sorted:
languagelog.ldc.upenn.edu/nll/?p=40222


@JonathanMerritt Google Books is a fiasco.

How many papers have been written using Google Books as some true representation of culture?

How many offhand observations have been made?


@JonathanMerritt Our paper is here:

journals.plos.org/plosone/articl…

"Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution"

Just read the introduction. Please.

https://twitter.com/languagelog/status/1052889809285394432?s=21

More from @compstorylab

Computational Story Lab

@compstorylab

Aug 18, 2020

New preprint:

“Computational timeline reconstruction of the stories surrounding Trump: Story turbulence, narrative control, and collective chronopathy”

arxiv.org/pdf/2008.07301…

P. S. Dodds, J. R. Minot, M. V. Arnold, T. Alshaabi, J. L. Adams, A. J. Reagan, and C. M. Danforth

Some questions to ask yourself and others:

What happened in the world over the last two weeks?

What about this time last year? Two years ago?

And what order did the major events happen in?

For Trump’s presidency, how easily could individuals recall and sort these example stories?

- North Korea
- Charlottesville
- kneeling in the National Football League
- Confederate statues
- family separation
- Stormy Daniels
- Space Force
- the possible purchase of Greenland


Computational Story Lab

@compstorylab

Jul 28, 2020

We have a new paper, interactive visualization, and data platform.

Nutshell: we’ve curated 100 billion tweets over 10 years to produce day-scale rank/frequency time series for n-grams in over 100 languages.

It’s a whole big thing.
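As a rough illustration of what a day-scale rank/frequency time series for 1-grams could look like, here is a minimal sketch. It is not the Storywrangler pipeline: the function name `daily_rank_frequency`, the whitespace tokenizer, and the alphabetical tie-break for ranks are all assumptions made for this example.

```python
from collections import Counter

def daily_rank_frequency(tweets):
    """Map each day to {1-gram: (rank, relative_frequency)}.

    `tweets` is an iterable of (day, text) pairs. Tokenizing on whitespace
    and breaking rank ties alphabetically are simplifying assumptions.
    """
    counts_by_day = {}
    for day, text in tweets:
        counts_by_day.setdefault(day, Counter()).update(text.lower().split())

    series = {}
    for day, counter in counts_by_day.items():
        total = sum(counter.values())
        # Sort by descending count, then alphabetically to make ranks deterministic.
        ranked = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
        series[day] = {
            gram: (rank, count / total)
            for rank, (gram, count) in enumerate(ranked, start=1)
        }
    return series

# Hypothetical sample input
sample = [
    ("2020-07-27", "hello world hello"),
    ("2020-07-28", "world world news"),
]
series = daily_rank_frequency(sample)
print(series["2020-07-28"])
```

Following a single 1-gram across the days in `series` gives its rank and relative-frequency timeline under these assumptions.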

A short thread—

The paper:

“Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter”

arxiv.org/abs/2007.12988

With Storywrangler, we’re hoping to enable or enhance the computational study of any large-scale temporal phenomena where people matter, including:
culture,
politics,
economics,
linguistics,
public health,
conflict,
climate change,
and
data journalism.


Computational Story Lab

@compstorylab

Jun 8, 2020

Thread for a new paper of ours on the arXiv:

“Ratioing the President: An exploration of public engagement with Obama and Trump on Twitter”

arxiv.org/abs/2006.03526

J. R. Minot, M. V. Arnold, T. Alshaabi, C. M. Danforth, P. S. Dodds

@BarackObama

We explore the dynamics of how Twitter users have responded to tweets made by Obama and Trump from their main accounts, @BarackObama and @realDonaldTrump.

For each tweet, we track three main characteristics as they evolve over time:

- Number of Favorites
- Number of Retweets
- Number of Replies (hard to measure—see our paper)


Computational Story Lab

@compstorylab

Mar 27, 2020

New NCOVID-19 paper thread:

“How the world's collective attention is being paid to a pandemic:
COVID-19 related 1-gram time series for 24 languages on Twitter”

Main site:
compstorylab.org/covid19ngrams/

We make two main contributions:

1. We curate and share usage time series of 1,000 1-grams that have mattered in March of 2020 (words, emojis, hashtags, etc.) for 24 languages.

We hope other researchers can use these time series to connect with other data streams.

2. We show that after a peak in January 2020 in response to the news from Wuhan of a novel contagious disease, the world’s collective attention dropped through much of February before resurging.


Computational Story Lab

@compstorylab

Feb 20, 2020

“Noncooperative dynamics in election interference”

New publication from our group in Physical Review E

journals.aps.org/pre/abstract/1…

Led by @d_r_dewhurst and inspired by Russian interference in the 2016 election, we simulate the timeless competition between red and blue.

This is the first study [that we know of] to explore models of election interference in a noncooperative setting [game theory flavor]


Computational Story Lab

@compstorylab

Jul 10, 2019

Now, we stretch out words naturally when we speak.

But stretched words (sometimes called elongated words) are fairly rare in book and other text corpora, and they aren’t represented well in dictionaries (if at all).

So we thought, let’s science this.

Stretchfulness in written text arrived in an abundant, accessible source with Twitter (along with the possible end of civilization but that issue is beyond the scope of our current project).

Dataset: 10% of all (140-character) tweets from September 2008 to the end of 2016.

We crafted* a series of regex-based tweet-sifters for capturing words that are naturally stretched in the wilds of Twitter.

We ended up with a skosh over 5000 “kernels” for stretchable words:

*this was not entirely easy
xkcd.com/208/
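For readers curious what a regex-based sifter for stretched words might look like, here is a minimal sketch. It is not the authors' method: the rule "three or more repeats of the same letter counts as a stretch" and the names `is_stretched`, `collapse`, and `find_stretched` are assumptions made for illustration. The sketch ignores two-letter repeats such as "hahaha", and collapsing a run to a single letter mangles words that legitimately contain a double letter.

```python
import re

# Assumption: a run of 3+ identical letters marks a stretched word.
STRETCH = re.compile(r"([a-z])\1{2,}", re.IGNORECASE)
WORD = re.compile(r"[A-Za-z]+")

def is_stretched(token):
    """True if the token contains a run of three or more identical letters."""
    return STRETCH.search(token) is not None

def collapse(token):
    """Lowercase the token and reduce each run of 3+ identical letters to one."""
    return STRETCH.sub(r"\1", token.lower())

def find_stretched(text):
    """Return the alphabetic tokens in `text` that look stretched."""
    return [token for token in WORD.findall(text) if is_stretched(token)]

# Hypothetical sample text
sample = "That was sooooo cool, a real goooooal"
stretched = find_stretched(sample)
collapsed = [collapse(token) for token in stretched]
print(stretched, collapsed)
```

Note that "cool" is not flagged because its double "o" falls below the assumed threshold of three repeats.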

