Computational Story Lab
Jul 10, 2019 · 20 tweets · 8 min read
Now, we stretch out words naturally when we speak.

But stretched words (sometimes called elongated words) are fairly rare in book and other text corpora, and they aren’t represented well in dictionaries (if at all).

So we thought, let’s science this.
Stretchfulness in written text arrived in an abundant, accessible source with Twitter (along with the possible end of civilization but that issue is beyond the scope of our current project).

Dataset: 10% of all (140 character) tweets from September 2008 to the end of 2016.
We crafted* a series of regex-based tweet-sifters for capturing words that are naturally stretched in the wilds of Twitter.

We ended up with a skosh over 5000 “kernels” for stretchable words:

*this was not entirely easy
xkcd.com/208/ Image
Kernels match up with words like this:

[g][o][a][l]: Any stretched version of goal with ordering of letters strictly preserved.

(ha): All words with h’s and a’s repeated in any order, as long as they start with an h and contain at least one a.

sq[u][e]: No stretch for s and q.
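
For the regex-inclined, here is a minimal sketch of how the three kernel forms above could be turned into regular expressions. It assumes `[x]` means "one or more of x", `(xy)` means "starts with x, contains at least one y, otherwise any mix of x and y", and bare letters are literal. The function name `kernel_to_regex` is hypothetical, and this is not the actual tweet-sifter from the paper.

```python
import re


def kernel_to_regex(kernel: str) -> re.Pattern:
    """Translate kernel notation into a compiled regex (illustrative only)."""
    parts = []
    i = 0
    while i < len(kernel):
        ch = kernel[i]
        if ch == "[":
            # [x]: the letter x, repeated one or more times.
            end = kernel.index("]", i)
            parts.append(re.escape(kernel[i + 1:end]) + "+")
            i = end + 1
        elif ch == "(":
            # (xy): starts with x, has at least one y, any order otherwise.
            end = kernel.index(")", i)
            first, second = kernel[i + 1:end]  # assumes exactly two letters
            either = "[" + re.escape(first) + re.escape(second) + "]"
            parts.append(re.escape(first) + either + "*"
                         + re.escape(second) + either + "*")
            i = end + 1
        else:
            # Bare letter: no stretch allowed.
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


goal = kernel_to_regex("[g][o][a][l]")
ha = kernel_to_regex("(ha)")
sque = kernel_to_regex("sq[u][e]")

print(bool(goal.fullmatch("gooooaaaal")))  # True
print(bool(goal.fullmatch("gaol")))        # False (letter order changed)
print(bool(ha.fullmatch("hahahha")))       # True (mistyping still matches)
print(bool(sque.fullmatch("ssquee")))      # False (s is not stretchable)
```

Using `fullmatch` keeps a kernel from matching a fragment inside a longer, unrelated word.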
For each kernel, we plotted its frequency distribution.

Here’s [g][o][a][l]’s distribution.

The base word goal is used much more frequently than its stretched versions. But users tend to give goal a good stretch once they get going. They’re excited. Because football. Image
Now, we’re not saying that the tails of these distributions obey a power-law decay ... but we’re not not saying that either.

For those inclined, please feel free to fight amongst yourselves. You know who you are.
Here’s (ha).

The two-cycle jumps suggest that users try to keep on track with hahahaha but sometimes tragically suffer mistypings (hahahha; perhaps incredibly, we have more on this below). Image
We measure the “stretch” of a word with a standard Gini coefficient.

[g][o][a][l] is somewhat stretchy (G=0.108)
(ha) is stretchier (G=0.245)

A completely non-stretchable word would have G=0 as all instances of the word are the same. No one is special.
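
A rough sketch of the idea, under a loud assumption: here each instance of a word contributes its character length as its "value", and the Gini coefficient is taken over those values. That reading matches G=0 when all instances are identical, but the paper may define the underlying distribution differently. The function name and the toy counts are made up for illustration.

```python
def gini_from_length_counts(length_counts):
    """Gini coefficient of token lengths.

    length_counts maps a token length to the number of tokens observed
    with that length (assumed input format, for illustration only).
    """
    items = list(length_counts.items())
    n_tokens = sum(count for _, count in items)
    total_length = sum(length * count for length, count in items)
    if n_tokens == 0 or total_length == 0:
        raise ValueError("need at least one token of positive length")
    # Sum of |x_i - x_j| over all ordered pairs of tokens, grouped by length.
    abs_diff_sum = sum(
        count_a * count_b * abs(length_a - length_b)
        for length_a, count_a in items
        for length_b, count_b in items
    )
    # Standard form: G = sum_ij |x_i - x_j| / (2 * n * sum_i x_i).
    return abs_diff_sum / (2 * n_tokens * total_length)


# Toy numbers, not real data: every instance has length 4, so nothing stretches.
print(gini_from_length_counts({4: 100}))  # 0.0
```

The more tokens sit far from the typical length, the larger G becomes.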
Next, we wanted to figure out the balance of a word’s stretch.

For [g][o][a][l], the g (a plosive) is rarely stretched much, while o gets the most stretch, closely followed by a and l.

We base our measure of balance on Shannon’s entropy H. Image
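
One way to picture an entropy-based balance score, as a sketch only: count how many extra repeats each letter of a kernel picks up across tokens, turn those counts into proportions, and compute Shannon's entropy normalized by its maximum. The helper names are hypothetical, the kernel is assumed to have no doubled letters in its base word, and the exact normalization in the paper may differ.

```python
import math
from itertools import groupby


def letter_stretch(tokens, kernel_letters):
    """Total extra repeats per kernel letter, summed over matching tokens."""
    totals = [0] * len(kernel_letters)
    for token in tokens:
        # Collapse the token into (letter, run length) pairs.
        runs = [(ch, len(list(group))) for ch, group in groupby(token)]
        if [ch for ch, _ in runs] != list(kernel_letters):
            continue  # letter order not preserved, so not this kernel
        for i, (_, run_length) in enumerate(runs):
            totals[i] += run_length - 1
    return totals


def normalized_entropy(weights):
    """Shannon entropy of the weights, scaled to lie between 0 and 1."""
    total = sum(weights)
    if total == 0 or len(weights) < 2:
        raise ValueError("need some stretch spread over at least two letters")
    probs = [w / total for w in weights if w > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(weights))


# Toy tokens, not real data.
stretch = letter_stretch(["goooal", "goaaal", "goal", "gaol"], "goal")
print(stretch)  # [0, 2, 2, 0]
```

A score of 1 would mean every letter shares the stretch equally; a score near 0 means one letter does nearly all the stretching.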
(ha) is balanced almost perfectly (we ignore letter order for these internally jumblable words (jumblable is fun to say)).

We see this kind of consistency of balance across stretch lengths for our entire collection of kernels. Image
Here are the most and least balanced stretchable words.

Not all words are stretched to reflect a vocalized form. Some are stretched as an attempt at visual emphasis, much as capital letters are used (but that’s yelling and rude). Image Image
And here are the most and least stretchy words (in our set of 5000+ stretchables; many words are not Elaine-level stretchworthy): Image Image
The stretchiness and balance of stretchable words we found in the Twitter wild make for solid parameters, filling out both dimensions well. Feels like science. Image
Last, we investigated how stretchables like (ha) go wrong with “spelling trees”.

Starting at the top with h, (ha) words trace down through the tree’s branches (h to the left, a to the right). Branch thickness indicates numbers of words.

Lots of self-similar goodness: Image
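
A minimal sketch of the bookkeeping behind such a tree, assuming branch thickness is the number of word instances whose spelling passes through a given prefix. The function names and the toy counts are hypothetical; this is not the paper's plotting code.

```python
from collections import Counter


def spelling_tree(token_counts):
    """Map every prefix to the number of tokens that pass through it.

    token_counts maps a spelling (e.g. "hahha") to how often it was seen.
    """
    branch = Counter()
    for token, count in token_counts.items():
        for end in range(1, len(token) + 1):
            branch[token[:end]] += count
    return branch


def children(branch, prefix, letters="ha"):
    """Branches leaving a node: first letter goes left, second goes right."""
    return {
        prefix + ch: branch[prefix + ch]
        for ch in letters
        if branch[prefix + ch] > 0
    }


# Toy counts, not real data.
tree = spelling_tree({"ha": 10, "haha": 5, "hahha": 1})
print(tree["h"])              # 16
print(children(tree, "hah"))  # {'hahh': 1, 'haha': 5}
```

Mistypings such as hahha show up as thin side branches leaving the thick alternating h-a path.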
And we found a whole zoo of these spelling trees: Image
Welcome to the end of this thread.

Our paper is on the arXiv here:

arxiv.org/abs/1907.03920

We also have Online App-endices* with frequency distributions, balance plots, and spelling trees for all kernels:

compstorylab.org/stretchablewor…

*We intend to make these more functional.
Authors: Tyler Gray (PhD UVM, 2019; maple syrup provider), @ChrisDanforth, and @peterdodds.
After careful & critical review by a crack team of experts in electronic emphasis…

our study has been published:

journals.plos.org/plosone/articl…
And here is a lovely piece of coverage in Wired:

wired.com/story/whoooaaa…

More from @compstorylab

Aug 18, 2020
New preprint:

“Computational timeline reconstruction of the stories surrounding Trump: Story turbulence, narrative control, and collective chronopathy”

arxiv.org/pdf/2008.07301…

P. S. Dodds, J. R. Minot, M. V. Arnold, T. Alshaabi, J. L. Adams, A. J. Reagan, and C. M. Danforth
Some questions to ask yourself and others:

What happened in the world over the last two weeks?

What about this time last year? Two years ago?

And what order did the major events happen in?
For Trump’s presidency, how easily could individuals recall and sort these example stories?:

- North Korea
- Charlottesville
- kneeling in the National Football League
- Confederate statues
- family separation
- Stormy Daniels
- Space Force
- the possible purchase of Greenland
Read 22 tweets
Jul 28, 2020
We have a new paper, interactive visualization, and data platform.

Nutshell: we’ve curated 100 billion tweets over 10 years to produce day-scale rank/frequency time series for n-grams in over 100 languages.

It’s a whole big thing.

A short thread—
The paper:

“Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter”

arxiv.org/abs/2007.12988
With storywrangler, we’re hoping to enable or enhance the computational study of any large-scale temporal phenomena where people matter including:
culture,
politics,
economics,
linguistics,
public health,
conflict,
climate change,
and
data journalism.
Read 10 tweets
Jun 8, 2020
Thread for a new paper of ours on the arXiv:

“Ratioing the President: An exploration of public engagement with Obama and Trump on Twitter”

arxiv.org/abs/2006.03526

J. R. Minot, M. V. Arnold, T. Alshaabi, C. M. Danforth, P. S. Dodds
We explore the dynamics of how Twitter users have responded to tweets made by Obama and Trump from their main accounts, @BarackObama and @realDonaldTrump.
For each tweet, we track three main characteristics as they evolve over time:

- Number of Favorites
- Number of Retweets
- Number of Replies (hard to measure—see our paper)
Read 17 tweets
Mar 27, 2020
New NCOVID-19 paper thread:

“How the world's collective attention is being paid to a pandemic:
COVID-19 related 1-gram time series for 24 languages on Twitter”

Main site:
compstorylab.org/covid19ngrams/
We make two main contributions:

1. We curate and share usage time series of 1,000 1-grams that have mattered in March of 2020 (words, emojis, hashtags, etc.) for 24 languages.

We hope other researchers can use these time series to connect with other data streams.
2. We show that after a peak in January 2020 in response to the news from Wuhan of a novel contagious disease, the world’s collective attention dropped through much of February before resurging.
Read 23 tweets
Feb 20, 2020
“Noncooperative dynamics in election interference”

New publication from our group in Physical Review E

journals.aps.org/pre/abstract/1… Image
Led by @d_r_dewhurst and inspired by Russian interference in the 2016 election, we simulate the timeless competition between red and blue Image
This is the first study [that we know of] to explore models of election interference in a noncooperative setting [game theory flavor] Image
Read 6 tweets
Jul 10, 2019
New paper threeaaad!!!

Soooooo, we went exploring for stretchable words on Twitter, and we uncovered a strange and amusing realm of language:

“Hahahahaha, Duuuuude, Yeeessss!: A two-parameter characterization of stretchable words and the dynamics of mistypings and misspellings”
Stretchable words are undeniably real:
Yes they are:
Read 10 tweets
