Computational Story Lab
Awesome Research Group run by @peterdodds and @chrisdanforth: The Computational Story Lab.

Jul 10, 2019, 20 tweets

Now, we stretch out words naturally when we speak.

But stretched words (sometimes called elongated words) are fairly rare in book and other text corpora, and they aren’t represented well in dictionaries (if at all).

So we thought, let’s science this.

Stretchfulness in written text arrived in an abundant, accessible source with Twitter (along with the possible end of civilization but that issue is beyond the scope of our current project).

Dataset: 10% of all (140 character) tweets from September 2008 to the end of 2016.

We crafted* a series of regex-based tweet-sifters for capturing words that are naturally stretched in the wilds of Twitter.

We ended up with a skosh over 5000 “kernels” for stretchable words:

*this was not entirely easy
xkcd.com/208/

Kernels match up with words like this:

[g][o][a][l]: Any stretched version of goal with ordering of letters strictly preserved.

(ha): All words with h’s and a’s repeated in any order, as long as they start with an h and contain at least one a.

sq[u][e]: No stretch for s and q.
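One way to read the kernel notation above is as a recipe for a regular expression. The sketch below is only an illustration, not the tweet-sifters from the paper: the function name `kernel_to_regex` is hypothetical, and it assumes a bracketed letter means "one or more of this letter", a bare letter means "exactly this letter", and a parenthesized pair means "starts with the first letter, contains at least one of the second, and uses only those two letters".

```python
import re


def kernel_to_regex(kernel: str) -> "re.Pattern[str]":
    """Translate a kernel such as '[g][o][a][l]' into a compiled regex.

    Hypothetical helper; the notation rules are assumptions read off the
    three examples in the thread.
    """
    parts = []
    i = 0
    while i < len(kernel):
        ch = kernel[i]
        if ch == "[":
            # [x] -> the letter x, repeated one or more times
            end = kernel.index("]", i)
            parts.append(re.escape(kernel[i + 1:end]) + "+")
            i = end + 1
        elif ch == "(":
            # (xy) -> starts with x, contains at least one y, only x's and y's
            end = kernel.index(")", i)
            first, second = kernel[i + 1:end]  # assumes exactly two letters
            cls = "[" + re.escape(first) + re.escape(second) + "]"
            parts.append(re.escape(first) + cls + "*" + re.escape(second) + cls + "*")
            i = end + 1
        else:
            # bare letter -> not stretchable, must appear exactly once
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


goal = kernel_to_regex("[g][o][a][l]")
ha = kernel_to_regex("(ha)")
sque = kernel_to_regex("sq[u][e]")

print(bool(goal.fullmatch("gooooaaal")))  # stretched, order preserved
print(bool(ha.fullmatch("hahahha")))      # mistyped laugh still belongs to (ha)
print(bool(sque.fullmatch("ssque")))      # s is not allowed to stretch
```

Using `fullmatch` keeps the check at the level of whole words, so a stretched word has to fit the kernel from its first letter to its last.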

For each kernel, we plotted its frequency distribution.

Here’s [g][o][a][l]’s distribution.

The base word goal is used much more frequently than its stretched versions. But users tend to give goal a good stretch once they get going. They’re excited. Because football.

Now, we’re not saying that the tails of these distributions obey a power-law decay ... but we’re not not saying that either.

For those inclined, please feel free to fight amongst yourselves. You know who you are.

Here’s (ha).

The two-cycle jumps suggest that users try to keep on track with hahahaha but sometimes tragically suffer mistypings (hahahha; perhaps incredibly, we have more on this below).

We measure the “stretch” of a word with a standard Gini coefficient.

[g][o][a][l] is somewhat stretchy (G=0.108)
(ha) is stretchier (G=0.245)

A completely non-stretchable word would have G=0 as all instances of the word are the same. No one is special.
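For the curious, here is a minimal sketch of a Gini coefficient for one kernel. It assumes the coefficient is taken over the lengths of every observed instance of the word, which fits the G=0 case above, but the thread does not spell out the exact input, so treat that as a guess. The function name `gini_of_lengths` and the counts in the example are made up for illustration.

```python
from typing import Dict


def gini_of_lengths(length_counts: Dict[int, int]) -> float:
    """Gini coefficient of token lengths, given {length: number of tokens}.

    Uses the mean-absolute-difference form:
        G = sum_i sum_j n_i * n_j * |x_i - x_j| / (2 * N^2 * mean)
    Assumption: each observed token contributes its length as one data point.
    """
    total = sum(length_counts.values())
    mean = sum(length * n for length, n in length_counts.items()) / total
    abs_diff_sum = sum(
        n_i * n_j * abs(x_i - x_j)
        for x_i, n_i in length_counts.items()
        for x_j, n_j in length_counts.items()
    )
    return abs_diff_sum / (2 * total**2 * mean)


# Hypothetical counts: every instance has length 4, so nothing is stretched.
print(gini_of_lengths({4: 1000}))

# Hypothetical counts: mostly the base word, plus a tail of stretched versions.
print(gini_of_lengths({4: 900, 6: 60, 10: 30, 20: 10}))
```

The double loop runs over distinct lengths rather than individual tokens, so it stays cheap even when a kernel has millions of instances.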

Next, we wanted to figure out the balance of a word’s stretch.

For [g][o][a][l], the g (a plosive) is rarely stretched much, while o gets the most stretch, closely followed by a and l.

We base our measure of balance on Shannon’s entropy H.
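A rough sketch of an entropy-based balance score, under assumptions: each letter of a kernel gets a non-negative "share" of the total stretch, and the score is Shannon's entropy of those shares divided by its maximum, so that 1 means every letter stretches equally and 0 means one letter does all the stretching. The normalization, the function name `balance`, and the example shares are illustrative choices, not values or definitions taken from the paper.

```python
import math
from typing import Sequence


def balance(stretch_shares: Sequence[float]) -> float:
    """Normalized Shannon entropy H / log(n) of per-letter stretch shares.

    Assumption: stretch_shares[i] is how much of the kernel's total stretch
    falls on letter i (any non-negative scale; it is normalized here).
    """
    n = len(stretch_shares)
    total = sum(stretch_shares)
    if n < 2 or total == 0:
        return 0.0  # balance is not meaningful for one letter or no stretch
    probs = [s / total for s in stretch_shares if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(n)


# Hypothetical shares for a four-letter kernel where the first letter
# barely stretches and the other three carry most of the stretch.
print(balance([0.02, 0.38, 0.32, 0.28]))

# Hypothetical two-letter kernel with an even split.
print(balance([0.5, 0.5]))
```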

(ha) is balanced almost perfectly (we ignore letter order for these internally jumblable words (jumblable is fun to say)).

We see this kind of consistency of balance across stretch lengths for our entire collection of kernels.

Here are the most and least balanced stretchable words.

Not all words are stretched to reflect a vocalized form. Some are stretched as an attempt at visual emphasis, much as capital letters are (but that’s yelling and rude).

And here are the most and least stretchy words (in our set of 5000+ stretchables; many words are not Elaine-level stretchworthy):

The stretchiness and balance of stretchable words we found in the Twitter wild make for solid parameters, filling out both dimensions well. Feels like science.

Last, we investigated how stretchables like (ha) go wrong with “spelling trees”.

Starting at the top with h, (ha) words trace down through the tree’s branches (h to the left, a to the right). Branch thickness indicates numbers of words.
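The counting behind a tree like this can be sketched in a few lines, assuming each node is a prefix (h, ha, hah, ...) and a branch's thickness is the number of words whose spelling passes through that prefix. The function name `spelling_tree` and the word counts are hypothetical, and the real figures may weight or prune branches differently.

```python
from collections import Counter
from typing import Dict


def spelling_tree(word_counts: Dict[str, int]) -> Counter:
    """Count how many words pass through each prefix of a two-letter kernel.

    Each prefix is one node of the tree: appending 'h' is a step to the
    left, appending 'a' a step to the right. The count at a node is the
    thickness of the branch leading into it.
    """
    node_counts: Counter = Counter()
    for word, n in word_counts.items():
        for end in range(1, len(word) + 1):
            node_counts[word[:end]] += n
    return node_counts


# Hypothetical counts for a few (ha) words, including one mistyping.
tree = spelling_tree({"ha": 2, "haha": 3, "hahha": 1})

for prefix in sorted(tree, key=lambda p: (len(p), p)):
    print(prefix, tree[prefix])
```

Mistypings show up as thin side branches: in this toy example, hahh carries only the single hahha, while haha stays on the thick main path.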

Lots of self-similar goodness:

And we found a whole zoo of these spelling trees:

Welcome to the end of this thread.

Our paper is on the arXiv here:

arxiv.org/abs/1907.03920

We also have Online App-endices* with frequency distributions, balance plots, and spelling trees for all kernels:

compstorylab.org/stretchablewor…

*We intend to make these more functional.

Authors: Tyler Gray (PhD UVM, 2019; maple syrup provider), @ChrisDanforth, and @peterdodds.

The start of the thread is here:

After careful & critical review by a crack team of experts in electronic emphasis…

our study has been published:

journals.plos.org/plosone/articl…

And here is a lovely piece of coverage in Wired:

wired.com/story/whoooaaa…
