Showed my grad students how to use "Machine Learning" to study "Militarized Interstate Dispute" data.

Why did I do it and what did the students (and the computer) learn?

[THREAD]
A bit of context. The theme of the class was the use of "event data" to study conflict.

A prominent example of such data is "Militarized Interstate Disputes," or MIDs.
I had the students read Gochman and Maoz's 1984 JCR piece that introduced the concept

journals.sagepub.com/doi/10.1177/00…
Which, in turn, built on data first presented in @II_journal in 1982

tandfonline.com/doi/abs/10.108…
What is a MID? It's several things, as captured in the definition on the Correlates of War's own website:

correlatesofwar.org/data-sets/MIDs
The list of actions that can fall under each category is described in detail in the Gochman and Maoz paper
Unlike interstate wars -- where there are fewer than 100, even in the latest version of the Correlates of War dataset -- there were hundreds of MIDs (at the time Gochman and Maoz wrote)
...and today there are thousands of MIDs.

Well, 2,315 MIDs in version 4.3 of the data

correlatesofwar.org/data-sets/MIDs…
As one might expect, the coding of such a large number of events is not perfect.

That is why @dmgibler and his team at @UofAlabama have been hard at work correcting and cleaning these data

dmgibler.people.ua.edu/mid-data.html
How did (and do) scholars obtain the information to identify MIDs? Lots of sources (from the Gochman and Maoz paper)
But what about for the most recent events, such as those from the 1990s on?

Lots of sources, but a key one is media reports
That could translate into A LOT of information to process each DAY, let alone each year!

For instance, consider just the amount of "events" recorded each day by the @gdeltproject

gdeltproject.org/data.html
They collected nearly 12 MB of news reports on Tuesday alone!
How to process such info? Well, you could take the approach of Gochman and Maoz: hire A LOT more @UMich grad student RAs! #GoBlue!
And, indeed, there are A LOT of times where that is the right choice.

But could computers help? I mean, they're dumb, but they work hard!

That's what led us to talk about Machine Learning
The Machine Learning textbook by @ChrisBishopMSFT is a great way to REALLY dive into the topic.
But we didn't need to go that far.

Instead, I love how @kozyrkov describes machine learning in this @hackernoon post

hackernoon.com/the-simplest-e…
Machine learning is just a thing labeler...
...not a "magical box of magic"
So I had the students assume that they were given the following sheet of dyadic event data from the year 2000

All they are told about the year 2000 events are the two countries involved, the month of the event, and the day of the event.
The task? Find a way to label the events that were high hostility level MIDs (MID=4 or MID=5).

Maybe they could do this by hand. But life would be easier if they could somehow tell a computer to "figure it out" for them!
Our machine learning procedure needed three things:

1) Training data (i.e. lots and lots of examples of events where we know the MID level).

For that, we used all MIDs from 1816 to 1998.

The data came from here:

correlatesofwar.org/data-sets/MIDs…
2) Testing data (i.e. a year of events where we know the MID levels but want to see if our "lessons" from the training data are useful)

For that, we used the MID data from 1999
3) An algorithm.

Even though the machine is "learning" it still needs to be told what it's looking at and how to "think" about what it's looking at -- again, computers can learn, but they are dumb!
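The first two ingredients boil down to slicing the events by year. Here is a minimal sketch of that split. The `Event` record, its field names, and the toy rows are all invented for illustration; they are not the Correlates of War file layout or real MID records.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    """Hypothetical dyadic event record (not the real MID file format)."""
    year: int
    side_a: str
    side_b: str
    hostility: Optional[int]  # 1-5 when known, None when we still need a label


def split_by_year(events):
    """Split events into training (1816-1998), testing (1999), and unlabeled (2000)."""
    train = [e for e in events if 1816 <= e.year <= 1998]
    test = [e for e in events if e.year == 1999]
    unlabeled = [e for e in events if e.year == 2000]
    return train, test, unlabeled


# Invented toy rows, only here to show the shape of the split.
events = [
    Event(1900, "RUS", "AAA", 5),
    Event(1950, "AAA", "BBB", 2),
    Event(1999, "RUS", "BBB", 4),
    Event(2000, "RUS", "CCC", None),
]

train, test, unlabeled = split_by_year(events)
print(len(train), len(test), len(unlabeled))
```

The year 2000 rows carry no hostility level, which is exactly why we want the computer to label them.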
In this case, we told it to look at the participants in the "Training data" and see if anyone is involved in a lot of high level MIDs
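That instruction is really just counting. A rough sketch of the idea, assuming each training row is a hypothetical `(side_a, side_b, hostility_level)` tuple and using invented toy rows rather than the real MID data:

```python
from collections import Counter

# Invented toy rows: (side_a, side_b, hostility level). Not real MID records.
training = [
    ("RUS", "AAA", 4),
    ("RUS", "BBB", 5),
    ("AAA", "BBB", 2),
    ("RUS", "CCC", 4),
    ("CCC", "AAA", 4),
    ("BBB", "CCC", 1),
]


def count_high_level(rows, high_levels=(4, 5)):
    """Count how many high-level MIDs each participant appears in."""
    counts = Counter()
    for side_a, side_b, level in rows:
        if level in high_levels:
            counts[side_a] += 1
            counts[side_b] += 1
    return counts


counts = count_high_level(training)
most_prone, n_high = counts.most_common(1)[0]
print(most_prone, n_high)
```

In this toy sample the top participant is whichever country I happened to put in the most level 4 and 5 rows, so the printout says nothing about the real data.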
Guess what the computer found?

Yep, Russia is involved in A LOT of high level MIDs!

Equipped with that information, we turned to the "Testing Data".

To keep it simple, we told the computer to simply assume that every event with Russia was a high level conflict.
We then evaluated how this assumption worked.

Well, we got some cases right!
But we also got a lot wrong!
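Checking the assumption means comparing each guess with the known label. A small sketch of that scoring step, again with invented `(side_a, side_b, hostility_level)` rows; the function name and the tally keys are my own choices for illustration:

```python
def evaluate(rows, flagged="RUS", high_levels=(4, 5)):
    """Score the rule 'any event involving `flagged` is a high-level MID'."""
    tally = {"true_pos": 0, "false_pos": 0, "true_neg": 0, "false_neg": 0}
    for side_a, side_b, level in rows:
        predicted = flagged in (side_a, side_b)
        actual = level in high_levels
        if predicted and actual:
            tally["true_pos"] += 1
        elif predicted and not actual:
            tally["false_pos"] += 1
        elif not predicted and actual:
            tally["false_neg"] += 1
        else:
            tally["true_neg"] += 1
    return tally


# Invented toy testing rows. Not real 1999 MID records.
testing = [
    ("RUS", "AAA", 4),
    ("RUS", "BBB", 2),
    ("RUS", "CCC", 1),
    ("AAA", "BBB", 4),
    ("BBB", "CCC", 3),
]

tally = evaluate(testing)
accuracy = (tally["true_pos"] + tally["true_neg"]) / len(testing)
print(tally, accuracy)
```

Splitting the errors into false positives (flagged but low level) and false negatives (missed high-level events) shows which kind of mistake the rule makes most.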
So the computer needed to learn from this mistake. What should it learn?

We discussed the possible lessons:

1) Maybe not assume ALL events involving Russia ended in a high level MID (maybe just some fraction)
2) Assign different probabilities to the events that include Russia to determine which are labeled as "high level MIDs"
3) Tweak the algorithm to look for additional "conflict prone" countries (not just the top one).
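Lessons 2 and 3 can be sketched together: estimate, for each country, the share of its events that were high level, keep the top few countries, and only label an event "high" when that share clears a cutoff. This is one possible way to do it, not the procedure we ran in class; `top_k`, `threshold`, and the toy rows are all assumptions made for the example.

```python
from collections import Counter

# Invented toy rows: (side_a, side_b, hostility level). Not real MID records.
training = [
    ("RUS", "AAA", 4),
    ("RUS", "BBB", 5),
    ("AAA", "BBB", 2),
    ("RUS", "CCC", 4),
    ("CCC", "AAA", 4),
    ("BBB", "CCC", 1),
]


def high_level_shares(rows, high_levels=(4, 5)):
    """For each country, the fraction of its events that were high level."""
    total, high = Counter(), Counter()
    for side_a, side_b, level in rows:
        for country in (side_a, side_b):
            total[country] += 1
            if level in high_levels:
                high[country] += 1
    return {country: high[country] / total[country] for country in total}


def conflict_prone(shares, top_k):
    """The top_k countries by share (ties broken alphabetically)."""
    ranked = sorted(shares, key=lambda c: (-shares[c], c))
    return set(ranked[:top_k])


def predict_high(side_a, side_b, shares, prone, threshold):
    """Label an event high level if a conflict-prone participant clears the cutoff."""
    involved = [c for c in (side_a, side_b) if c in prone]
    if not involved:
        return False
    return max(shares[c] for c in involved) >= threshold


shares = high_level_shares(training)
prone = conflict_prone(shares, top_k=2)
print(predict_high("AAA", "BBB", shares, prone, threshold=0.5))
```

Raising `threshold` makes the computer more cautious about crying "high level"; raising `top_k` lets more countries trigger the label.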
The students could then see how you would want the computer to try all of these options (and different combinations of these options).

Doing so would require

training -> checking -> training -> checking ....over and over again.

Again, the computer will work hard.
This process would stop when the computer (according to some rule we gave it) stopped trying different "tweaks".
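Here is a rough sketch of that loop, under some loud assumptions: the "tweaks" are just two knobs (`top_k` and `threshold`, both hypothetical), the stopping rule is "quit after `patience` tweaks in a row fail to improve", and the rows are invented. For simplicity the country shares are learned once and only the knobs are re-checked each round.

```python
from collections import Counter
from itertools import product

HIGH = (4, 5)


def fit_shares(rows):
    """'Training': each country's fraction of events that were high level."""
    total, high = Counter(), Counter()
    for side_a, side_b, level in rows:
        for country in (side_a, side_b):
            total[country] += 1
            if level in HIGH:
                high[country] += 1
    return {c: high[c] / total[c] for c in total}


def accuracy(rows, shares, top_k, threshold):
    """'Checking': share of rows labeled correctly under one setting."""
    prone = set(sorted(shares, key=lambda c: (-shares[c], c))[:top_k])
    correct = 0
    for side_a, side_b, level in rows:
        involved = [c for c in (side_a, side_b) if c in prone]
        predicted = bool(involved) and max(shares[c] for c in involved) >= threshold
        if predicted == (level in HIGH):
            correct += 1
    return correct / len(rows)


def search(train, test, top_ks, thresholds, patience=3):
    """Try each tweak in turn; stop once `patience` tweaks in a row fail to improve."""
    shares = fit_shares(train)
    best_setting, best_score, stale = None, -1.0, 0
    for top_k, threshold in product(top_ks, thresholds):
        score = accuracy(test, shares, top_k, threshold)
        if score > best_score:
            best_setting, best_score, stale = (top_k, threshold), score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_setting, best_score


# Invented toy rows: (side_a, side_b, hostility level). Not real MID records.
train = [("RUS", "AAA", 4), ("RUS", "BBB", 5), ("AAA", "BBB", 2),
         ("RUS", "CCC", 4), ("CCC", "AAA", 4), ("BBB", "CCC", 1)]
test = [("RUS", "AAA", 4), ("RUS", "BBB", 2), ("RUS", "CCC", 1),
        ("AAA", "BBB", 4), ("BBB", "CCC", 3)]

best_setting, best_score = search(train, test, top_ks=(1, 2, 3), thresholds=(0.5, 0.9))
print(best_setting, best_score)
```

Nothing here is clever: the computer just grinds through the settings we hand it and keeps the one that scored best on the testing data.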

So machine learning isn't really "learning" in the sense of deeply contemplating the issue before making a decision.

It's more "brute force"
But it (mostly, sometimes) works!
Overall, the students (hopefully) learned a bit about “event data” (and MIDs specifically) and were introduced to some basics of programming.

...and are no longer intimidated by the phrase “Machine Learning”

[END]
Keep Current with Paul Poast
