Paul Poast (@ProfPaulPoast), 36 tweets

Showed my grad students how to use "Machine Learning" to study "Militarized Interstate Dispute" data.

Why did I do it and what did the students (and the computer) learn?

[THREAD]

A bit of context. The theme of the class was the use of "event data" to study conflict.

A prominent example of such data is "Militarized Interstate Disputes," or MIDs

I had the students read Gochman and Maoz's 1984 JCR piece that introduced the concept

journals.sagepub.com/doi/10.1177/00…

That piece, in turn, built on data first presented in @II_journal in 1982

tandfonline.com/doi/abs/10.108…

What is a MID? It's several things, as captured in the definition on the Correlates of War's own website:

correlatesofwar.org/data-sets/MIDs

The list of actions that can fall under each category is described in detail in the Gochman and Maoz paper

Unlike interstate wars, of which there are fewer than 100 even in the latest version of the Correlates of War dataset, there were hundreds of MIDs at the time Gochman and Maoz wrote

...and today there are thousands of MIDs.

Well, 2,315 MIDs in version 4.3 of the data

correlatesofwar.org/data-sets/MIDs…

As one might expect, the coding of such a large number of events is not perfect.

That is why @dmgibler and his team at @UofAlabama have been hard at work correcting and cleaning these data

dmgibler.people.ua.edu/mid-data.html

How did (and do) scholars obtain the information to identify MIDs? Lots of sources (from the Gochman and Maoz paper)

But what about for the most recent events, such as those from the 1990s on?

Lots of sources, but a key one is media reports

That could translate into A LOT of information to process each DAY, let alone each year!

For instance, consider just the amount of "events" recorded each day by the @gdeltproject

gdeltproject.org/data.html

They collected nearly 12 MB of news reports for Tuesday alone!

How to process such info? Well, you could take the approach of Gochman and Maoz: hire A LOT more @UMich grad student RAs! #GoBlue!

And, indeed, there are A LOT of times when that is the right choice.

But could computers help? I mean, they're dumb, but they work hard!

That's what led us to talk about Machine Learning

The Machine Learning textbook by @ChrisBishopMSFT is a great way to REALLY dive into the topic.

But we didn't need to go that far.

Instead, I love how @kozyrkov describes machine learning in this @hackernoon post

hackernoon.com/the-simplest-e…

Machine learning is just a thing labeler...

...not a "magical box of magic"

So I had the students assume that they were given the following sheet of dyadic event data from the year 2000

All they were told about the year 2000 events was the two countries involved and the month and day of each event.

The task? Find a way to label the events that were high hostility level MIDs (hostility level 4, "use of force," or level 5, "war").
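For the years where the hostility level is already coded, the label the computer should learn to reproduce boils down to a one-line rule. A minimal Python sketch; the `Event` record, its field names, and the toy rows (placeholder countries A, B, C, not real MIDs) are hypothetical and not the actual COW file layout:

```python
from dataclasses import dataclass

@dataclass
class Event:
    side_a: str   # first country in the dyad
    side_b: str   # second country in the dyad
    month: int
    day: int
    year: int
    hostlev: int  # hostility level, 1 (no militarized action) to 5 (war)

def is_high_level(event: Event) -> bool:
    """True for the events we want to find: use of force (4) or war (5)."""
    return event.hostlev >= 4

# Toy records with placeholder country names (not real MID data).
events = [
    Event("A", "B", 3, 14, 1998, 2),
    Event("A", "C", 7, 2, 1998, 4),
    Event("B", "C", 11, 20, 1998, 5),
]
print([is_high_level(e) for e in events])  # [False, True, True]
```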

Maybe they could do this by hand. But life would be easier if they could somehow tell a computer to "figure it out" for them!

Our machine learning procedure needed three things:

1) Training data (i.e. lots and lots of examples of events where we know the MID level).

For that, we used all MIDs from 1816 to 1998.

The data came from here:

correlatesofwar.org/data-sets/MIDs…

2) Testing data (i.e. a year of events where we know the MID levels but want to see if our "lessons" from the training data are useful)

For that, we used the MID data from 1999

3) An algorithm.

Even though the machine is "learning," it still needs to be told what it's looking at and how to "think" about it. Again, computers can learn, but they are dumb!

In this case, we told it to look at the participants in the "Training data" and see if anyone is involved in a lot of high level MIDs
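The training and testing data in items 1 and 2 amount to splitting labeled events by year. A minimal sketch, assuming a hypothetical `Event` record with a `year` field; the rows use placeholder countries and are not real MIDs:

```python
from dataclasses import dataclass

@dataclass
class Event:
    side_a: str
    side_b: str
    year: int
    hostlev: int  # hostility level, 1 to 5

def split_by_year(events, train_start=1816, train_end=1998, test_year=1999):
    """Training set: events from train_start through train_end. Testing set: test_year."""
    train = [e for e in events if train_start <= e.year <= train_end]
    test = [e for e in events if e.year == test_year]
    return train, test

events = [
    Event("A", "B", 1816, 3),
    Event("A", "C", 1950, 4),
    Event("B", "C", 1998, 2),
    Event("A", "B", 1999, 4),
    Event("C", "D", 2000, 1),  # falls in neither set
]
train, test = split_by_year(events)
print(len(train), len(test))  # 3 1
```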

https://twitter.com/ProfPaulPoast/status/1204745930144174080

Guess what the computer found?

Yep, Russia is involved in A LOT of high level MIDs!
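Telling the computer to "look at the participants" can be as simple as a tally of how often each country shows up in high-level events. A minimal sketch, assuming training events are hypothetical `(side_a, side_b, hostlev)` tuples with placeholder countries; in the class data, Russia came out on top of this kind of count:

```python
from collections import Counter

# Hypothetical training records: (side_a, side_b, hostility level 1-5).
train = [
    ("A", "B", 4),
    ("A", "C", 5),
    ("B", "C", 2),
    ("A", "D", 4),
    ("C", "D", 3),
]

def high_level_counts(events, cutoff=4):
    """Count how many high-level (hostlev >= cutoff) events each country joins."""
    counts = Counter()
    for side_a, side_b, hostlev in events:
        if hostlev >= cutoff:
            counts[side_a] += 1
            counts[side_b] += 1
    return counts

counts = high_level_counts(train)
print(counts.most_common(1))  # [('A', 3)]
```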

Equipped with that information, we turned to the "Testing Data".

To keep it simple, we told the computer to assume that every event involving Russia was a high level conflict.
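That first rule fits in one line. A minimal sketch; the function name and the toy dyads are hypothetical, with "A" standing in for the top country found in training:

```python
def predict_with_top_country(dyads, top_country):
    """Label an event high-level whenever top_country is one of its two sides."""
    return [top_country in dyad for dyad in dyads]

# Hypothetical testing-year dyads with placeholder country names.
test_dyads = [("A", "B"), ("C", "D"), ("A", "C"), ("B", "D")]
print(predict_with_top_country(test_dyads, "A"))  # [True, False, True, False]
```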

We then evaluated how well this assumption worked.

Well, we got some cases right!

But we also got a lot wrong!
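"Some right, a lot wrong" can be made precise by tallying the four possible outcomes and summarizing them. A minimal sketch using standard accuracy, precision, and recall; the predicted and true labels below are hypothetical, not the class results:

```python
def evaluate(predicted, actual):
    """Tally right and wrong labels, then summarize them."""
    tp = sum(p and a for p, a in zip(predicted, actual))          # flagged, truly high-level
    fp = sum(p and not a for p, a in zip(predicted, actual))      # flagged, but not high-level
    fn = sum(not p and a for p, a in zip(predicted, actual))      # missed a high-level event
    tn = sum(not p and not a for p, a in zip(predicted, actual))  # correctly left unflagged
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical labels: the "every event with the top country" rule vs. the truth.
predicted = [True, True, True, False, False]
actual = [True, False, False, False, True]
print(evaluate(predicted, actual))  # tp=1, fp=2, fn=1, tn=1; accuracy 0.4
```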

So the computer needed to learn from this mistake. What should it learn?

We discussed the possible lessons:

1) Maybe not assume ALL events involving Russia ended in a high level MID (maybe just some fraction)

2) Assign different probabilities to the events that include Russia to determine which are labeled as "high level MIDs"

3) Tweak the algorithm to look for additional "conflict prone" countries (not just the top one).
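One way to read these lessons in code: turn each country's share of high-level training events into a probability-like score with a cutoff (roughly lessons 1 and 2), and widen the net to the top `k` countries (lesson 3). This is a hypothetical sketch, not the exact procedure used in class; the names `k` and `threshold` and the toy data are assumptions:

```python
from collections import Counter

def country_rates(train, cutoff=4):
    """Share of each country's training events with hostility level >= cutoff."""
    total, high = Counter(), Counter()
    for side_a, side_b, hostlev in train:
        for country in (side_a, side_b):
            total[country] += 1
            if hostlev >= cutoff:
                high[country] += 1
    return {c: high[c] / total[c] for c in total}

def predict(dyads, rates, k=1, threshold=0.5):
    """Keep the k most conflict-prone countries and label an event high-level
    only if one of them takes part and its rate clears the threshold."""
    top = sorted(rates, key=rates.get, reverse=True)[:k]
    return [any(c in top and rates[c] >= threshold for c in dyad) for dyad in dyads]

# Hypothetical toy data with placeholder country names.
train = [("A", "B", 4), ("A", "C", 5), ("B", "C", 2),
         ("A", "D", 4), ("C", "D", 3), ("B", "D", 4)]
test_dyads = [("A", "C"), ("B", "C"), ("C", "D"), ("B", "D")]

rates = country_rates(train)  # A: 1.0, B: 2/3, C: 1/3, D: 2/3
print(predict(test_dyads, rates, k=1))                 # [True, False, False, False]
print(predict(test_dyads, rates, k=3))                 # [True, True, True, True]
print(predict(test_dyads, rates, k=3, threshold=0.7))  # [True, False, False, False]
```

Raising the threshold makes the rule stricter about which "conflict-prone" countries count, while raising `k` makes it flag more events.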

The students could then see how you would want the computer to try all of these options (and different combinations of these options).

Doing so would require training -> checking -> training -> checking... over and over again.

Again, the computer will work hard.

This process would continue until the computer, following some stopping rule we gave it, stopped trying different "tweaks".
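The training -> checking cycle can be sketched as a small search over tweaks. This sketch assumes the tweaks are the hypothetical `k` (how many top countries to watch) and `threshold` settings, scores each by accuracy on the testing data, and uses a made-up stopping rule: quit after `patience` tries in a row without improvement. The toy data use placeholder countries:

```python
from collections import Counter
from itertools import product

def rates_from(train, cutoff=4):
    """Training step: share of each country's events at hostlev >= cutoff."""
    total, high = Counter(), Counter()
    for a, b, hostlev in train:
        for c in (a, b):
            total[c] += 1
            if hostlev >= cutoff:
                high[c] += 1
    return {c: high[c] / total[c] for c in total}

def accuracy(rates, test, k, threshold):
    """Checking step: fraction of testing events labeled correctly."""
    top = set(sorted(rates, key=rates.get, reverse=True)[:k])
    hits = 0
    for a, b, hostlev in test:
        guess = any(c in top and rates[c] >= threshold for c in (a, b))
        hits += guess == (hostlev >= 4)
    return hits / len(test)

def search(train, test, ks=(1, 2, 3), thresholds=(0.5, 0.7, 0.9), patience=3):
    """Try (k, threshold) tweaks in order; stop after `patience` misses in a row."""
    rates = rates_from(train)
    best, best_score, misses, tried = None, -1.0, 0, 0
    for k, t in product(ks, thresholds):
        score = accuracy(rates, test, k, t)
        tried += 1
        if score > best_score:
            best, best_score, misses = (k, t), score, 0
        else:
            misses += 1
            if misses >= patience:
                break
    return best, best_score, tried

train = [("A", "B", 4), ("A", "C", 5), ("B", "C", 2),
         ("A", "D", 4), ("C", "D", 3), ("B", "D", 4)]
test = [("A", "C", 4), ("B", "C", 4), ("C", "D", 2), ("B", "D", 4)]
print(search(train, test))  # ((2, 0.5), 1.0, 7): stops before trying all 9 tweaks
```

Because every tweak here is scored on the same testing year, that year ends up doing double duty; a common precaution is to hold out one more labeled year to judge the final choice.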

So machine learning isn't really "learning" in the sense of deeply contemplating the issue before making a decision.

It's more "brute force"

But it (mostly, sometimes) works!

Overall, the students (hopefully) learned a bit about “event data” (and MIDs specifically) and were introduced to some basics of programming.

...and are no longer intimidated by the phrase “Machine Learning”

[END]
