Okay, naturally you picked the toughest one! 🙂

Let’s start with a hypothetical:
Your friend has never watched the movie Jurassic Park.

You get to ask them up to 10 questions about their movie preferences before recommending whether they should watch it. What are the Qs?
In other words, you’re trying to predict whether they’ll like the movie.

The Qs might go:
1. Do you like sci-fi movies?
2. If yes, are you okay w movies w some violence?
3. If yes, do you like Jeff Goldblum?
4. If you don’t like Goldblum, do you like Laura Dern?



And so on.
After a while, you think you have a questionnaire that, by the end, tells you whether to recommend Jurassic Park to them.
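Here's a minimal sketch of that hand-written questionnaire as code. The question keys (`likes_scifi`, `ok_with_violence`, and so on) and the yes/no outcome at the end of each branch are my own assumptions for illustration; the list above only gives the questions, not the final verdicts.

```python
def recommend_jurassic_park(answers):
    """Walk through the questionnaire and return True to recommend the movie.

    `answers` maps hypothetical question keys to True/False.
    The verdict at each branch is an assumption made for this sketch.
    """
    # Q1: Do you like sci-fi movies?
    if not answers.get("likes_scifi", False):
        return False
    # Q2: If yes, are you okay with some violence?
    if not answers.get("ok_with_violence", False):
        return False
    # Q3: If yes, do you like Jeff Goldblum?
    if answers.get("likes_goldblum", False):
        return True
    # Q4: If you don't like Goldblum, do you like Laura Dern?
    return answers.get("likes_dern", False)


friend = {
    "likes_scifi": True,
    "ok_with_violence": True,
    "likes_goldblum": False,
    "likes_dern": True,
}
print(recommend_jurassic_park(friend))
```

Each question is just a branch, which is why this kind of questionnaire is called a tree.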



One class of machine learning algorithm, called decision trees, makes “questionnaires” kind of like this.
Except when a data scientist at a big streaming company decides what the Qs should be, they have access to millions of users’ worth of watching histories, including whether each user watched & finished (or started & didn’t finish) Jurassic Park or similar movies (like the sequels).
They write (or use pre-written) code to comb through the data to see which questions are most effective, meaning the resulting recommendation most often matches users’ actual viewing preferences.
The algorithm can make a MASSIVE number of educated guesses for the right Qs to ask by processing the data in matrix form, then keep iterating to minimize a mathematical formulation of the error (or the penalty for a wrong result).
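To make "which question is most effective" concrete, here's a rough sketch of picking a single best question from data. I'm assuming Gini impurity as the error measure (a common choice for decision trees, but the thread doesn't name one), and the four users and their answers are made up.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when all labels agree, 0.5 for an even two-way mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(rows, labels, question):
    """Weighted impurity of the yes-group and no-group for one question."""
    yes = [y for r, y in zip(rows, labels) if r[question]]
    no = [y for r, y in zip(rows, labels) if not r[question]]
    n = len(labels)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)


def best_question(rows, labels):
    """Try every candidate question and keep the one with the lowest impurity."""
    return min(rows[0].keys(), key=lambda q: split_impurity(rows, labels, q))


# Made-up viewing data: one dict of yes/no answers per user.
rows = [
    {"likes_scifi": True, "likes_romcoms": True},
    {"likes_scifi": True, "likes_romcoms": False},
    {"likes_scifi": False, "likes_romcoms": True},
    {"likes_scifi": False, "likes_romcoms": False},
]
labels = [True, True, False, False]  # did the user finish Jurassic Park?

print(best_question(rows, labels))
```

A tree learner would repeat this step inside each resulting group to pick the follow-up questions.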



This is an example of machine learning.
The result could be a really good set of questions that a human would need pretty good cinema knowledge to match.



And the power is SCALE: the machine can then do this for every single user on the platform, and for any single movie.
And it can do this without any real knowledge about movies!

The algorithm has never watched a single movie; it's just good at "learning" that asking about sci-fi movies and Jeff Goldblum (maybe, I don't actually know) helps when guessing if someone will like Jurassic Park.
This is *one type* of machine learning problem called classification (predicting a categorical outcome, in this case whether a user will “like”, or click on and watch, Jurassic Park).

And the example here involves a tree-based algorithm.
This example also has structured data (viewing logs of users, and labels about the content they viewed), and the algorithm must self-tune to find the right sets of questions to achieve the best results.
There are many approaches to making good movie recommendations (e.g. you could use the similarity of movies or similarity of user behavior), and this is a simplified example.
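As a rough illustration of the "similarity of movies" idea, here's a sketch that compares movies by who watched them, using cosine similarity. Cosine similarity is my assumption (it's one common choice), and the watch vectors and the titles "Sequel" and "Romcom" are made-up placeholders.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Each position is one hypothetical user: 1 = watched, 0 = didn't.
watched = {
    "Jurassic Park": [1, 1, 0, 1],
    "Sequel": [1, 1, 0, 0],
    "Romcom": [0, 0, 1, 0],
}

target = watched["Jurassic Park"]
for title in ("Sequel", "Romcom"):
    print(title, round(cosine_similarity(target, watched[title]), 3))
```

Movies watched by the same people score close to 1, so they're candidates to recommend together.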
There are many other kinds of ML problems and algorithms.



For example:
- finding patterns like clusters of “also watched” movies
- finding topics, themes, and sentiment in the script
- identifying on-screen text, such as handwriting and signage
- facial recognition of actors

And it's probably occurred to you already that you can use these outside of the movie/streaming industry.

There are algorithms all over the internet (and irl) that are trying to decide what you might be interested in, all based on your behavioral and other data.
Algorithms will try (are trying) to predict:

- whether a credit card transaction is fraudulent
- whether you’ll click on an ad and buy
- whether you'll default on a loan
- what your political leanings are
- whether you have (or want) young children

Are you uncomfortable yet?

Most ML algorithms have a few things in common:
- They can be trained and run on massive amounts of data. (More on the training aspect later!)
- The “machine” doesn’t understand the actual subject matter.
- The “machine” relies on a mathematical abstraction of the objective, and of how well it’s achieving the objective.
- It doesn't need to explain its choices.
- It doesn’t necessarily know about inaccuracies or biases in the data.

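One simple example of such a "mathematical abstraction" is the error rate: the fraction of guesses that turned out wrong. This is only a sketch under my own assumptions; real systems often use other measures, and the predictions below are made up.

```python
def error_rate(predicted, actual):
    """Fraction of predictions that disagree with what actually happened."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one example")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


# Made-up example: did we guess correctly that each user would watch the movie?
predicted = [True, True, False, True]
actual = [True, False, False, True]
print(error_rate(predicted, actual))  # 0.25 (one wrong guess out of four)
```

The number says nothing about movies; the machine just tries to push it down.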
I hope that was helpful in conceptualizing what ML is, and in thinking about the places where it could be used.

I'll talk later about some practical considerations that ML practitioners think about, like the importance of training data and model testing.
I'll also devote a thread (or a few) to the important topics of bias in ML/AI models and the MANY open ethical questions around the application of this technology, including how to use it responsibly (and whether it can be used responsibly), and who bears that responsibility.

Keep Current with RealScientists | Taka

