Want to hear a secret?

Regardless of your experience, here is an area of Machine Learning where you can have a huge impact:

▫️ Feature Engineering ▫️

It sounds fancy because people love to complicate things, but let's make it simple: 🧵👇
In Machine Learning we deal with a lot of data.

Let's assume we are working with data about the passengers of the Titanic.

Look at the picture here. That's what our data looks like.

The goal is to create a model that predicts whether a passenger survived.

👇
Each one of the columns of our dataset is a "feature."

A Machine Learning algorithm will use these "features" to produce results.
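Here is a minimal Python sketch of one passenger record that makes this concrete. It assumes the column names of the Kaggle Titanic CSV (`Pclass`, `Age`, `SibSp`, `Parch`, `Ticket`, `Cabin`, `Survived`, and so on), and the values are invented, not taken from a real passenger. `Survived` is what we want to predict (the target). Every other column is a feature. The helper `split_features_and_target` is hypothetical and was written only for this example.

```python
from typing import Any, Dict, Tuple

# A hypothetical passenger record. Column names follow the Kaggle Titanic CSV
# (an assumption); the values are invented for illustration.
passenger = {
    "Pclass": 3,
    "Name": "Doe, Mr. John",
    "Sex": "male",
    "Age": 27.0,
    "SibSp": 1,
    "Parch": 0,
    "Ticket": "A/5 00000",
    "Fare": 7.25,
    "Cabin": None,
    "Embarked": "S",
    "Survived": 0,
}

def split_features_and_target(
    row: Dict[str, Any], target: str = "Survived"
) -> Tuple[Dict[str, Any], Any]:
    """Separate the value to predict (the target) from the columns used to predict it."""
    features = {k: v for k, v in row.items() if k != target}
    return features, row[target]

features, label = split_features_and_target(passenger)
print(sorted(features))  # every column except "Survived"
print(label)
```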

"Feature engineering" is the process that decides which of these features are useful, comes up with new features, or changes the existing ones.

👇
The ultimate goal of feature engineering is to feed the Machine Learning algorithm the set of features that produces the best results.

Here is the best part:

🔹 Feature Engineering Is An Art 🔹

Your creativity plays a huge role here!

👇
I'm attaching the list of features again to this tweet.

Look at them, and let's walk through a few examples of how we could transform the data to get better results.

👇
1⃣ Age

Do you think the age of a passenger has any bearing on whether they survived or not? Probably yes, right?

Do you think that being 27 instead of 28 would make a difference? Probably not.

What matters here is the age group! A child vs. a young person vs. a senior.

👇
Instead of feeding our algorithm the age as it comes, we can transform the feature to reflect the age group.

The attached image shows pseudocode for one way to create the age group.
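Here is one way that pseudocode could look in Python, as a minimal sketch. The function name `age_group`, the cutoffs (13, 30, 60), and the extra "adult" bucket are assumptions for illustration and are not the thread's values. Tune them to your data. Ages 27 and 28 end up in the same group, which is the point made above.

```python
import math
from typing import Optional

def age_group(age: Optional[float]) -> str:
    """Map a raw age to a coarse age group.

    The cutoffs are hypothetical; the thread does not specify them.
    """
    # Missing ages show up as None (or NaN when the CSV is read with pandas).
    if age is None or math.isnan(age):
        return "unknown"
    if age < 13:
        return "child"
    if age < 30:
        return "young"
    if age < 60:
        return "adult"
    return "senior"

ages = [4, 27, 28, 45, 71, None]
print([age_group(a) for a in ages])
# ['child', 'young', 'young', 'adult', 'senior', 'unknown']
```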

👇
2⃣ Ticket Number

Do you think the ticket number makes any difference in whether a person survived the wreck?

I don't see how, so we can just drop this particular feature.

What you don't feed to your algorithm is as important as what you give to it.
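Dropping a feature is the simplest transformation of all. Here is a minimal Python sketch. It assumes each passenger is a dict keyed by Kaggle-style column names, and the rows are made up. `drop_columns` is a hypothetical helper name.

```python
from typing import Iterable, List

def drop_columns(rows: Iterable[dict], columns: Iterable[str]) -> List[dict]:
    """Return copies of the rows without the given columns (originals are untouched)."""
    to_drop = set(columns)
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]

# Hypothetical rows with Kaggle-style column names; values are made up.
rows = [
    {"Pclass": 1, "Age": 38.0, "Ticket": "PC 00000", "Survived": 1},
    {"Pclass": 3, "Age": 22.0, "Ticket": "A/5 00000", "Survived": 0},
]
print(drop_columns(rows, ["Ticket"]))
# [{'Pclass': 1, 'Age': 38.0, 'Survived': 1}, {'Pclass': 3, 'Age': 22.0, 'Survived': 0}]
```

If the data is stored in a pandas DataFrame, `df.drop(columns=["Ticket"])` does the same thing.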

👇
3⃣ Cabin

What do you think about the cabin number?

I think we mostly care about whether the passenger had a cabin, regardless of its number.

In this case, we could transform this feature into a simple 0 (doesn't have a cabin) or 1 (has a cabin).
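Here is a minimal Python sketch of that 0/1 transformation. It assumes a missing cabin shows up as `None`, an empty string, or `NaN`, which is what pandas produces for empty CSV cells. `has_cabin` is a hypothetical helper name.

```python
import math

def has_cabin(cabin: object) -> int:
    """Return 1 if the passenger has a recorded cabin, else 0."""
    # Assumption: a missing cabin is None, a blank string, or NaN (pandas).
    if cabin is None:
        return 0
    if isinstance(cabin, float) and math.isnan(cabin):
        return 0
    return 1 if str(cabin).strip() else 0

print([has_cabin(c) for c in ["C85", "", None, float("nan")]])
# [1, 0, 0, 0]
```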

👇
Another way to look at this cabin feature:

If the cabin number encodes the deck (the floor of the ship) where the cabin was located, it would be worth keeping that information.

(I'm assuming that people on the top floors had a better chance of survival.)
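If we want to keep the deck instead of a plain 0/1, a minimal sketch could look like this. It assumes the leading letter of a cabin value (the `C` in `C85`) is the deck. That is a common reading of the Titanic cabin codes, but you should verify it against your data. `cabin_deck` is a hypothetical helper, and the sample values are made up.

```python
def cabin_deck(cabin: object) -> str:
    """Extract the deck letter from a cabin value, or "none" if there is no cabin."""
    # Assumption: the leading letter of a cabin value is the deck.
    # Non-strings (None, NaN from pandas) and blank strings mean "no cabin".
    if not isinstance(cabin, str) or not cabin.strip():
        return "none"
    # Values listing several cabins (e.g. "C23 C25 C27") share the first letter.
    return cabin.strip()[0].upper()

print([cabin_deck(c) for c in ["C85", "C23 C25 C27", "E46", None]])
# ['C', 'C', 'E', 'none']
```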

👇
4⃣ sibsp and parch

These two features tell us the number of siblings and spouses, and the number of parents and children, each passenger had aboard.

Does this matter?

Well, here is a theory: people with family on the ship might have had a different chance of survival than those who were traveling alone.

👇
Maybe some people couldn't save themselves because they had to protect others.

Maybe some people survived because others protected them.

Either way, we could combine these two features into a single 0 (no family) or 1 (family) value.
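The combined flag fits in a single line. Here is a minimal Python sketch, with `has_family` as a hypothetical helper name.

```python
def has_family(sibsp: int, parch: int) -> int:
    """Combine sibsp and parch into one flag: 1 if traveling with family, else 0."""
    return 1 if sibsp + parch > 0 else 0

print(has_family(0, 0), has_family(1, 0), has_family(0, 2))
# 0 1 1
```

If you suspect the size of the family matters too, you could keep `sibsp + parch` as a count instead. Whether the flag or the count works better is something to check against your validation results.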

👇
Do you get the idea of how the process works?

We started with a set of features, and step by step we have transformed it into something that may give the algorithm a better chance to make good predictions.
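Putting the steps together, here is a minimal Python sketch that applies all four transformations to one record. The Kaggle-style column names, the age cutoffs, and the `engineer` function name are assumptions made for illustration. The sample record is invented.

```python
import math
from typing import Any, Dict

def engineer(row: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the thread's four transformations to one passenger record.

    Column names (Kaggle Titanic CSV) and age cutoffs are assumptions.
    """
    out = dict(row)  # work on a copy so the input record is not mutated

    # 1. Age -> age group (missing ages become "unknown").
    age = out.pop("Age", None)
    if age is None or (isinstance(age, float) and math.isnan(age)):
        out["AgeGroup"] = "unknown"
    elif age < 13:
        out["AgeGroup"] = "child"
    elif age < 30:
        out["AgeGroup"] = "young"
    elif age < 60:
        out["AgeGroup"] = "adult"
    else:
        out["AgeGroup"] = "senior"

    # 2. Ticket -> dropped entirely.
    out.pop("Ticket", None)

    # 3. Cabin -> 0/1 flag (None, blank strings and NaN count as "no cabin").
    cabin = out.pop("Cabin", None)
    out["HasCabin"] = 1 if isinstance(cabin, str) and cabin.strip() else 0

    # 4. SibSp + Parch -> 0/1 family flag.
    out["HasFamily"] = 1 if out.pop("SibSp", 0) + out.pop("Parch", 0) > 0 else 0

    return out

# A made-up passenger record, for illustration only.
before = {"Pclass": 3, "Sex": "male", "Age": 27.0, "SibSp": 1, "Parch": 0,
          "Ticket": "A/5 00000", "Cabin": None, "Survived": 0}
print(engineer(before))
# {'Pclass': 3, 'Sex': 'male', 'Survived': 0, 'AgeGroup': 'young', 'HasCabin': 0, 'HasFamily': 1}
```

You could also try the deck-based encoding of the cabin discussed earlier in place of step 3.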

This is feature engineering.

👇
Regardless of who you are and how much experience you have building Machine Learning models, "feature engineering" is an area that can get you started in the field.

Here, the best isn't whoever knows the most, but whoever can think creatively.

Thread by Santiago 🎃 (@svpino)

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @svpino

8 Oct
A real-life Machine Learning solution that works.

Here is a breakdown of every piece and how they work together.

🧵👇 Image
There's a website.

Users upload a group of pictures of an item and select the category it belongs to.

The system returns how much money the item is worth.

👇
Before:

▫️A group of people reviewed the pictures submitted by the user and decided how much the item was worth.

Today:

▫️The system quotes some of the items automatically (some still have to go through humans for a quote.)

👇
Read 11 tweets
6 Oct
I don't have proof, but I have empirical evidence that this is true:

▫️The outcome of a pair programming session is directly proportional to each developer's capacity to challenge each other.

Let me explain: 🧵👇
If you pair 2 developers with very different seniority levels, the session will become more of a training opportunity for the least senior person.

The short-term impact on the project will be negligible. Most of the ideas and progress will come from the senior person.

👇
If you pair two developers with a similar experience, their contributions multiply, giving you a much larger short-term impact.

You aren't getting ideas from one or the other anymore. You are getting a polished version that's better than any idea individually.

👇
Read 4 tweets
6 Oct
There are different categories of Machine Learning problems:

▫️Supervised Learning
▫️Unsupervised Learning
▫️Semi-supervised Learning
▫️Reinforcement Learning

This is a quick introduction to each one of them: 🧵👇
1⃣ Supervised Learning

🔹We train an algorithm using labeled data. This means that we give it the "questions" and the correct "answers."

The goal is for the algorithm to learn the concepts, so it can later answer similar questions.

👇
An example of Supervised Learning:

Given a dataset with pictures of different dogs and their breed, we can use a classification algorithm to determine the breed of new pictures of dogs.

Noticed how here we are getting labeled data (picture + breed.)

👇
Read 14 tweets
5 Oct
Python 3.9 🐍 is out! 🥳

Here are the 5 new features you care about.

🧵👇
1⃣ Merging dictionaries

There's a new operator "|" that can be used to merge two dictionaries.

See PEP 584 for more information: python.org/dev/peps/pep-0…

👇 Image
2⃣ Updating dictionaries

Another new operator "|=" will let you update dictionaries.

See PEP 584 for more information: python.org/dev/peps/pep-0…

👇 Image
Read 7 tweets
5 Oct
Some applications of Machine Learning:

▫️Ranking
▫️Recommendation
▫️Classification
▫️Regression
▫️Clustering
▫️Anomaly Detection

Here is a 3-second description of each one them: 🧵👇
1⃣ Ranking

🔹 Help your users find the most relevant items they are looking for.

For example, Google's algorithm to rank search results when a user searches for something combines multiple signals to offer the best results: your location, interests, past searches, etc.

👇
2⃣ Recommendation

🔹 Give your users the items they may be most interested in.

For example, Netflix's recommendation system to suggest what to watch based on your preferences, genre watch time, ratings, location, etc.

👇
Read 8 tweets
4 Oct
My recommendations for your first 30 days of Python 🐍.

🧵👇
I get many messages, and the most frequent question by far is, "How do I start with Python?"

There are multiple ways. Every one as valid as the one before.

Here is my way. These are my recommendations.

👇
1⃣Before anything else, remember that you need to make a commitment and be consistent.

Dedicate time to learn every day. It doesn't matter how much. Find a time that works for you.

The hashtag #100DaysOfCode is a great way to share your progress and stay accountable.

👇
Read 11 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Too expensive? Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal Become our Patreon

Thank you for your support!

Follow Us on Twitter!