Big tech teams win because they have the best MLOps. These teams:
- Deploy models at 10x speed
- Spend more time on data science, less on engineering
- Reuse rather than rebuild features

How do they do it? An architecture called a Feature Store. Here's how it works.
🧵 1/n
In almost every ML/data science project, your team will spend 90-95% of its time building data cleaning scripts and pipelines.

Data scientists rarely get to put their skills to work because they spend most of their time outside of modeling.
Enter: The Feature Store

This specialized architecture has:
- Registry to look up/reuse previously built features
- Feature lineages
- Batch+stream transformation pipelines
- Offline store for historical lookups (training)
- Online store for low-latency lookups (live inference)
You can think of the feature store as a "Feature API" made just for data scientists.

Anyone with access can view, pull, and contribute features for their own models. Over time, the feature store eliminates countless hours of redundant feature engineering work.
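To make the "Feature API" idea concrete, here's a minimal in-memory sketch (all names are made up for illustration; real feature stores persist features and track lineage, this toy just shows the register-once/reuse-everywhere contract):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToyFeatureStore:
    """A toy 'Feature API': teams register a feature once, anyone reuses it."""
    registry: dict = field(default_factory=dict)

    def register(self, name: str, transform: Callable) -> None:
        # Contribute a feature definition to the shared registry
        self.registry[name] = transform

    def get(self, name: str, row: dict):
        # Pull a previously built feature for your own model
        return self.registry[name](row)

store = ToyFeatureStore()
store.register("age_bucket", lambda row: row["age"] // 10 * 10)

# A second team reuses the feature instead of rebuilding it
assert store.get("age_bucket", {"age": 37}) == 30
```

The point isn't the implementation; it's the contract: features are named, shared, and computed the same way for every model that uses them.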
MLOps done right can supercharge your company

For example, the @NetflixEng team uses an ML architecture with FS-like capabilities called Metaflow. With Metaflow, data scientists can push models to production in 1 week or less on average

They have over 600 models deployed today.
The feature store has 4 basic functionalities:

1. Feature Transform

This is the main tool for writing and saving features to your feature store. Typically, this takes the form of a job or service orchestration tool such as @ApacheAirflow

Basically: Read, Transform, Write.
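The Read → Transform → Write pattern can be sketched in a few lines. This is not Airflow code, just the shape of the job a scheduler like Airflow would run on a cadence (column and store names are hypothetical):

```python
def read(source: list) -> list:
    # Read: pull raw rows from a source table/stream
    return [dict(r) for r in source]

def transform(rows: list) -> list:
    # Transform: derive features from raw columns
    for r in rows:
        r["amount_digits"] = len(str(int(r["amount_usd"])))
    return rows

def write(store: dict, rows: list) -> None:
    # Write: persist features keyed by entity id
    for r in rows:
        store[r["user_id"]] = r

offline_store = {}
raw = [{"user_id": 1, "amount_usd": 250}]
write(offline_store, transform(read(raw)))
```

In a real pipeline each step would be a task in a DAG, so failures can be retried per-step instead of rerunning the whole job.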
2. Feature Discovery

This is the hardest part to get right. If you want data scientists to reuse features, you need an intuitive UI that makes features easy to find.

@databricks's feature registry has some basic components. But there's ample room for improvement.
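At its core, feature discovery is search over registry metadata. A toy sketch (entries, field names, and the matching logic are all invented for illustration; real registries also index lineage, owners, and freshness):

```python
# Hypothetical registry entries with the metadata discovery relies on
registry = [
    {"name": "user_age_bucket", "owner": "growth", "tags": ["user", "demographic"]},
    {"name": "txn_7d_count",    "owner": "fraud",  "tags": ["transaction", "velocity"]},
]

def search(registry: list, query: str) -> list:
    """Naive substring search over names and tags -- the seed of a discovery UI."""
    q = query.lower()
    return [f["name"] for f in registry
            if q in f["name"] or any(q in t for t in f["tags"])]

assert search(registry, "transaction") == ["txn_7d_count"]
```

Getting this right is mostly a metadata problem: features are only reusable if people can find them and trust what they find.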
3. Feature Serving

The Offline Store is a historical data store used for feature discovery and model training

The idea behind the Online Store is that rather than running feature transforms during inference (slow), you can pre-compute them and cache them in the online store for quick lookups.
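A minimal sketch of that precompute-then-cache idea (the transform and feature names are made up; a real online store would be a low-latency key-value service like Redis or DynamoDB, not a Python dict):

```python
import time

def expensive_transform(user_id: int) -> dict:
    # Stand-in for a slow join/aggregation over raw event data
    time.sleep(0.01)
    return {"txn_7d_count": user_id * 3}

# Batch job: precompute features offline, then cache them in the online store
online_store = {uid: expensive_transform(uid) for uid in [1, 2, 3]}

def get_online_features(user_id: int) -> dict:
    # Inference path: O(1) lookup, no recomputation on the request path
    return online_store[user_id]

assert get_online_features(2) == {"txn_7d_count": 6}
```

The trade-off: online features are slightly stale (as of the last batch/stream update) in exchange for millisecond-scale serving latency.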
Want to use a feature store yourself? You're in luck! There are a few open source options out there

1. @feast_dev is a fantastic open source feature store that plays nicely with both GCP and AWS

2. @logicalclocks offers @hopsworks, an end-to-end ML pipeline with a built-in feature store. Also open source
Here's what some (closed source) big players use:

@UberEng's Michelangelo has an end-to-end feature engineering -> model training -> model deployment pipeline. Largely built around Spark's MLlib

@WixEng also has a nifty architecture that stores feature data with protobufs.
Want to buy instead of building your own? Here are some cool startups bringing feature stores to the market

1. @TectonAI - Staffed by some of the original Feast developers
2. @stream_sql - Founded by the minds behind Michelangelo
3. @databricks - Feature store just left beta
And that's it :)

If you enjoyed it, I post threads like this regularly, on topics ranging from AI for UI, fintech, and crypto (skepticism) to data science
