I was an eng leader on Facebook’s NewsFeed and my team was responsible for the feed ranking platform.

Every few days an engineer would get paged that a metric, e.g., "likes" or "comments," was down.

It usually translated to a Machine Learning model performance issue. /thread
2/ The typical workflow for the engineer diagnosing the alert was to first check our internal monitoring system, Unidash, to confirm the alert was real, and then dive into Scuba to dig deeper.
3/ Scuba is a real-time analytics system that stored all the prediction logs and made them available for slicing and dicing. It only supported filter and group-by queries, and it was very fast.

research.fb.com/wp-content/upl…
4/ Engineers would load up the Scuba dashboard for a given time window and start slicing data on a variety of attributes.

For example - Are likes down for all types of news feed stories? Are they down only within a particular country or region?
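
What that slicing looks like in code is, roughly, a filter plus a group-by. Here is a toy pandas sketch over a made-up prediction log (the file name, columns, and story types are illustrative assumptions, not Scuba's actual schema or query interface):

    import pandas as pd

    # Hypothetical prediction/engagement log: one row per story impression.
    logs = pd.read_parquet("feed_engagement_sample.parquet")

    recent = logs[logs["ts"] >= logs["ts"].max() - pd.Timedelta(days=1)]

    # Filter + group-by, the two operations Scuba supported:
    like_rate_by_country = (
        recent[recent["story_type"] == "photo"]   # filter: one story type
        .groupby("country")["liked"]              # group by: country
        .mean()                                   # like rate per slice
        .sort_values()
    )
    print(like_rate_by_country.head(10))          # slices with the lowest like rate
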
5/ If I were on-call and got an alert that likes had dropped by a stat-sig amount, the first thing I would do is go into Scuba.

I would zoom into likes over the last day, compare it with the same period last week, and add filters like country, etc., to find out which slice had the biggest deviation.
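
That day-over-week comparison boils down to computing a relative deviation per slice and ranking slices by it. Continuing the same toy pandas setup as above (still hypothetical data, not Facebook code):

    import pandas as pd

    logs = pd.read_parquet("feed_engagement_sample.parquet")
    now = logs["ts"].max()

    def like_rate_by_country(frame):
        # Like rate per slice (here the slice is country).
        return frame.groupby("country")["liked"].mean()

    last_day = like_rate_by_country(logs[logs["ts"] >= now - pd.Timedelta(days=1)])
    week_ago = like_rate_by_country(logs[(logs["ts"] >= now - pd.Timedelta(days=8)) &
                                         (logs["ts"] < now - pd.Timedelta(days=7))])

    # Relative deviation per slice; the biggest drops are the prime suspects.
    deviation = ((last_day - week_ago) / week_ago).sort_values()
    print(deviation.head(5))
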
6/ Most Machine Learning model performance issues were caused by data pipeline problems.

For example, a developer would introduce a logging bug that sent bad feature data to the model, or a piece of the data pipeline would break because of a system error.
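
A cheap guard against this class of bug is a sanity check on the logged feature data before it reaches the model. A minimal sketch; the checks and thresholds here are assumptions for illustration, not anything Facebook actually ran:

    import pandas as pd

    def feature_problems(batch: pd.DataFrame, max_null_rate: float = 0.05) -> list:
        """Return human-readable problems found in a batch of logged features."""
        problems = []
        for col in batch.columns:
            null_rate = batch[col].isna().mean()
            if null_rate > max_null_rate:
                problems.append(f"{col}: {null_rate:.1%} nulls (limit {max_null_rate:.0%})")
            if pd.api.types.is_numeric_dtype(batch[col]) and batch[col].nunique(dropna=True) <= 1:
                problems.append(f"{col}: constant value, likely broken upstream")
        return problems

    # e.g. page the on-call if feature_problems(latest_batch) is non-empty
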
7/ Another set of issues was due to ML models that had not been updated for a while even as user behavior changed.

This usually resulted in the on-call opening a ticket for the model owner to retrain the model.
8/ Facebook continuously retrained some models, and these models had reproducibility challenges since they were updated every few hours.
9/ Another big use case of Scuba was champion-challenger testing.

Engineers would run lots of A/B tests and use Scuba metrics to figure out which model was performing best before bringing that model's dashboard to a launch review.
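
The readout for such a comparison is conceptually simple: per-variant metrics plus a significance check. A toy sketch (in reality Facebook's experimentation tooling produced these numbers, not hand-rolled stats; the data file and variant labels are made up):

    import pandas as pd
    from scipy import stats

    ab = pd.read_parquet("ab_test_engagement.parquet")   # hypothetical A/B log
    champion   = ab[ab["variant"] == "champion"]["liked"]
    challenger = ab[ab["variant"] == "challenger"]["liked"]

    print(f"champion like rate:   {champion.mean():.4f}")
    print(f"challenger like rate: {challenger.mean():.4f}")

    # Welch's t-test on the 0/1 like outcomes as a simple significance check.
    t_stat, p_value = stats.ttest_ind(challenger, champion, equal_var=False)
    print(f"p-value: {p_value:.3g}")   # small p + higher rate => challenger looks promising
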
10/ Finally, all of this was enabled by a fantastic set of explainability tools that helped us debug models both during experimentation and in production.

Some of these tools were integrated into internal and external versions of the Facebook app.
about.fb.com/news/2019/03/w…
11/ Monitoring, analysis, and explainability of models are must-haves for teams that want to operationalize ML at scale and in a trustworthy manner.

This is why we created @fiddlerlabs. #MLOps #Monitoring #ExplainableAI
