Data Science Applications @Shopify

In this thread I'll highlight some important pieces from a variety of @ShopifyData & @ShopifyEng blog posts that discuss applications they've built and that I think would benefit Data Scientists.

1. How Shopify Capital Uses Quantile Regression To Help Merchants Succeed
2. How to Build an Experiment Pipeline from Scratch
3. How to Use Quasi-experiments and Counterfactuals to Build Great Products
4. Categorizing Products at Scale
Other Threads On Shopify DS Applications:
- How Shopify Uses Recommender Systems to Empower Entrepreneurs:
- Measuring Churn and CLV:
- The Evolution of Kit: Automating Marketing Using Machine Learning:
1 Quantile | How Shopify Capital Uses Quantile Regression To Help Merchants Succeed
by Kyle Tate

"Shopify Capital provides funding to help merchants on Shopify grow their businesses. But how does Shopify Capital award these merchant cash advances?"…
1.1 Quantile | Motivation
When giving out a loan, the most important thing to consider is the probability that it will be paid back.

To determine that, you want to know the merchant's future sales.

The problem with standard regression here is that it predicts only the expected value, so it won't account for uncertainty.
1.2 Quantile | Motivation
Expected sales of $10k ± $1k are very different from
expected sales of $10k ± $10k.

This is where quantile regression can help, by predicting the exact quantile of interest.

For ex: predict the sales level that the merchant has a 90% chance of exceeding (the 10th percentile).
1.3 Quantile | Implementation

A lot of boosting models have this option built in, for ex sklearn:…
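A minimal sketch of that built-in option, assuming scikit-learn's GradientBoostingRegressor with loss="quantile". The synthetic sales data, the single feature, and the alpha=0.1 target are illustrative assumptions, not Shopify Capital's actual setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (an assumption for illustration): one feature, noisy "future sales".
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(2000, 1))
y = 1000 * X[:, 0] + rng.normal(0, 500, size=2000)

# loss="quantile" with alpha=0.1 targets the 10th percentile of y given X.
q10_model = GradientBoostingRegressor(loss="quantile", alpha=0.1, random_state=0)
q10_model.fit(X, y)

# Share of observed sales that land above the predicted 10th percentile.
share_above = float(np.mean(y > q10_model.predict(X)))
print(f"share of observations above the q10 prediction: {share_above:.2f}")
```

If the model is doing its job, roughly 90% of the training observations should sit above the prediction.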

In a neural net it's just a simple change to the loss function.
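That loss is usually called the quantile (or pinball) loss. A framework-free sketch of it, with a toy check on made-up data that the best constant prediction for q=0.1 sits at the 10th percentile; in a neural net you would swap something like this in where mean squared error would normally go.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss for a target quantile q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction costs q per unit, over-prediction costs (1 - q) per unit.
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)

# Toy data (assumed): sales of 1, 2, ..., 100.
sales = [float(v) for v in range(1, 101)]
candidates = [float(c) for c in range(1, 101)]

# Among constant predictions, the q=0.1 loss is smallest around the 10th percentile.
best = min(candidates, key=lambda c: pinball_loss(sales, [c] * len(sales), q=0.1))
```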

More details here:
1.4 Quantile | Understanding

Notice that a standard prediction interval would be derived as a static band above and below your predictions.

Here the band size can change appropriately based on the input features, for ex if seasonality causes changes to the variance of errors.
1.5 Quantile | Extension

In addition to using this when you want to represent a specific quantile of interest (like above), it can also be used to predict the prediction bands around your forecasts for a variety of applications.
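To make the idea of feature-dependent bands concrete, here is a sketch using grouped empirical quantiles, the simplest possible stand-in for a quantile regression model (one lower and one upper quantile per month). The months, noise levels, and 5%/95% band are assumptions for illustration.

```python
import random
import statistics

rng = random.Random(0)

# Synthetic daily sales (an assumption): same mean, but December is much noisier than June.
noise_sd = {"jun": 100.0, "dec": 400.0}
sales = {m: [1000.0 + rng.gauss(0, sd) for _ in range(500)] for m, sd in noise_sd.items()}

def band(values, lower=0.05, upper=0.95):
    """Empirical (lower, upper) quantile band from percentile cut points."""
    cuts = statistics.quantiles(values, n=100)  # cuts[k - 1] is the k-th percentile
    return cuts[round(lower * 100) - 1], cuts[round(upper * 100) - 1]

bands = {m: band(v) for m, v in sales.items()}
widths = {m: hi - lo for m, (lo, hi) in bands.items()}
print(widths)
```

A single static interval would give both months the same width; predicting the two quantiles separately lets the December band come out several times wider than the June one.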
2 XP | How to Build an Experiment Pipeline from Scratch

"One of the most compelling ways to prove the value of any decision or intervention ... is to run an A/B test. But what if that wasn’t an option on your current stack?"…
by Mojan Hamed

This post is a great walkthrough of the steps to build a robust piece of software from scratch that satisfies all the business requirements.
2.1 XP | Intro

There was no experimentation platform for email experiments.

So the DS managed them manually, which had problems:
1. Local storage
2. Didn't account for user unsubscription or many-to-many relations between emails and shops
3. Experiments could leak into each other
2.2 XP | Problem
"We define the problem as: given a list of visitors, we want to randomize so that each person is limited to one experiment at a time, and the experiment subjects can be fairly split among data scientists who want to test on a portion of the visitor pool."
2.3 XP | Diagram
2.4 XP | Plan Ideal Output

She built a table w 1 row per email/shop/experiment, plus additional info around timing and theme.

Once the v0 ideal output was made, she considered example use cases and queried the fake table to identify gaps.
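A rough sketch of what "query the fake table" could look like, using an in-memory SQLite database. The table name, column names, and rows are all hypothetical; the post does not give the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical v0 output: one row per email/shop/experiment plus timing info.
conn.execute("""
    CREATE TABLE experiment_assignments (
        email TEXT, shop_id INTEGER, experiment TEXT,
        variant TEXT, assigned_at TEXT
    )
""")
conn.executemany(
    "INSERT INTO experiment_assignments VALUES (?, ?, ?, ?, ?)",
    [
        ("a@example.com", 1, "subject_line_test", "control", "2021-01-04"),
        ("b@example.com", 2, "subject_line_test", "treatment", "2021-01-04"),
        ("c@example.com", 3, "send_time_test", "control", "2021-01-11"),
    ],
)

# Example use case: how many subjects does each experiment have per variant?
rows = conn.execute("""
    SELECT experiment, variant, COUNT(*) FROM experiment_assignments
    GROUP BY experiment, variant ORDER BY experiment, variant
""").fetchall()

# Gap check: does any email appear in more than one experiment?
leaks = conn.execute("""
    SELECT email FROM experiment_assignments
    GROUP BY email HAVING COUNT(DISTINCT experiment) > 1
""").fetchall()
```

Writing the queries first is a cheap way to find out which columns the table is missing.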
2.5 XP | Build Requirements

She sat down w stakeholders in a guided exercise where she got them to query her fake table to ensure the structure can support their needs.
Requirements that resulted:
1. Exclude subjects from other experiments
2. Include exp tags
3. Exclude linked shops: shops linked to a given email get excluded
4. On-going randomization: assign new users as they qualify over time
5. Backfill past exp
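A sketch of how requirements 1, 3 and 4 might fit together, assuming assignments are keyed by email. The function name, the data shapes, and the round-robin split are my assumptions, not the post's implementation.

```python
import random

def assign_new_subjects(candidates, existing, experiments, seed=0):
    """Randomly assign emails that are not yet in any experiment.

    candidates: list of (email, shop_id) pairs that currently qualify
    existing:   dict of email -> experiment from earlier runs (left untouched)
    """
    # Deduplicate by email so an email linked to several shops is only assigned once,
    # and skip anyone who already belongs to an experiment.
    fresh = sorted({email for email, _ in candidates if email not in existing})
    random.Random(seed).shuffle(fresh)
    updated = dict(existing)
    for i, email in enumerate(fresh):
        updated[email] = experiments[i % len(experiments)]
    return updated

experiments = ["exp_a", "exp_b"]
run1 = assign_new_subjects([("a@x.com", 1), ("a@x.com", 2), ("b@x.com", 3)], {}, experiments)
# Later run: c and d newly qualify; a and b keep their original assignment.
run2 = assign_new_subjects([("a@x.com", 1), ("c@x.com", 4), ("d@x.com", 5)], run1, experiments)
```

Rerunning the function as new users qualify only ever adds rows, so earlier assignments stay stable.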
2.6 XP | Tech Planning

ph1: "create an experiment definition file that defines the criteria for candidates in the form of a SQL query"

ph2: "many-to-one transform stage that consolidates all incoming experiments into a single output"

ph3: filter down to satisfy requirements
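The three phases could be sketched roughly like this. The ExperimentDefinition class, the subscribers table, and the queries are hypothetical, and the first-come filter in phase 3 is a simplification (it does not split subjects fairly across experiments).

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class ExperimentDefinition:
    name: str
    candidate_sql: str  # phase 1: candidates are defined by a SQL query

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscribers (email TEXT, shop_id INTEGER, unsubscribed INTEGER)")
conn.executemany("INSERT INTO subscribers VALUES (?, ?, ?)", [
    ("a@example.com", 1, 0),
    ("a@example.com", 2, 0),   # same email linked to a second shop
    ("b@example.com", 3, 0),
    ("c@example.com", 4, 1),   # unsubscribed
])

definitions = [
    ExperimentDefinition("exp_a", "SELECT email, shop_id FROM subscribers WHERE unsubscribed = 0"),
    ExperimentDefinition("exp_b", "SELECT email, shop_id FROM subscribers WHERE shop_id >= 3"),
]

# Phase 2: many-to-one transform, every experiment's candidates go into one list.
candidates = [
    (email, shop_id, d.name)
    for d in definitions
    for email, shop_id in conn.execute(d.candidate_sql + " ORDER BY shop_id")
]

# Phase 3: filter so each email is used once and unsubscribed emails are dropped.
unsubscribed = {e for (e,) in conn.execute("SELECT email FROM subscribers WHERE unsubscribed = 1")}
seen, output = set(), []
for email, shop_id, name in candidates:
    if email in seen or email in unsubscribed:
        continue
    seen.add(email)
    output.append((email, shop_id, name))
```

Note that exp_b ends up with no subjects here, which is exactly the kind of gap a randomized split in phase 3 would need to address.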
2.7 XP | Key Takeaways
3 Cause | How to Use Quasi-experiments and Counterfactuals to Build Great Products

"At Shopify, we believe that understanding causality is the key to unlocking maximum business value."…
3.1 Cause | Intro
"We aim to identify insights that actually indicate why we see things in the data, since causal insights can validate (or invalidate) entire business strategies. Below I’ll discuss different causal inference methods and how to use them to build great products."
3.2 Cause | Levels of Evidence
3.3 Cause | A/B Tests
The gold standard for causal inference.

The environment each group is placed in needs to be identical besides one parameter: the treatment.

Gotchas: subjects self-selecting into the test, which breaks the random assignment
3.4 Cause | A/B Tests From Scratch
You'll need:
1. A way to randomly assign units to the right group
2. A tracking mechanism to collect data for all relevant metrics
3. An analysis of those metrics and associated stats to compute effect sizes and validate causal effects
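The three pieces above can be sketched end to end on simulated data. The conversion rates, sample size, and the choice of a two-proportion z-test are assumptions for illustration; the post does not prescribe a specific test.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(42)

# 1. Random assignment: each unit gets control or treatment with equal probability.
units = [f"shop_{i}" for i in range(10_000)]
group = {u: rng.choice(["control", "treatment"]) for u in units}

# 2. Tracking: simulated conversions (assumed true rates: 10% control, 12% treatment).
true_rate = {"control": 0.10, "treatment": 0.12}
converted = {u: rng.random() < true_rate[group[u]] for u in units}

# 3. Analysis: effect size and a two-sided two-proportion z-test.
def summarize(name):
    members = [u for u in units if group[u] == name]
    return sum(converted[u] for u in members), len(members)

(x_c, n_c), (x_t, n_t) = summarize("control"), summarize("treatment")
p_c, p_t = x_c / n_c, x_t / n_t
pooled = (x_c + x_t) / (n_c + n_t)
se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
z = (p_t - p_c) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
lift = p_t - p_c
print(f"lift={lift:.4f}, z={z:.2f}, p={p_value:.4f}")
```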
I've run out of room :)
I will be sending out the second half of this thread in the coming days.
3.4 Cause | Can't Use A/B
Sometimes it's not possible to do an A/B test.

For ex:
1. Lack of tooling
2. Lack of time
3. Ethical concerns (not fair to leave some merchants out)
4. Not possible (ex: want to compare to a historical launch)
3.5 Cause | Quasi-Experiments
(2nd best)

"treatment and control group are divided by a natural process that isn’t truly random, but are considered close enough to compute estimates"

Very common in product companies.
3.5 Cause | Quasi-Experiment Ex 1
"feature rollout happens at different dates in different countries"

You can use a quasi-experiment to estimate the lift from the feature by comparing sales in countries that have the feature to sales in countries w out it.
3.5 Cause | Quasi-Experiment Ex 2
"new feature is dependent on the behaviour of other features (like in the case of a deprecation)"

Here your treatment group is going to get this feature (bc of their behaviour) and the control group is not.

So you need to account for that behaviour when comparing the two groups.
3.6 Cause | 2 Quasi-XP Methods
(Both used at Shopify)

1. linear regression w fixed effects
- Assumes: we have the data on all factors that divide the individuals between treatment and control
- Then: lin reg on metric of interest controlling for these factors
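A toy sketch of method 1, assuming country is the only factor that drives both the baseline and who gets treated. The numbers are made up, and the within-group demeaning used here is one standard way to fit a regression with fixed effects (equivalent to adding a dummy per country).

```python
from collections import defaultdict

# Toy data (assumed): country sets the baseline and also how likely a shop is treated.
baseline = {"CA": 100.0, "US": 200.0}
rows = [("CA", 1), ("CA", 0), ("CA", 0), ("CA", 0),
        ("US", 1), ("US", 1), ("US", 1), ("US", 0)]
true_effect = 5.0
data = [(g, t, baseline[g] + true_effect * t) for g, t in rows]

def mean(xs):
    return sum(xs) / len(xs)

# Naive comparison ignores country and mixes the baseline gap into the "effect".
naive = mean([y for _, t, y in data if t == 1]) - mean([y for _, t, y in data if t == 0])

# Fixed effects via the within transformation: demean t and y inside each country,
# then regress demeaned y on demeaned t.
by_group = defaultdict(list)
for g, t, y in data:
    by_group[g].append((t, y))

num = den = 0.0
for members in by_group.values():
    t_bar = mean([t for t, _ in members])
    y_bar = mean([y for _, y in members])
    num += sum((t - t_bar) * (y - y_bar) for t, y in members)
    den += sum((t - t_bar) ** 2 for t, _ in members)

fixed_effects_estimate = num / den
print(naive, fixed_effects_estimate)
```

The naive difference is inflated by the country baselines, while the fixed-effects estimate recovers the effect that was built into the toy data.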
3.6 Cause | 2 Quasi-XP Methods

2. diff in diffs (very popular)
- Assumes: have a control group that shows a trend parallel to your treatment on the metric of interest, prior to treatment
- Then: after treatment, assume any change in the gap between the two groups (the diff in these diffs) is from the treatment
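The arithmetic behind diff in diffs is short enough to write out. All numbers below are invented for illustration, and the pre-period check is only a crude stand-in for a proper parallel-trends test.

```python
def diff_in_diffs(treat_pre, treat_post, control_pre, control_post):
    """(change in treatment group) minus (change in control group)."""
    return (treat_post - treat_pre) - (control_post - control_pre)

# Crude parallel-trends check on the pre-period: the gap between groups should be stable.
treat_pre_series = [96.0, 98.0, 100.0]
control_pre_series = [76.0, 78.0, 80.0]
pre_gaps = [t - c for t, c in zip(treat_pre_series, control_pre_series)]
parallel = max(pre_gaps) - min(pre_gaps) < 1.0

# Toy average weekly sales (assumed): both groups trend up by 10,
# and the treated country gains an extra 15 after the feature launch.
estimate = diff_in_diffs(treat_pre=100.0, treat_post=125.0,
                         control_pre=80.0, control_post=90.0)
```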
3.6 Cause | Counterfactuals
(3rd best)

"want to try to detect causal factors from data that only consists of observations of the treatment.
A classic example in tech is estimating the effect of a new feature that was released to all the user base at once"
3.6 Cause | Counterfactuals Approach

"create a model that allows you to compute a counterfactual control group.
In other words, you estimate what would happen had this feature not existed."

The difficult part is developing a robust model of your users.
3.6 Cause | Counterfactuals Ex

A security update introduced friction for users.
They wanted to see if this decreased usage.

They built a time-series model to estimate usage of the updated feature, using info from features not affected by the update and global trends in activity.
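A heavily simplified sketch of the counterfactual idea: fit the relationship between the affected feature and an unaffected one on pre-update data, then project it forward as the "no update" world. A plain least-squares line stands in for the time-series model here, and every number is made up.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b

# Toy weekly usage (assumed): before the update the affected feature tracks 2x an unaffected one.
unaffected_pre = [50.0, 52.0, 55.0, 53.0, 58.0]
affected_pre = [2 * u for u in unaffected_pre]
unaffected_post = [60.0, 61.0, 63.0]
affected_post = [110.0, 112.0, 115.0]  # observed after the update

a, b = fit_line(unaffected_pre, affected_pre)
counterfactual = [a + b * u for u in unaffected_post]  # estimated usage had the update not shipped
effect = [obs - cf for obs, cf in zip(affected_post, counterfactual)]
avg_effect = sum(effect) / len(effect)
print(f"average estimated effect on weekly usage: {avg_effect:.2f}")
```

The estimate is only as good as the pre-period model, which is why the robustness of that model matters so much.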
3.7 Cause | Robustness

Quasi-xp and counterfactuals make it much harder to compute sensible confidence intervals, and they increase uncertainty and the risk of FPs.

Robustness checks can help you avoid falling into FP traps.
3.8 Cause | Robustness Checks

This means gradually relaxing each assumption your model/data relies on and seeing if your results still hold.

If your findings change drastically due to a single variable, you should be skeptical, especially if that variable is subject to noise.
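One simple form of robustness check is a leave-one-out re-estimate. The sketch below applies it to the control group of a diff in diffs estimate; the countries and numbers are invented, and a real check would relax whichever assumptions your own model relies on.

```python
def did_estimate(treat_change, control_changes):
    """Treatment-group change minus the average control-group change."""
    return treat_change - sum(control_changes.values()) / len(control_changes)

# Toy pre-to-post changes per control country (assumed); AU looks unusual.
control_changes = {"FR": 10.0, "DE": 11.0, "UK": 9.0, "AU": 30.0}
treat_change = 25.0

full = did_estimate(treat_change, control_changes)

# Re-estimate after dropping each control country in turn.
leave_one_out = {
    dropped: did_estimate(treat_change, {k: v for k, v in control_changes.items() if k != dropped})
    for dropped in control_changes
}
most_influential = max(leave_one_out, key=lambda k: abs(leave_one_out[k] - full))
```

Here the estimate moves from 10 to 15 when a single country is dropped, which is the kind of swing that should make you skeptical.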
3.9 Cause | Robustness DAGs

"Direct Acyclic Graphs (DAGs) are a great tool for checking robustness. They help you clearly spell out assumptions and hypotheses in the context of causal inference."

Great post on causal DAGs by @Cmrn_DP:…
3.10 Cause | Robustness Dagitty

"In a nutshell, when you draw an assumed chain of causal events in Dagitty, it provides you with robustness checks on your data, like certain conditional correlations that should vanish."

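To illustrate what a vanishing conditional correlation looks like, here is a hand-rolled simulation of an assumed chain X -> Z -> Y. This is not Dagitty's API, just the kind of implication such a tool would ask you to verify on your data.

```python
import math
import random

rng = random.Random(0)
n = 5000

# Assumed causal chain: X -> Z -> Y (X affects Y only through Z).
x = [rng.gauss(0, 1) for _ in range(n)]
z = [xi + rng.gauss(0, 1) for xi in x]
y = [zi + rng.gauss(0, 1) for zi in z]

def corr(a, b):
    """Pearson correlation."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def partial_corr(a, b, given):
    """Correlation of a and b after controlling for `given`."""
    r_ab, r_ag, r_bg = corr(a, b), corr(a, given), corr(b, given)
    return (r_ab - r_ag * r_bg) / math.sqrt((1 - r_ag ** 2) * (1 - r_bg ** 2))

marginal = corr(x, y)                # clearly non-zero: X and Y move together
conditional = partial_corr(x, y, z)  # should be near zero if the chain is right
print(f"corr(X, Y)={marginal:.3f}, corr(X, Y | Z)={conditional:.3f}")
```

If the conditional correlation stayed far from zero on real data, the assumed chain would be missing an arrow.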
3.11 Cause | 3 Points About Causal Inference

1. A/B tests should go in every DS toolbox
2. When A/B not possible look for natural experiments to replace them
3. If no natural experiments can be found counterfactuals can be useful, but you shouldn't expect to detect weak signals
3 Cause | Attribution
I forgot to mention above that Antoine Rebecq authored this excellent post on causality.
2.8 XP | References

Also want to say thanks to @MojanBenham for the great post:
How to Build an Experiment Pipeline from Scratch

Great example of how to build a robust pipeline from scratch, and it gives great insights into what should be considered when building one for experiments.

Thread by Brydon Parker (@parker_brydon)