I see so many startups whose business plan involves "monetizing the data". Sure, they have a product, and some revenue, but the *real* payoff is the data they're collecting. Or so they say.
PSA: it's not that easy. /1
Selling the data directly is almost always a non-starter. Repeatable, scalable, high-value data sales require a set of conditions that are exceedingly rare. /2
First, the data has to be reasonably comprehensive: covering enough of the domain of interest to be statistically significant and economically actionable. This is where most startup datasets fall down: they're simply not big enough. /3
Typically, you need at least 1,000,000s of events (transactions, searches, visits), or 100,000s of people (profiles, actions), or 10,000s of physical objects (products, locations, catalogs), or 1,000s of assets (stocks, houses, contracts) for the data to be useful. /4
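A quick way to sanity-check a dataset against these rules of thumb is sketched below. The thresholds are the orders of magnitude quoted above, read as rough lower bounds; the names `MIN_SCALE` and `meets_scale` are made up for illustration, and real buyers will apply their own cutoffs.

```python
# Rough lower bounds taken from the rule of thumb above.
# These are orders of magnitude, not hard cutoffs.
MIN_SCALE = {
    "events": 1_000_000,   # transactions, searches, visits
    "people": 100_000,     # profiles, actions
    "objects": 10_000,     # products, locations, catalogs
    "assets": 1_000,       # stocks, houses, contracts
}


def meets_scale(kind: str, count: int) -> bool:
    """Return True if `count` reaches the rough threshold for `kind`."""
    return count >= MIN_SCALE[kind]


if __name__ == "__main__":
    print(meets_scale("events", 250_000))  # False: well short of millions of events
    print(meets_scale("assets", 3_000))    # True: thousands of assets
```

Passing this check says nothing about uniqueness or economics; it only covers the first condition, coverage.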
Second, the data has to add unique value. This is different from being unique in itself. It's easy to create a unique dataset; the question is, does your dataset add marginal value (information, insight) that is different from what's already out there. /5
Note that most ecosystems have legacy data providers that, while not fantastic, are "good enough" for most applications, plus they have far more coverage than new entrants. So a new entrant's data needs to generate *substantially* better insight to displace those providers. /6
And that's hard. You can be broad and shallow, or narrow and deep, but it's hard to be both comprehensive and valuable. (This partly explains why legacy data businesses are so sticky.) /7
Third, the data must have an application that's lucrative enough to generate sustainable economics. Trading is one such application, which is why hedge funds are often the buyers of first resort for many would-be data sellers. But that's the exception, not the rule. /8
Most applications have supply and demand curves that simply don't intersect. The classic example is consumer data: what a business will pay for the median consumer profile is an order of magnitude lower than what the median consumer would like to be paid. /9
Depending on the application, data buyers may also need fine granularity, or lengthy history, or rapid updates, and the dataset may fail if it doesn't satisfy those needs. /10
And even if you tick all these boxes, there are challenges. Defensibility is one: if it's easy, then anyone can do it, and your marginal economics goes to zero. So the best data businesses are built on data assets that are in fact very hard to acquire. /11
Distribution is a second. There's a well-known body of work on how to sell software. Very few people know how to sell data effectively. For that matter, very few people know how to buy data either. Often, you have to educate the market, which is slow and expensive. /12
Resellability is a third. Ideally your dataset becomes table stakes over time, so that every player in the ecosystem has to buy it. But to get there, it must first offer an advantage to early adopters, which implies exclusivity, and a very different set of economics. /13
Then there's data quality and delivery ops. Deploying data at scale can be as complex as deploying software, and is far less mature as a field. Most startups have no clue how to do it. /14
(I haven't even mentioned compliance, privacy, provenance, and data rights yet.) /15
For all these reasons, it's really, really hard for a startup to build and sell data products as its core business. /16
But not impossible. I've seen startups sidestep these problems by identifying a new domain to collect data about, or finding a new (step-change) technology with which to collect data, or unlocking a novel data loop. These are rare! /17
How rare? At Quandl, we evaluated 1000s of would-be data sellers over the years; only a few 10s of them succeeded in making even a single sale. /18
Do you think you can beat those 1-in-100 odds? You'd better have convincing answers to all the questions raised above: coverage, uniqueness, economics, defensibility, resellability, distribution, deployment, compliance and more. /19
So if you can't monetize the data via third-party sales, how about internal monetization? Can the data you collect boost the economics of your own business? /20
Well, yes, and in fact the most successful businesses of the 21st century are built on precisely this kind of data learning loop. As a business grows, it collects data, which enables the business to perform better, and hence grow more, leading to more data. /21
But don't put the cart before the horse here. You can't just collect data and expect a business to follow; you need both sides of the equation. /22
It's one thing to say "as we scale, we'll collect data that will improve our product, giving us a compounding advantage". It's another thing entirely to say "the data we collect will enable us to create a product that nobody else has". The latter is much harder (and rarer). /23
And even if you do have a data learning loop in place, it really only kicks in with scale. All the caveats about coverage, uniqueness, economics and defensibility still apply; it's just that you're your own customer for the data. /24
(If you don't have broad coverage, or unique marginal insight, or feasible economics, or defensibility against copycats, well, your learning loop simply won't be effective.) /25
To sum up: monetizing data is hard! If you're a startup and it's key to your strategy, I'd urge you to think carefully and rigorously about what that entails. /26 & fin
As someone who was trading professionally (and successfully) in both 1999-00 and 2007-08, I have to say it *finally* feels like we're in the late stages of a bull market. I'm not talking about valuations or fundamentals; I'm talking about the zeitgeist. /1
The defining feature of late stage bulls is not price action; it's craziness. Think GameStop, and negative oil, and TikTok investors, and Davey. /2
This craziness is often driven by retail. Retail investors have more buying power, higher risk appetite, and fewer inhibitions than professionals. When retail enters the market, other investors get run over. /3
1/ It's been 6 months since the low point of US markets and economic activity. Ordinarily, we'd see the first academic papers on the COVID recession emerge right around now. But thanks to new sources of data, researchers are way ahead of schedule.
🧵THREAD👇
2/ Let's begin with spending. Chen et al. use daily transaction data -- bank cards and QR code usage from UnionPay -- to track the decline of consumer spending across 214 cities in China, one of the earliest indicators of pandemic-induced changes: papers.ssrn.com/sol3/papers.cf…
3/ As early as March, the BEA was using credit card transactions processed by Fiserv to track COVID's impact on consumer spending in the US, per Dunn et al.: bea.gov/system/files/p…
Kevin has a great 2x2 where he points out that most well-known VCs are in the "successful + brand-network-effect" quadrant, for obvious reasons -- they need the inbound deal flow. And Sutter Hill is interesting because it's in the "successful + low-profile" quadrant.
This is actually a quadrant I'm quite familiar with -- most hedge funds fall here! As a junior trader I was told: play dumb, stay quiet, keep a low profile, protect your edge, never reveal your positions or plans.
1/ Pricing curves for data are dramatically different from pricing curves for software, hardware, services, or consumer products. A thread of some things I've learned:
2/ Price per data point first increases, then decreases with quantity. Small datasets are usually worth less per point than big ones. But beyond a certain point, adding more data points doesn't add marginal information, so the total price plateaus and the price per point falls.
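One way to picture the quantity curve is a toy model, sketched here. It assumes total dataset value follows an S-curve (a Hill function) that saturates at `v_max`, with `k` as the size where value is half of `v_max`. The functional form and both parameter values are illustrative assumptions, not anything measured from real data sales.

```python
def total_value(n: float, v_max: float = 100_000.0, k: float = 1_000_000.0) -> float:
    """Toy S-curve: total value of a dataset with n points, saturating at v_max.

    v_max and k are made-up illustrative parameters.
    """
    return v_max * n**2 / (n**2 + k**2)


def price_per_point(n: float, v_max: float = 100_000.0, k: float = 1_000_000.0) -> float:
    """Total value divided by size; under this model it peaks at n == k."""
    return total_value(n, v_max, k) / n


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000, 10_000_000, 100_000_000):
        print(f"n={n:>11,}  total={total_value(n):>10.2f}  per_point={price_per_point(n):.6f}")
```

Under these assumptions the per-point price rises until the dataset reaches `k` points and falls afterwards, while the total price flattens out near `v_max`. Any saturating S-curve would give the same qualitative shape.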
3/ Price first decreases, then increases with adoption. Unique datasets are worth more than commoditized ones, but once a dataset becomes "table stakes", price goes up again, especially if there's a single dominant supplier.
1/ When I was 13 years old, I spent 3 days in the hold of a converted cargo ship, escaping a war zone with nothing more than what I could carry in a small backpack.
🧵THREAD 👇
2/ Exactly 30 years ago, on August 2nd 1990, Saddam Hussein's army invaded Kuwait. I remember it clearly; I was there.
3/ My family was part of the massive Indian expat community. My father worked for the Kuwaiti ministry of health; my mother was a teacher. We had lived in Kuwait for 6 years.
You don't have to invoke magic or time travel to explain Renaissance Technologies and their amazing track record. First-mover advantage suffices. (1/N)
Consider this sequence of events:
- Rentec discovers a persistent source of mispricing in the market.
- Rentec trades *fast* to capture this mispricing.
- Rentec's activity forces prices to converge. (2)
To external observers, the mispricing never arises. Nobody even sees the opportunity; they all think "oh, that (slice of the) market is already efficient". Back-testing on this opportunity reveals no alpha to be captured. Everybody moves on. (3)