Jamin Ball
19 Jan, 25 tweets, 6 min read
A trend I'm excited for this year: DataOps & the Analytical Engineer

~10 years ago DevOps was born. The role of system admins and developers merged. Infrastructure became self-serve

Today the roles of data engineers and business analysts are merging. Data is becoming self-serve
Data infrastructure has become so powerful that today's tools let non-technical folks do work that used to sit with data engineers as complicated, custom-coded jobs in a huge backlog.

Before getting into what this means, let's first discuss how we got here
Before 2012 the data world was dominated by transactional (OLTP) databases like PostgreSQL, MySQL, etc and analytical (OLAP) databases like Oracle, Netezza

Tools like Informatica / Talend were used to batch load (ETL) data into these databases, and Tableau was used to visualize it
As you can imagine, there was heavy engineering work to manage the environment...

Then in 2013 AWS released their cloud data warehouse Redshift, and it was a game changer. Snowflake was founded in 2012, but didn't really pick up steam until a few years later (around 2016)
So why was Redshift a big deal?

1. It was the first cloud-native OLAP warehouse. It reduced the TCO of an OLAP database by orders of magnitude.

2. Speed of processing analytical queries increased dramatically
3. And later on, cloud warehouses separated compute & storage (Snowflake pioneered this). In overly simplified terms, this meant customers could scale their storage and compute resources independently of one another. This was a huge deal
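To make the compute & storage point concrete, here is a toy sketch of my own. Every number in it is a made-up unit, not real vendor pricing or sizing, and the names (`NODE_STORAGE_TB`, `NODE_COMPUTE_UNITS`, `coupled_nodes`) are hypothetical. It only illustrates why being forced to scale the two together can leave a lot of capacity idle:

```python
import math

# Made-up illustrative units, NOT real vendor sizing or pricing.
NODE_STORAGE_TB = 2      # storage bundled with each node (hypothetical)
NODE_COMPUTE_UNITS = 4   # compute bundled with each node (hypothetical)


def coupled_nodes(storage_tb: float, compute_units: float) -> int:
    """Nodes needed when storage and compute can only be bought together."""
    by_storage = math.ceil(storage_tb / NODE_STORAGE_TB)
    by_compute = math.ceil(compute_units / NODE_COMPUTE_UNITS)
    # Whichever resource needs more nodes decides the cluster size.
    return max(by_storage, by_compute)


# A storage-heavy workload: lots of data, only occasional queries.
storage_tb, compute_units = 100, 8

nodes = coupled_nodes(storage_tb, compute_units)
provisioned_compute = nodes * NODE_COMPUTE_UNITS
idle_compute = provisioned_compute - compute_units

print(f"coupled: {nodes} nodes, {idle_compute} compute units sitting idle")
# With separated storage and compute you would provision exactly
# storage_tb of storage and compute_units of compute, so nothing is idle.
print(f"decoupled: {storage_tb} TB storage + {compute_units} compute units")
```

In this toy setup the storage requirement alone drives the cluster size, which is the situation that independent scaling avoids.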

What does this all mean? An EXPLOSION of data
The barriers to maintaining a database were completely broken down, and the amount of data sent to Redshift / Snowflake / BigQuery skyrocketed.

Now, after this point we still weren't ready for DataOps / Analytical Engineers. What did it take to get there?
In my opinion there were 3 major technologies / shifts that happened that have given rise to DataOps:
1. The shift from ETL to ELT (extract-transform-load to extract-load-transform). Data used to be transformed (joined, aggregated, cleaned, etc) in motion while being loaded into the warehouse. Now, data is being loaded into the warehouse in its raw form...
...Why is this important? In an ETL process if something goes wrong it's very hard to debug if the issue happened in the "T" or the "L". It was also harder to build these pipelines. With ELT, tools like @fivetran allow you to point & click to connect source data to your warehouse
The big trend here? The data warehouse is starting to subsume the data lake, and the default is becoming: "just send all data to Redshift / Snowflake." Again, the barriers to storing / collecting data are going way down
2. The importance of the cloud data warehouse. We already talked about this, but the one incremental point I want to make is the power of the compute within the warehouse went way up, and the cost of that compute went way down...
...this is fundamentally what enabled the "T" in ELT to happen within the warehouse. The compute horsepower of a Snowflake / Redshift made it possible
3. So who's driving these transformations? Tools like @getdbt @fishtowndata. The big technology advancement of the open source project dbt was representing these data transformations as code (SQL). It allowed anyone who knows SQL (business analysts) to author the transformations
Previously, the transformations were done with custom Python code by data engineers, or with GUI-based ETL tools. These took forever to build, were inflexible / hard to scale, and were a black box
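A minimal sketch of the ELT pattern described in the three points above, under loud assumptions: Python's built-in `sqlite3` stands in for the cloud warehouse, and the table names, columns, and `materialize` helper are all hypothetical. This is not how dbt or Fivetran work internally. It just shows the shape of the idea: land the raw data first, then express the transformation as a plain SQL `SELECT` that runs inside the warehouse.

```python
import sqlite3

# sqlite3 is only a stand-in for a warehouse; all names here are hypothetical.
conn = sqlite3.connect(":memory:")

# "E" and "L": land the source rows untouched, with no cleaning in flight.
conn.execute(
    "CREATE TABLE raw_orders "
    "(order_id INTEGER, customer TEXT, amount_cents INTEGER, status TEXT)"
)
raw_rows = [
    (1, "acme", 1200, "complete"),
    (2, "acme", 800, "complete"),
    (3, "globex", 500, "refunded"),
    (4, "globex", 2500, "complete"),
]
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", raw_rows)

# "T": the transformation is just a SELECT kept as code and run in the warehouse.
CUSTOMER_REVENUE_SQL = """
    SELECT customer,
           SUM(amount_cents) / 100.0 AS revenue_usd,
           COUNT(*) AS completed_orders
    FROM raw_orders
    WHERE status = 'complete'
    GROUP BY customer
"""


def materialize(conn: sqlite3.Connection, name: str, select_sql: str) -> None:
    """Rebuild a derived table from a SELECT statement (hypothetical helper)."""
    conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.execute(f"CREATE TABLE {name} AS {select_sql}")


materialize(conn, "customer_revenue", CUSTOMER_REVENUE_SQL)

result = conn.execute(
    "SELECT customer, revenue_usd, completed_orders "
    "FROM customer_revenue ORDER BY customer"
).fetchall()
print(result)
```

Because the raw table is still there after the transform, a bad result can be debugged by re-running or editing the SQL, rather than by guessing whether the problem was in the "T" or the "L".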
So in summary, the major platforms enabling the rise of DataOps are:

1. Data Movement: @fivetran
2. Data storage / compute: @SnowflakeDB @awscloud @GCPcloud
3. Data Transformations: @getdbt
If you think about what these technologies allow: to get data into a warehouse you just point and click data sources to their destination. The connectors are pre-built. You don't have to manage an on-prem database. And the data transformations are represented as basic SQL
And here is my KEY point (I get I've buried the lead a bit, but I believe the setup is important): Data Engineers used to manage all of the complexities of moving, storing, and transforming data. A lot of it was built with custom code / Python, and managed MANUALLY 🤯
Today, with the powerful tools I listed above, the data ecosystem can be managed with turnkey tools, removing the need for a lot of the complex work the data engineers handled previously...
...Instead of the BI analysts asking a data engineer to build them a data pipeline, they can point & click with Fivetran to get data loaded into Snowflake, and write some SQL with dbt to get a materialized view (subset of data) to efficiently run their query against
After quite the lead-in, I hope it's clear how business analysts (like BI analysts) can now do the work of data engineers and access data in a turnkey, self-serve manner. I'm describing these analysts as "Analytical Engineers" and the process as DataOps
So why is this important, and why am I excited about DataOps this year?

When data access becomes democratized and self-serve in nature, the need for new tools to manage this "modern data stack" goes up. I think we'll see a TON of huge companies built in the following categories:
1. I think we'll see the data catalogue / lineage systems re-invented. To access data you need to know where it's located / where it came from

2. Monitoring the quality of the data (and pipelines) will become increasingly important in self-serve settings. "Datadog for data"
3. A wave of modern BI tools

4. Governance becomes a bigger deal. With self-serve access, how can you govern who should have access to what?

5. Metadata gains increased importance

6. The way data pipelines are orchestrated (e.g. Airflow) will completely evolve
7. The "endpoints" for data coming out of warehouses will grow. Right now the two primary places data goes from the warehouse are BI and ML tools. I think we'll see an explosion of data going into new places, like back into SaaS apps

8. And many more we can't even imagine now
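To give a flavor of the data quality monitoring category in the list above, here is a rough sketch. It assumes the simplest possible design: each check is a SQL query that counts "bad" rows, and zero means the check passes. `sqlite3` again stands in for the warehouse, and the table, the check names, and `run_checks` are all hypothetical, not any particular product's API.

```python
import sqlite3

# sqlite3 stands in for the warehouse; the table and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_revenue (customer TEXT, revenue_usd REAL)")
conn.executemany(
    "INSERT INTO customer_revenue VALUES (?, ?)",
    [("acme", 20.0), ("globex", 25.0), (None, 5.0), ("initech", -3.0)],
)

# Each check counts rows that violate an expectation; 0 means it passes.
CHECKS = {
    "customer_not_null":
        "SELECT COUNT(*) FROM customer_revenue WHERE customer IS NULL",
    "revenue_non_negative":
        "SELECT COUNT(*) FROM customer_revenue WHERE revenue_usd < 0",
    "customer_unique":
        "SELECT COUNT(*) FROM (SELECT customer FROM customer_revenue "
        "GROUP BY customer HAVING COUNT(*) > 1)",
}


def run_checks(conn: sqlite3.Connection, checks: dict) -> dict:
    """Run every check and return the number of offending rows per check."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in checks.items()}


results = run_checks(conn, CHECKS)
failures = {name: bad for name, bad in results.items() if bad > 0}
print(failures)
```

In a self-serve setting, something like this would run on a schedule after each load or transform and alert whoever owns the table when `failures` is non-empty.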

