David Regalado
- VP of Engineering @ Stealth Startup - Founder @DataEngiLatam - Mentor #ArtificialIntelligence #AI #datascience #dataengineering

Feb 26, 2022, 9 tweets

The evolution of data processing frameworks.

Knowing how these frameworks have evolved can help you understand the typical problems that arise, and how they're addressed.

As the Internet grew, Google invented new data processing methods.

🧵

#GCP #google @google @googlecloud

In 2002, Google created GFS, the Google File System, to handle sharding and storing petabytes of data at scale.

GFS is the foundation for Cloud Storage, and also for what would become BigQuery managed storage.

🧵

#GCP #google @google @googlecloud

One of the next challenges was to figure out how to index the exploding volume of content on the Web.

To solve this, in 2004 @Google invented a new style of data processing (MapReduce) to manage large-scale data processing across large clusters of commodity servers.
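
For a feel of the MapReduce style, here is a minimal single-machine sketch of the classic word count. This is an illustration only, not Google's implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example, and a real system would run the map and reduce steps in parallel across many servers.

```python
from collections import defaultdict


def map_phase(doc):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(docs):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the web is big", "the web grows"]))
```

The developer only writes the map and reduce functions; in a distributed setting the framework handles splitting the input, moving data between servers, and retrying failed work.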

🧵

#GCP

As Google's needs grew, they faced a problem of recording and retrieving millions of streaming user actions with high throughput.

The solution became Cloud Bigtable, which was an inspiration behind HBase and MongoDB.

🧵

#GCP #Google #GoogleCloud

One issue with MapReduce is that developers have to write code to manage all of the infrastructure of commodity servers.

Developers couldn't just focus on their application logic, so @Google started moving towards new tools.

Tools like Dremel.

#GCP #Google #GoogleCloud

Dremel became the query engine behind BigQuery.

Google continued to innovate to solve its big data and ML challenges, and created Colossus as the next-generation distributed data store, Spanner as a planet-scale relational database, ...

🧵

#GCP #Google #GoogleCloud @Google

... Flume and MillWheel for data pipelines, Pub/Sub for messaging, #TensorFlow for #MachineLearning, plus specialized TPU hardware, and AutoML.

🧵

#GCP #Google #GoogleCloud @Google @TensorFlow

The good news is that Google has opened up these innovations as products and services for all of us to leverage as a part of the #GoogleCloudPlatform.

#GCP #Google #GoogleCloud @Google @TensorFlow @qwiklabs
