David Regalado
Feb 26, 2022 · 9 tweets
The evolution of data processing frameworks.

Knowing how these frameworks have evolved can help you understand the typical problems that arise, and how they're addressed.

As the Internet grew, Google invented new data processing methods.

🧵

#GCP #google @google @googlecloud
In 2002, Google created GFS, the Google File System, to handle sharding and storing petabytes of data at scale.

GFS is the foundation for Cloud Storage, and also for what would become BigQuery managed storage.

🧵

#GCP #google @google @googlecloud
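To make "sharding" concrete, here is a minimal sketch of splitting one large file into fixed-size chunks and assigning each chunk to several servers. The 64 MB chunk size and three replicas are the defaults described in the published GFS paper; the `plan_chunks` function, the server names, and the round-robin placement are illustrative assumptions, not how GFS actually places data.

```python
from __future__ import annotations

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size described in the GFS paper
REPLICAS = 3                   # default replication factor described in the GFS paper


def plan_chunks(file_size: int, servers: list[str],
                chunk_size: int = CHUNK_SIZE,
                replicas: int = REPLICAS) -> list[dict]:
    """Split a file of `file_size` bytes into chunks and pick servers for each.

    Hypothetical helper: placement is plain round-robin, chosen only to keep
    the example short.
    """
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    num_chunks = (file_size + chunk_size - 1) // chunk_size  # ceiling division
    plan = []
    for index in range(num_chunks):
        offset = index * chunk_size
        length = min(chunk_size, file_size - offset)  # last chunk may be short
        holders = [servers[(index + r) % len(servers)] for r in range(replicas)]
        plan.append({"chunk": index, "offset": offset,
                     "length": length, "servers": holders})
    return plan


# A 200 MB file spread over four hypothetical servers.
plan = plan_chunks(200 * 1024 * 1024, ["s1", "s2", "s3", "s4"])
for entry in plan:
    print(entry)
```

Because every chunk lives on several servers, losing one commodity machine does not lose data, which is the property the rest of the stack builds on.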
One of the next challenges was to figure out how to index the exploding volume of content on the Web.

To solve this, in 2004 @Google invented a new style of data processing, MapReduce, to run large-scale jobs across large clusters of commodity servers.

🧵

#GCP
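The MapReduce idea can be sketched on a single machine as a word count: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step combines each group. This is a toy illustration only; the function names are hypothetical, and a real MapReduce system runs these phases in parallel across many servers and handles failures.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values under their key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over all documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["the web grows", "the index grows"])
print(counts)
```

Each document can be mapped independently and each key can be reduced independently, which is what lets the work spread across a cluster.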
As Google's needs grew, it faced the problem of recording and retrieving millions of streaming user actions with high throughput.

The solution became Cloud Bigtable, which was an inspiration behind HBase and MongoDB.

🧵

#GCP #Google #GoogleCloud
One issue with MapReduce was that developers had to write code to manage all of the infrastructure of commodity servers.

Developers couldn't just focus on their application logic, so @Google started moving towards new tools.

Tools like Dremel.

#GCP #Google #GoogleCloud
Dremel became the query engine behind BigQuery.

Google continued to innovate to solve its big data and ML challenges, and created Colossus as the next-generation distributed data store, Spanner as a planet-scale relational database, ...

🧵

#GCP #Google #GoogleCloud @Google
... Flume and MillWheel for data pipelines, Pub/Sub for messaging, #TensorFlow for #MachineLearning, plus specialized TPU hardware, and AutoML.

🧵

#GCP #Google #GoogleCloud @Google @TensorFlow
The good news is that Google has opened up these innovations as products and services for all of us to leverage as a part of the #GoogleCloudPlatform.

#GCP #Google #GoogleCloud @Google @TensorFlow @qwiklabs
More from @thecodemancer_

Feb 23, 2023
The data engineering ecosystem is evolving at breakneck speed. It's time to level up and look beyond numpy, pandas, and matplotlib.

🐍 Starting a Pythonic thread

🧵[1/x]

#python #dataengineering
Redpanda 🐼 : redpanda.com

Redpanda offers better performance than Apache Kafka while keeping compatibility with its API.

Will it be as powerful as Google Pub/Sub?

🧵[2/x]

#python #dataengineering
DuckDB 🦆 : duckdb.org

DuckDB lets us run OLAP from our web browser and gives us an engine that works quite well with Parquet. MotherDuck (motherduck.com) is looking to offer DuckDB as SaaS at large scale.

🧵[3/x]

#python #dataengineering
Read 8 tweets
Aug 3, 2022
What is the difference between a Data Engineer and a Data Architect?

🧵[1/x]
A data engineer looks at the immediate set of requirements and works towards that. In other words, data engineers build, rebuild, and tear down. ⚒

Need a new field in the report? Let's just build the whole thing. ⚒

🧵[2/x]
Data Architects think ahead in terms of capacity planning. X years from now, Y will happen, so we'll need to consider Z. In other words, Data Architects look at the full requirements and build it once.😎

This means less waste of money for the company in the long run.

🧵[3/x]
Read 6 tweets
Jul 20, 2022
"Blessing and misfortune are two sides of the same coin. One extreme can transform into another, and there is no right or wrong to this."

🧵[1/x]
During the Han Dynasty, an old man (Sai Weng) living on China’s border one day lost his horse. His neighbors all said what terrible luck that was, and sympathized with the old man. But Sai Weng said: “Maybe losing my horse is not a bad thing after all.”

🧵[2/x]
Lo and behold, the next day the old man’s horse returned, together with a beautiful female horse alongside him. All the neighbors exclaimed: “What great luck!” But the old man responded: “Maybe this is not such good luck after all.”

🧵[3/x]
Read 10 tweets
Jul 18, 2022
Did you notice this?

Some #GoogleCloud professional certificates on Coursera have off-platform certification exams. For a limited time, you can get a discount voucher for 20% off the cost of the exam.

This is a 🧵of links to those programs.

🧵[1/x]

@coursera @GoogleCloudTech
Google Cloud Digital Leader Training Professional Certificate lnkd.in/efnhSb57

🧵[2/x]
Preparing for Google Cloud Certification: Cloud Data Engineer Professional Certificate lnkd.in/eiwqGYxt

🧵[3/x]
Read 7 tweets
Jul 5, 2022
💡 Methods for addressing overfitting.



🧵[1/x]

#MachineLearning #ML
1. Increase the number of training examples. I know, I know. Sometimes that's not possible.

🧵[2/x]

#MachineLearning #ML
2. Select a subset of the most relevant features
(👋 hello feature selection!).

🧵[3/x]

#MachineLearning #ML
Read 7 tweets
Jun 25, 2022
💡Seven ways to become a more effective founder

Credits to @GoogleStartups

#startups #founders

🧵[1/x]
🚨⚠️People issues are the biggest risk to funded startups.

55% of startups fail because of people problems, according to a study by Harvard, Stanford, and University of Chicago researchers.

🧵[2/x]
1. Minimize unnecessary micromanagement

While micromanaging can be helpful in certain situations, the most effective leaders aim to delegate work in order to scale both themselves and their businesses. Our data suggests that micromanaging can be a fatal flaw for CEOs.

🧵[3/x]
Read 11 tweets