David Regalado

Jun 24 • 15 tweets • 7 min read

📣Data Engineering Projects for Beginners 2022

👇🧵[1/x]

#dataengineering #python #Docker #developers #aws #GoogleCloud #apacheairflow

Tracking your Uber Rides and Uber Eats expenses through a data engineering process

Technologies and skills:
Python, Docker, Apache Airflow, AWS Redshift, Power BI, data modelling, Task scheduling, ETL and ELT processes, Data warehousing, Cloud

🧵[2/x]

github.com/Wittline/uber-…

Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag

Technologies and skills:
Python, Docker, Big Data, Cloud, Google Cloud, Redis, DAG, Parallel Processing, Apache Spark

🧵[3/x]

github.com/Wittline/pyDag

Building Big Data Pipelines in the Cloud with AWS EMR

Technologies and skills:
Python, PySpark, AWS EMR, Task Scheduling, IaC, EC2 Instances, Apache Spark, Cloud

🧵[4/x]

github.com/Wittline/pyspa…

Building a Lossless Data Compression and Data Decompression Pipeline

Technologies and skills:
Python, Data compression, BZIP2, Parallel programming

🧵[5/x]

github.com/Wittline/wbz
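
As a rough illustration of the lossless round trip this project is about, here is a minimal sketch using Python's standard-library bz2 module. It is not the code from the linked repository; the helper name round_trip and the sample payload are assumptions made up for this example.

```python
import bz2
from typing import Tuple


def round_trip(data: bytes, level: int = 9) -> Tuple[bytes, bytes]:
    """Compress data with BZIP2, then decompress it again.

    Returns (compressed, restored) so the caller can compare sizes
    and confirm that nothing was lost.
    """
    compressed = bz2.compress(data, compresslevel=level)
    restored = bz2.decompress(compressed)
    return compressed, restored


# Hypothetical, highly repetitive payload so the size reduction is easy to see.
sample = b"data engineering " * 1000
compressed, restored = round_trip(sample)

print("original bytes:  ", len(sample))
print("compressed bytes:", len(compressed))
print("lossless:        ", restored == sample)
```

"Lossless" here means the restored bytes are identical to the input, which is what the final comparison checks.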

Learn how to dockerize an Apache Spark Standalone Cluster

Technologies and skills:
Python, Jupyter Notebook, Apache Spark, Docker, docker-compose, Hive

🧵[6/x]

github.com/Wittline/apach…

Dockerizing and Consuming an Apache Livy environment

Technologies and skills:
Python, Big Data, Docker, docker-compose, Apache Livy, Apache Spark, PostgreSQL, PySpark, Jupyter Notebook

🧵[7/x]

github.com/Wittline/docke…

Design, Development and Deployment of a simple Data Pipeline

Technologies and skills:
Python, Data modelling, Docker, docker-compose, PostgreSQL, Data pipeline, FastAPI

🧵[8/x]

github.com/Wittline/data-…

Dockerizing a Python Script for Faster Web Scraping

Technologies and skills:
Python, Docker, SQLite, Dockerfile, Web scraping, Data pipeline, FastAPI

🧵[9/x]

github.com/Wittline/data-…

Understanding Similarity Measures for Text Analysis

Technologies and skills:
Python, Machine Learning, Similarity measures, Distance metrics, Text Analysis

🧵[10/x]

github.com/Wittline/dista…
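
To make "similarity measures" a bit more concrete, here is a minimal sketch of two common ones, Jaccard similarity and cosine similarity, computed over whitespace tokens. It uses only the standard library and is not taken from the linked repository; the function names and the naive tokenizer are assumptions for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list:
    """Naive tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two term-count vectors."""
    counts_a, counts_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


print(jaccard_similarity("big data pipelines", "data pipelines in the cloud"))
print(cosine_similarity("big data pipelines", "data pipelines in the cloud"))
```

Jaccard ignores how often a word appears, while cosine on counts does not, so the two can rank the same pair of texts differently.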

Learn how to build a content-based Movie Recommender System

Technologies and skills:
Python, Machine Learning, TF-IDF, Cosine similarity, BM25, BERT, NLP, word2vec, Text Analysis, recsys

🧵[11/x]

github.com/Wittline/recom…
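
For a feel of what "content-based" means here, the sketch below ranks movies by the cosine similarity of TF-IDF vectors built from their descriptions. It is a toy written with the standard library only, not the approach from the linked repository (which also lists BM25, BERT and word2vec). The catalogue, the function names and the smoothed IDF formula are all assumptions chosen for the example.

```python
import math
from collections import Counter

# Hypothetical toy catalogue: title -> short plot description.
MOVIES = {
    "Space Run": "astronauts explore deep space on a rescue mission",
    "Star Drift": "a lost crew drifts through deep space",
    "Baking Hearts": "two rival bakers fall in love in paris",
}


def tfidf_vectors(docs: dict) -> dict:
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = {title: text.lower().split() for title, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many descriptions each term appears.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for title, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[title] = {
            # Relative term frequency times a smoothed IDF (assumed variant).
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
    return vectors


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def recommend(title: str, docs: dict, k: int = 1) -> list:
    """Return the k titles whose descriptions are most similar to `title`."""
    vectors = tfidf_vectors(docs)
    scores = [(other, cosine(vectors[title], vectors[other])) for other in docs if other != title]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


print(recommend("Space Run", MOVIES))
```

The recommendation depends only on the item descriptions, not on ratings from other users, which is the defining trait of a content-based recommender.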

A Text Analysis of Speeches

Technologies and skills:
Python, Machine Learning, NLP, word2vec, Text Analysis, Sentiment Analysis, PCA, t-SNE, Word Embeddings, Text Preprocessing, Web scraping, Data Visualization

🧵[12/x]

github.com/Wittline/text-…

Dropout Students Prediction

Technologies and skills:
R, Genetic algorithm, Neural Networks, K-Means, Clustering, Machine Learning

🧵[13/x]

github.com/Wittline/Dropo…

Credits: @RamsesCoraspe

🧵[14/x]

dev.to/ramsescoraspe/…

I tweet about all things data related. Follow me for more content.

@thecodemancer_

🧵[15/x]

linkedin.com/in/davidregala…


David Regalado

@thecodemancer_

Mar 27

Can you imagine serverless Spark + BigQuery together? 🤯

Forget about managing clusters and tuning infrastructure if your job is to focus on creating business value.

👇

🧵1/6

#googlecloud #bigquery #spark #dataengineering

Why Serverless Spark?

💡 Developers can focus on code and logic. They do not need to manage clusters or tune infrastructure. They submit #Spark jobs from their interface of choice, and processing is auto-scaled to match the needs of the job.

🧵2/6

#googlecloud #bigquery #gcp

💡 Data engineering teams do not need to manage and monitor infrastructure for their end users. They are freed up to work on higher value #dataengineering functions.

💡 Pay only for the job duration, vs paying for infrastructure time.

🧵3/6

#googlecloud #bigquery #spark
