Discover and read the best of Twitter Threads about #mlOps

Most recents (24)

What is a correct Data Engineering Learning Path?

My thoughts in the 🧵

#Data #DataEngineering #MLOps #MachineLearning #DataScience
I believe that the following is a correct order to start in Your Data Engineering Path:

👇
➡️ Understand Basic Processes:

👉 Data Extraction
👉 Data Validation
👉 Data Contracts
👉 Loading Data into a DWH / Data Lake
👉 Transformations in a DWH / Data Lake
👉 Scheduling

👇
Read 8 tweets
What are the basics of Writing Data to a Kafka Topic?

🧵

#Data #DataEngineering #MLOps #MachineLearning #DataScience
Kafka is an extremely important Distributed Messaging System to understand, as it was the first of its kind and most of the new products are built on the ideas of Kafka.

Some general definitions:

👇
➡️ Clients writing to Kafka are called Producers.
➡️ Clients reading the Data are called Consumers.
➡️ Data is written into Topics that can be compared to Tables in Databases.

👇
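To make the three definitions concrete, here is a toy in-memory model. It is not the Kafka client API: `ToyBroker`, `Producer`, `Consumer` and the `page_views` topic are hypothetical names made up for this sketch, and the offset bookkeeping is a simplification of what a real consumer does.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: each topic is an append-only list."""

    def __init__(self):
        self.topics = defaultdict(list)

    def append(self, topic, message):
        log = self.topics[topic]
        log.append(message)
        return len(log) - 1  # offset of the record that was just written

    def read(self, topic, offset):
        return self.topics[topic][offset:]


class Producer:
    """A client that writes to a topic."""

    def __init__(self, broker):
        self.broker = broker

    def send(self, topic, message):
        return self.broker.append(topic, message)


class Consumer:
    """A client that reads from a topic and remembers how far it has read."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0  # position of the next record to read

    def poll(self):
        records = self.broker.read(self.topic, self.offset)
        self.offset += len(records)
        return records


broker = ToyBroker()
producer = Producer(broker)
consumer = Consumer(broker, "page_views")

producer.send("page_views", {"user": "a", "url": "/home"})
producer.send("page_views", {"user": "b", "url": "/pricing"})

first_batch = consumer.poll()   # both records written so far
second_batch = consumer.poll()  # nothing new since the last poll
```

Unlike a row in a table, a record in the topic is never updated in place here: producers only append, and each consumer tracks its own read position.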
Read 8 tweets
So what is the difference between Row Based and Column Based file formats?

🧵

#Data #DataEngineering #MLOps #MachineLearning
Row Based:

➡️ Rows on disk are stored in sequence.
➡️ New rows are written efficiently since you can write the entire row at once.

👇
➡️ For select statements that target a subset of columns, reading is slower since you need to scan every row to retrieve a single column.

👇
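The difference in layout can be sketched with plain Python lists. This is only an illustration of the idea, not how any real file format encodes data; the sample rows and the helper names `names_from_rows` and `names_from_columns` are made up for the example.

```python
rows = [
    {"id": 1, "name": "Ann", "country": "LT"},
    {"id": 2, "name": "Bob", "country": "US"},
    {"id": 3, "name": "Cid", "country": "DE"},
]

# Row-based layout: the values of one row sit next to each other.
row_layout = [value for row in rows for value in row.values()]

# Column-based layout: the values of one column sit next to each other.
column_layout = {key: [row[key] for row in rows] for key in rows[0]}


def names_from_rows(layout, width=3, position=1):
    """Visit every row and pick out the one value we care about."""
    return [layout[start + position] for start in range(0, len(layout), width)]


def names_from_columns(layout):
    """The requested column is already stored as one contiguous block."""
    return layout["name"]
```

In the row layout a reader has to step through all three rows to collect the names, while in the column layout the names are a single contiguous block that can be read on its own.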
Read 8 tweets
What are the main use cases for Apache Kafka or any other Distributed Messaging System?

🧵

#Data #DataEngineering #MLOps #MachineLearning #DataScience
We have covered lots of concepts around Kafka already. But what are the most common use cases for The System that you are very likely to run into as a Data Engineer?

Let's take a closer look:

👇
Website Activity Tracking.

➡️ The Original use case for Kafka by LinkedIn.
➡️ Events happening on the website, such as page views and conversions, are sent via a Gateway and piped to Kafka Topics.

👇
Read 12 tweets
Considering switching to an MLOps Engineer role?

My thoughts in the 🧵

#Data #DataEngineering #MLOps #MachineLearning #DataScience
Usually MLOps Engineers are professionals tasked with building out the ML Platform in the organization.

👇
This means that the skill set required is very broad - naturally very few people start off with the full set of skills you would need to brand yourself as an MLOps Engineer. This is why I would not recommend this role if you are just entering the market.

👇
Read 10 tweets
What is the difference between Splittable and Non-Splittable Files?

🧵

#Data #DataEngineering #MLOps #MachineLearning #DataScience
You are very likely to run into a Distributed Compute System or Framework in your career. It could be Spark, Hive, Presto or any other.

👇
Also, it is very likely that these Frameworks would be reading data from distributed storage. It could be HDFS, S3 etc.

👇
Read 12 tweets
So how do we implement a Production Grade Batch Machine Learning Model Pipeline in The MLOps Way?

🧵

#Data #DataEngineering #MLOps #MachineLearning #DataScience
Let's zoom in:

1: Everything starts in version control: the Machine Learning Training Pipeline is defined in code; once merged to the main branch, it is built and triggered.

👇
2: Feature preprocessing stage: Features are retrieved from the Feature Store, validated and passed to the next stage. Any feature related metadata is saved to an Experiment Tracking System.

👇
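As a rough sketch of step 2, the stage below retrieves features, validates them and records metadata. Everything in it is a stand-in: `FEATURE_STORE` and `EXPERIMENT_LOG` are plain Python objects playing the role of a Feature Store and an Experiment Tracking System, and the "no missing values" rule is just one assumed example of validation.

```python
# Hypothetical stand-ins for a Feature Store and an Experiment Tracking System.
FEATURE_STORE = {
    "user_1": {"age": 34, "orders_30d": 2},
    "user_2": {"age": None, "orders_30d": 5},
}
EXPERIMENT_LOG = []


def preprocess(entity_ids, required=("age", "orders_30d")):
    """Retrieve features, keep only complete rows, log metadata about the run."""
    valid, rejected = {}, []
    for entity_id in entity_ids:
        features = FEATURE_STORE.get(entity_id, {})
        # Assumed validation rule: every required feature must be present.
        if all(features.get(name) is not None for name in required):
            valid[entity_id] = features
        else:
            rejected.append(entity_id)
    EXPERIMENT_LOG.append(
        {
            "stage": "feature_preprocessing",
            "features": list(required),
            "rows_in": len(entity_ids),
            "rows_valid": len(valid),
        }
    )
    return valid, rejected


valid_rows, rejected_ids = preprocess(["user_1", "user_2", "user_3"])
```

Only `valid_rows` would be handed to the next stage; the logged counts make it possible to see later how much data a given training run actually used.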
Read 13 tweets
How do we Decompose Real Time Machine Learning Service Latency and why should you care to understand the pieces as an ML Engineer?

Find out in the 🧵

#Data #DataEngineering #MLOps #MachineLearning #DataScience
Usually, what the users of your Machine Learning Service care about is the total endpoint latency - the time between when a request is made (1.) against the Service and when the response is received (6.).

👇
Certain SLAs will be established on what the acceptable latency is and you will need to meet them. Being able to decompose the total latency is even more important as you can improve each piece independently. Let's see how.

👇
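One simple way to get such a decomposition is to time each stage of request handling separately. The sketch below assumes three made-up stages (`fetch_features`, `predict`, `postprocess`); they do not correspond to the numbered steps in the original diagram, and a real service would also have network time that is invisible from inside the handler.

```python
import time


def timed(stage_name, func, timings, *args):
    """Run one stage and record how long it took, in seconds."""
    start = time.perf_counter()
    result = func(*args)
    timings[stage_name] = time.perf_counter() - start
    return result


# Hypothetical stages of a request; real ones would call other systems.
def fetch_features(request):
    return {"user_id": request["user_id"], "clicks": 3}


def predict(features):
    return 0.5 * features["clicks"]


def postprocess(score):
    return {"score": round(score, 2)}


def handle(request):
    timings = {}
    features = timed("feature_lookup", fetch_features, timings, request)
    score = timed("inference", predict, timings, features)
    response = timed("postprocessing", postprocess, timings, score)
    # Total time spent inside the handler is the sum of the measured stages.
    timings["total"] = sum(timings.values())
    return response, timings


response, timings = handle({"user_id": 42})
```

With per-stage numbers in hand, you can see which piece dominates the total before deciding what to optimize.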
Read 13 tweets
Do you know how Apache Spark is Architected?

Find out in the 🧵

#Data #DataEngineering #MLOps #MachineLearning #DataScience
Apache Spark is an extremely popular distributed processing framework utilizing in-memory processing to speed up task execution. Most of its libraries are contained in the Spark Core layer.

👇
As a warm-up exercise for later deeper dives and tips, today we focus on some architecture basics.

Spark has several high level APIs built on top of Spark Core to support different use cases:

👇
Read 15 tweets
A refresher on the role of Data Contracts in the Data Pipeline.

Read on in the 🧵

#Data #DataEngineering #MLOps #MachineLearning #DataScience
In its simplest form, a Data Contract is an agreement between Data Producers and Data Consumers on what the Data being produced should look like, what SLAs it should meet and what its semantics are.

👇
A Data Contract should hold the following non-exhaustive list of metadata:

👉 Schema of the Data being Produced.

👇
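A minimal sketch of what enforcing the schema part of a contract could look like. The contract layout, the field names and the `violations` helper are all assumptions made for illustration; real contracts are usually kept in a schema registry or a versioned file rather than in a Python dict.

```python
# Hypothetical contract for an "orders" dataset.
CONTRACT = {
    "owner": "checkout-team",
    "schema": {"order_id": int, "amount": float, "currency": str},
    "freshness_sla_minutes": 15,
}


def violations(record, contract=CONTRACT):
    """Return a list of ways in which a record breaks the contract's schema."""
    problems = []
    for field, expected_type in contract["schema"].items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return problems


good = violations({"order_id": 1, "amount": 9.99, "currency": "EUR"})
bad = violations({"order_id": "1", "amount": 9.99})
```

A producer could run such a check before publishing, and a consumer could run the same check on arrival, so that both sides are held to the same agreement.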
Read 14 tweets
What does a Real Time Search or Recommender System Design look like?

The graph was inspired by the amazing work of @eugeneyan

More in the 🧵

#Data #DataEngineering #MLOps #MachineLearning #DataScience
Recommender and Search Systems are among the biggest money makers for most companies when it comes to Machine Learning.

👇
Both Systems are inherently similar. Their goal is to return a list of recommended items given a certain context - it could be a search query on an e-commerce website or a list of recommended songs given that you are currently listening to a certain song on Spotify.

👇
Read 12 tweets
Here is a short refresher on ACID Properties of DBMS (Database Management System).

🧵

#Data #DataEngineering #MLOps #MachineLearning #DataScience
It could be that you are taking ACID Properties for granted when you are using transactional databases.

If you are interviewing for Data Engineering roles you will be asked to explain what the concept means.

👇
Let's take a closer look.

A Transaction is a sequence of steps performed on a database as a single logical unit of work.

The ACID database transaction model ensures that a performed transaction is always consistent by ensuring:

👇
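The "single logical unit of work" idea can be tried out with the `sqlite3` module from the Python standard library. The sketch below only demonstrates atomicity (the A in ACID); the `accounts` table, the `transfer` helper and the balances are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move money as one transaction: either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


ok = transfer(conn, "alice", "bob", 30)       # both updates succeed
failed = transfer(conn, "alice", "bob", 500)  # debit breaks the CHECK constraint
final = balances(conn)
```

In the failing transfer the credit to `bob` runs first and succeeds, but it is rolled back together with the failed debit, so no half-finished transfer is left behind.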
Read 8 tweets
Artificial Intelligence is the hottest technology in 2023. Most tech companies are making new investments in AI, which has created new career opportunities not just in machine learning but in MLOps as well. This thread is on career opportunities in #MLOps. RT to spread the word. 👇
What is MLOps?

As companies generate and collect vast amounts of customer data, managing these large datasets and the numerous machine-learning models they create will get increasingly complex. MLOps is sometimes referred to as #AIOps as well.
MLOps is the systematic approach to managing the entire lifecycle of ML models and their deployment in a production environment. It combines principles and practices of software engineering and #DevOps to ensure efficient, reliable, and scalable management of ML models.
Read 8 tweets
Do you ever feel like your data science and IT teams are speaking different languages? That's where #MLOps comes in!

By standardizing workflows and processes, MLOps can bridge the gap between these two critical teams.

(A thread) 👇🧵
One of the key benefits of MLOps is that it enables data scientists and IT professionals to work together more efficiently and effectively. For example, MLOps practices can help to ensure that data scientists have access to the right infrastructure and tools.
Another important aspect of MLOps is that it enables data scientists to focus on what they do best - developing models and algorithms - while IT takes care of the operational aspects of model deployment and management.
Read 7 tweets
No Excuses Data Engineering Portfolio Template - next week I will enrich it with the missing Machine Learning and MLOps parts!

🧵

#Data #DataEngineering #MLOps #MachineLearning #DataScience
Today - let's review it once more. It is super helpful as these kinds of Data Architectures are what you will find in real life situations.

Recap:

👇
1. Data Producers - Python Applications that extract data from chosen Data Sources and push it to the Collector via REST or gRPC API calls.

👇
Read 14 tweets
What are Lambda and Kappa Architectures?

🧵

#Data #DataEngineering #MLOps #MachineLearning #DataScience
Lambda and Kappa are both Data architectures proposed to solve the movement of large amounts of data for reliable Online access.

👇
The most popular architecture has been and continues to be Lambda. However, with Stream Processing becoming more accessible to organizations of every size you will be hearing a lot more of Kappa in the near future. Let's see how they are different.

👇
Read 15 tweets
Let's remind ourselves of what a Request-Response Model Deployment looks like - The MLOps Way.

🧵

#MLOps #MachineLearning #DataScience #Data
You will find this type of model deployment to be the most popular when it comes to Online Machine Learning Systems.

Let's zoom in:

1: Version Control: the Machine Learning Training Pipeline is defined in code; once merged to the main branch, it is built and triggered.

👇
2: Feature Preprocessing: Features are retrieved from the Feature Store, validated and passed to the next stage. Any feature related metadata that is tightly coupled to the Model being trained is saved to the Experiment Tracking System.

👇
Read 14 tweets
I have successfully compiled and run GLM-130B on a local machine! It's now running in `int4` quantization mode and answering my queries.

I'll explain the installation below; if you have any questions, feel free to ask!
github.com/THUDM/GLM-130B
130B parameters on 4x 3090s is impressive. GPT-3 for reference is 175B parameters, but it's possible that it's over capacity for the data & compute it was trained on...

I feel like a #mlops hacker having got this to work! (Though it should be much easier than it was.)
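A back-of-the-envelope check of why `int4` makes this fit. The parameter count, the quantization mode and the GPU count come from the thread; the 24 GB per card is the stock RTX 3090 memory size and is an assumption about the cards used. The estimate covers the weights only and ignores activations and other runtime overhead.

```python
PARAMS = 130e9        # GLM-130B parameter count, from the thread
BITS_PER_WEIGHT = 4   # int4 quantization, from the thread
NUM_GPUS = 4          # 4x 3090, from the thread
VRAM_PER_GPU_GB = 24  # assumption: stock RTX 3090 cards

# Bytes needed for the weights alone, expressed in GB (1 GB = 1e9 bytes).
int4_weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9
fp16_weights_gb = PARAMS * 16 / 8 / 1e9
total_vram_gb = NUM_GPUS * VRAM_PER_GPU_GB

fits_in_int4 = int4_weights_gb < total_vram_gb
fits_in_fp16 = fp16_weights_gb < total_vram_gb

print(f"int4 weights: {int4_weights_gb:.0f} GB, fp16 weights: {fp16_weights_gb:.0f} GB")
print(f"available VRAM: {total_vram_gb} GB")
```

Under these assumptions the `int4` weights take about 65 GB of the 96 GB available, while half-precision weights would need about 260 GB and would not fit.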
To get GLM to work, the hardest part was CMake from the FasterTransformer fork. I'm not a fan of CMake, I don't think anyone is.

I had to install cudnn libraries manually into my conda environment, then hack CMakeCache.txt to point to those...
Read 33 tweets
Ding Dong! Here's a flash from the Iterative Community this month 👇

🦮 MLOps Guide
🧪 DVC Extension
🌌 A Fable about MLOps
📝 Cheatsheet for DVC
🧑‍💻 Data Query Language

@Iterativeai @DVCorg
#mlOps #data #community

🧵[1/7]
🦮 MLOps Guide

For their engineering final project at @Insper, Arthur Olga, Gabriel Monteiro, Guilherme Leite, and Vinicius Lima created the MLOps Guide, which provides a complete MLOps development cycle using DVC, CML, and IBM Watson.

mlops-guide.github.io

🧵[2/7]
🧪 DVC Extension

@erykml1 wrote a fabulous, in-depth tutorial on experiment tracking using our new DVC Extension for VS Code 👇

towardsdatascience.com/turn-vs-code-i…

🧵[3/7]
Read 7 tweets
If I could only choose 5 books to read in 2023 as an aspiring Data Engineer, these would be the ones, in this specific order:

Read on in the Thread 👇

--------

Follow me and hit 🔔 to Level Up in #MLOps, #MachineLearning, #DataEngineering, #DataScience and overall #Data space!
1️⃣ "Fundamentals of Data Engineering" - A book that I wish I had 5 years ago. After reading it you will understand the entire Data Engineering workflow. It will prepare you for further deep dives.

👇
2️⃣ "Accelerate" - Data Engineers should follow the same practices that Software Engineers do and more. After reading this book you will understand DevOps practices inside and out.

👇
Read 9 tweets
What is a Feature Store and why is it such an important element in the MLOps Stack?

Find out in the Thread 👇

--------

Follow me and hit 🔔 to Level Up in #MLOps, #MachineLearning, #DataEngineering, #DataScience and overall #Data space!
A Feature Store System sits between Data Engineering and Machine Learning Pipelines and it solves the following issues:

➡️ Eliminates Training/Serving skew by syncing Batch and Online Serving Storages (5)

👇
➡️ Enables Feature Sharing and Discoverability through the Metadata Layer - you define the Feature Transformations once, enable discoverability through the Feature Catalog and then serve Feature Sets for training and inference purposes through a unified interface (4, 3).

👇
Read 15 tweets
Do you know what CDC (Change Data Capture) is and that there are multiple ways to implement it?

Find out in the Thread 👇

--------

Follow me and hit 🔔 to Level Up in #MLOps, #MachineLearning, #DataEngineering, #DataScience and overall #Data space!
Change Data Capture is a software process used to replicate actions performed against Operational Databases for use in downstream applications.

There are several use cases for CDC. Two of the main ones:

👇
➡️ Database Replication (refer to 3️⃣ in the Diagram).

👉 CDC can be used for moving transactions performed against a Source Database to a Target DB. If each transaction is replicated, it is possible to retain all ACID guarantees when performing replication.

👇
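The replication idea can be sketched as replaying an ordered log of changes against a target. This is a toy model, not any specific CDC tool: the change record shape (`op`, `key`, `row`), the helper names and the use of a dict as the "Target DB" are assumptions for illustration.

```python
def apply_change(table, change):
    """Apply one captured change to the target table (a dict keyed by primary key)."""
    op, key = change["op"], change["key"]
    if op in ("insert", "update"):
        table[key] = change["row"]
    elif op == "delete":
        table.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def replicate(change_log, target, start=0):
    """Replay changes in order, starting from a saved position."""
    for change in change_log[start:]:
        apply_change(target, change)
    return len(change_log)  # position to resume from next time


# Changes captured from a hypothetical source database, in commit order.
change_log = [
    {"op": "insert", "key": 1, "row": {"name": "Ann"}},
    {"op": "insert", "key": 2, "row": {"name": "Bob"}},
    {"op": "update", "key": 1, "row": {"name": "Anna"}},
    {"op": "delete", "key": 2},
]

target = {}
position = replicate(change_log, target)

# Later changes are picked up from the saved position, not from the start.
change_log.append({"op": "insert", "key": 3, "row": {"name": "Cid"}})
position = replicate(change_log, target, start=position)
```

Replaying the changes in the same order in which they were committed is what lets the target end up in the same state as the source.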
Read 15 tweets
What does a good Model Tracking System look like?

Find out in the Thread 👇

--------

Follow me and hit 🔔 to Level Up in #MLOps, #MachineLearning, #DataEngineering, #DataScience and overall #Data space!
It should be composed of two integrated parts: an Experiment Tracking System and a Model Registry.

Where you track ML Pipeline metadata from will depend on the MLOps maturity in your company.

If you are at the beginning of the ML journey you might be:

👇
1️⃣ Training and Serving your Models from an experimentation environment - you run ML Pipelines inside of your Notebook and do that manually at each retraining.

If you are beyond Notebooks you will be running ML Pipelines from CI/CD Pipelines and on Orchestrator triggers.

👇
Read 14 tweets
Greensteam subscribed to the idea of doing #MLOps at a reasonable scale.

Seeing the quickly growing number of customers (= ML experiments), they decided to build their MLOps stack from 0 and solve all core problems around it.

Here are some of the issues → solutions:
- 1000s of Jupyter notebooks → git
- Managing dependencies and reproducibility → @Docker
- Dealing with unit tests (in some parts of the model code) that don't test → running smoke tests
- Different linter versions showing different results locally and in Jenkins → code checks moved into Docker
- Finding parts of the code that unit tests didn't cover → mypy
- Testing models for multiple datasets of different clients in different scenarios → @argoproj
Read 5 tweets
