Discover and read the best of Twitter Threads about #dataops

Most recent (5)

Testing your pipelines before merging is crucial to ensure they do not fail in production. However, testing data pipelines is complex (and expensive) because of data size, confidentiality constraints, and the time a test run takes.
🧵
#data #dataengineering #testing #dataops
Here are a few ways to get data for your tests:

1. Copying data: Testing against an exact copy of the prod data will ensure that our changes do not break the pipeline. This is expensive! You can test on a subset of the data instead, accepting that some edge cases may be missed.
2. Data git: Projects like Nessie and lakeFS can help set up different environments without replicating the entire dataset.
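
As a rough illustration of the "subset of the data" option in point 1, here is a minimal sketch that draws a reproducible random sample of rows for a test run. The function name `sample_rows`, the `fraction` and `seed` parameters, and the row layout are assumptions made for this example; they do not come from the thread.

```python
import random


def sample_rows(rows, fraction, seed=42):
    """Return a reproducible random subset of `rows` (hypothetical helper).

    A fixed seed keeps the test data stable between runs, so a failing
    test can be reproduced. Rare edge cases may not be in the sample.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not rows:
        return []
    # Keep at least one row so the tests always have something to run on.
    k = max(1, round(len(rows) * fraction))
    return random.Random(seed).sample(rows, k)


# Illustrative data only: 100 fake "prod" rows.
prod_rows = [{"id": i, "amount": i * 1.5} for i in range(100)]
test_rows = sample_rows(prod_rows, fraction=0.1)
```

A seeded sample trades coverage for cost: the smaller the fraction, the cheaper the test and the more likely it is to miss an unusual record.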
Read 7 tweets
Live from #GartnerDA | 5 Ways to Enhance Your Data Engineering Practices with Robert Thanaraj, Gartner Director Analyst: gtnr.it/3JOkYPF
About this session: Analytics relies on a successful data foundation; it must be backed with the right data and processes. Robert explores 5 ways to enhance your #DataEngineering practices: gtnr.it/3JOkYPF #GartnerDA
#DataEngineering is a critical skill in high demand amongst employers with a 7.5% increase in demand. #GartnerDA
Read 13 tweets
If you have worked in the data space, you would have heard the term Metadata. It is used as a catch-all term. Here are a few things to think about when someone mentions Metadata 👇

#data #dataengineering #metadata #dataops
1. Orchestration: Time of run, re-run information, pipeline structure, pipeline execution time, pipeline failure times, etc.

2. Data processing: Input parameters, failure stack trace, number of rows processed, number of rows in output, number of discarded rows, etc.
3. Data quality: Summary statistics such as mean and sum for numerical columns, the set of observed values for enum columns, etc. (think DataFrame.describe in pandas)
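
The three kinds of metadata listed above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function `run_with_metadata`, the dictionary layout, and the column names are hypothetical, and a real pipeline would usually get this information from its orchestrator and data-quality tooling rather than from hand-written code.

```python
import statistics
import time


def run_with_metadata(rows, numeric_col, enum_col):
    """Drop rows with a missing numeric value and record metadata (sketch)."""
    started_at = time.time()      # orchestration: time of run
    t0 = time.perf_counter()      # used to measure execution time

    kept = [r for r in rows if r.get(numeric_col) is not None]
    values = [r[numeric_col] for r in kept]

    metadata = {
        "orchestration": {
            "started_at": started_at,
            "duration_s": time.perf_counter() - t0,
        },
        "processing": {
            "rows_in": len(rows),
            "rows_out": len(kept),
            "rows_discarded": len(rows) - len(kept),
        },
        "quality": {
            "sum": sum(values),
            "mean": statistics.fmean(values) if values else None,
            # Distinct values seen in the enum-like column.
            "enum_values": sorted({r.get(enum_col) for r in kept}),
        },
    }
    return kept, metadata


# Illustrative rows only.
rows = [
    {"amount": 10, "status": "paid"},
    {"amount": None, "status": "void"},
    {"amount": 30, "status": "open"},
]
kept, meta = run_with_metadata(rows, numeric_col="amount", enum_col="status")
```

Here `meta["processing"]` covers the row counts from point 2, and `meta["quality"]` holds the kind of per-column summary that point 3 compares to pandas' `DataFrame.describe`.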
Read 6 tweets
My talk “DataOps/MLOps with @DVCorg” at Data Fest 2020

#DVC #DataOps #MLOps

You can find other talks (even about DVC and CML) at ml-repa.ru/datafest2020
I re-recorded the talk because I spotted an error at the last minute, and for some reason OBS Studio didn't change the slides at certain moments and overwrote the following slides here and there. Sorry for the issue. It's better to follow along with the slides here: drive.google.com/file/d/11Isqza…
Read 3 tweets
#LinkedData is crucial to creating a #KnowledgeGraph that manifests as #SemanticWeb, period.

The #Web in all of this is about a #Hyperlink-based Entity Relationship Graph that's navigable by both humans and machines via dereferencing (i.e., lookups).
@OpenLink has opened up a new community forum channel focused on #LinkedData with the sole aim of teaching anyone that's interested about this powerful concept via live examples.

See: community.openlinksw.com/c/linkeddata

#GraphDatabase #KnowledgeGraph #SemanticWeb
#CDO #DataOps
Actually, here is a very basic depiction of an Entity Relationship Graph constructed in line with #LinkedData principles.

#GraphDatabase #KnowledgeGraph #CDO #DataOps
Read 3 tweets
