DVC builds on Git by introducing the concept of data files: large files that should not be stored in a Git repository but still need to be tracked and versioned.
It leverages Git's features to manage different versions of data, data pipelines, and experiments.
🧵[2/5]
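To make the idea of tracked data files concrete, here is a minimal Python sketch of content-addressed caching. It is loosely modeled on the idea behind DVC, not taken from its code: the large file is copied into a cache keyed by its MD5 hash, and only a small pointer file would go into Git. The `track` and `file_md5` helpers, the `cache` directory, and the `.ptr.json` pointer format are all hypothetical; DVC's real pointer is a YAML `.dvc` file, and its cache layout differs between versions.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large data file never has to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Hypothetical helper (not DVC's API): cache data_file by content hash.

    Returns the path of a small pointer file; in this sketch only that pointer
    would be committed to Git, while the data itself stays in the cache.
    """
    md5 = file_md5(data_file)
    # Split the hash into a 2-char directory plus the rest of the hash,
    # so no single cache directory collects every object.
    obj = cache_dir / md5[:2] / md5[2:]
    obj.parent.mkdir(parents=True, exist_ok=True)
    if not obj.exists():  # identical content is stored only once
        shutil.copy2(data_file, obj)
    pointer = data_file.with_name(data_file.name + ".ptr.json")
    pointer.write_text(json.dumps({"path": data_file.name, "md5": md5}))
    return pointer


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "data.csv"
        data.write_text("id,value\n1,42\n")
        print(track(data, root / "cache").read_text())
```

Because the cache key is the hash of the content, two versions of the same file can sit side by side in the cache, and switching versions only means switching which pointer Git checks out.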
📦 Git-LFS vs DVC
Unlike Git-LFS, DVC does not require a special server. Any cloud storage (S3, Google Cloud Storage, etc.) or even an SSH server can be used as remote storage.
No additional databases, servers, or infrastructure are required.
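Why no special server? Once data is stored by content hash, a remote only has to store and fetch files by key. The sketch below assumes a plain local directory standing in for an S3 bucket or an SSH path; `push` is a hypothetical helper, not DVC's API, that uploads only the objects the remote is missing.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def push(cache_dir: Path, remote_dir: Path) -> List[str]:
    """Hypothetical helper (not DVC's API): copy missing cache objects to a remote.

    The "remote" here is just a directory. Because every object is addressed
    by its content hash, any store that can put and get a file by key would
    do; no server-side logic is needed.
    """
    uploaded = []
    for obj in sorted(p for p in cache_dir.rglob("*") if p.is_file()):
        key = obj.relative_to(cache_dir)
        target = remote_dir / key
        if target.exists():
            continue  # same key means same content, so skip the upload
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(obj, target)
        uploaded.append(key.as_posix())
    return uploaded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache, remote = Path(tmp, "cache"), Path(tmp, "remote")
        (cache / "ab").mkdir(parents=True)
        (cache / "ab" / "cdef").write_text("payload")
        print(push(cache, remote))  # ['ab/cdef']
        print(push(cache, remote))  # [] (nothing new to upload)
```

The second call uploads nothing: the existence check on the key is all the "server logic" this storage model needs.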
For all the devs out there who want to contribute to DVC, here is a quick guide to contributing to the iterative/dvc repo:
🐞 Open a new issue
💻 Set up a dev environment
🍴 Fork iterative/dvc
🧪 Add tests and run them locally
⬆️ Submit a pull request @iterativeai
🧵 [1/7]
🐞 Open a new issue
Open a new issue in the issue tracker, whether it be a bug report or a feature request. 👇🏽 github.com/iterative/dvc/…
🧵[2/7]
🍴 Fork iterative/dvc
Fork iterative/dvc, then clone your fork to your local machine to start contributing.
🦉 Here are some of the cool commands you can try out right now in the DVC command-line interface!
💻 dvc dag
🧊 dvc freeze
📦 dvc move
📊 dvc metrics show
🧹 dvc gc
🧵 [1/7]
💻 dvc dag
𝚍𝚟𝚌 𝚍𝚊𝚐 gives a quick, simple visual representation of a pipeline's stages up to the target stage. If the target is omitted, it shows the full project DAG.
🧵 [2/7]
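Conceptually, choosing which stages 𝚍𝚟𝚌 𝚍𝚊𝚐 shows for a target means walking upstream from that target. This Python sketch assumes the pipeline is given as a hypothetical `deps` dict mapping each stage to the stages it depends on (in DVC, those edges come from one stage's outputs being another stage's dependencies in `dvc.yaml`). It uses the standard-library `graphlib` (Python 3.9+) to order the result and does not draw anything.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Optional, Set


def stages_up_to(deps: Dict[str, Set[str]], target: Optional[str] = None) -> List[str]:
    """Hypothetical helper: the target and all upstream stages, in run order.

    deps maps each stage to the stages it depends on. With no target, every
    stage is returned, mirroring the "full project DAG" case.
    """
    wanted: Set[str] = set()
    stack = [target] if target is not None else list(deps)
    while stack:  # walk upstream from the starting stage(s)
        stage = stack.pop()
        if stage not in wanted:
            wanted.add(stage)
            stack.extend(deps.get(stage, ()))
    # Every predecessor is already in `wanted`, so this is a closed subgraph.
    graph = {stage: deps.get(stage, set()) for stage in wanted}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    pipeline = {  # hypothetical stage names
        "prepare": set(),
        "featurize": {"prepare"},
        "train": {"featurize"},
        "evaluate": {"train"},
    }
    print(stages_up_to(pipeline, "train"))  # ['prepare', 'featurize', 'train']
```

Note that `evaluate` is left out: it is downstream of the target, so it is not part of the target's DAG.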
🧊 dvc freeze
𝚍𝚟𝚌 𝚏𝚛𝚎𝚎𝚣𝚎 freezes stages until 𝚍𝚟𝚌 𝚞𝚗𝚏𝚛𝚎𝚎𝚣𝚎 is run on them. Frozen stages are never executed by 𝚍𝚟𝚌 𝚛𝚎𝚙𝚛𝚘.
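Here is a simplified model of the freeze rule. It assumes each stage is just a name with upstream dependencies and a precomputed "changed" flag, whereas the real 𝚍𝚟𝚌 𝚛𝚎𝚙𝚛𝚘 compares hashes of dependencies and outputs. `plan_repro` is a hypothetical helper: a frozen stage is always skipped, and because it produces no new outputs, it does not by itself trigger the stages downstream of it.

```python
from typing import Dict, List, Set


def plan_repro(
    order: List[str],
    deps: Dict[str, Set[str]],
    changed: Set[str],
    frozen: Set[str],
) -> List[str]:
    """Hypothetical helper: which stages a repro-like run would execute.

    A stage runs if it changed or an upstream stage ran, unless it is
    frozen. `order` must already be a topological order of the stages.
    """
    ran: Set[str] = set()
    for stage in order:
        if stage in frozen:
            continue  # frozen stages are never executed
        if stage in changed or deps.get(stage, set()) & ran:
            ran.add(stage)
    return [stage for stage in order if stage in ran]


if __name__ == "__main__":
    order = ["prepare", "featurize", "train", "evaluate"]
    deps = {"featurize": {"prepare"}, "train": {"featurize"}, "evaluate": {"train"}}
    print(plan_repro(order, deps, changed={"prepare"}, frozen=set()))
    # ['prepare', 'featurize', 'train', 'evaluate']
    print(plan_repro(order, deps, changed={"prepare"}, frozen={"featurize"}))
    # ['prepare']
```

Freezing `featurize` stops the change in `prepare` from cascading, which is the usual reason to freeze an expensive stage while iterating on the rest of a pipeline.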