🦉 DVC is designed to improve on earlier solutions and make life easier for ML teams. Here's how it differs from related technologies:
DVC builds on Git by introducing the concept of data files: large files that should not be stored in a Git repository but still need to be tracked and versioned.
It leverages Git's features to enable managing different versions of data, data pipelines, and experiments.
🧵 [2/5]
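To make the "data file" idea concrete, here is a minimal Python sketch of tracking a large file through a small pointer file. It is an illustration under stated assumptions, not DVC's actual code or file format: the helper name `track_file`, the `.pointer.json` suffix, the cache layout, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def track_file(data_path: Path, cache_dir: Path) -> Path:
    """Store a file in a content-addressed cache and write a small pointer file.

    Hypothetical helper for illustration only; the pointer format and
    cache layout are assumptions, not DVC's real ones.
    """
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()

    # The cache key is derived from the content, so identical data is stored once.
    cached = cache_dir / digest[:2] / digest[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():
        shutil.copy2(data_path, cached)

    # Only this tiny text file would be committed to Git.
    pointer = data_path.with_name(data_path.name + ".pointer.json")
    pointer.write_text(json.dumps({"hash": digest, "path": data_path.name}))
    return pointer


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    data = workdir / "data.csv"
    data.write_bytes(b"a,b\n1,2\n")
    print(track_file(data, workdir / "cache").read_text())
```

The design point is that Git versions the pointer, while the large file lives elsewhere and is looked up by its hash.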
🦉 Git-LFS vs DVC
Unlike Git-LFS, DVC does not require a special server. Any cloud storage such as S3 or Google Cloud Storage, or even an SSH server, can be used as remote storage.
No additional databases, servers, or infrastructure are required.
🧵 [3/5]
Git-annex vs DVC
Git-annex is a datafile-centric system, whereas DVC focuses on providing a workflow for machine learning and reproducible experiments.
Moreover, DVC can use reflinks or hardlinks instead of symlinks to improve performance.
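As a rough illustration of why link types matter, the Python sketch below places a cached file into a workspace by trying a hardlink first and falling back to a full copy. It is a simplified sketch, not DVC's implementation: `link_from_cache` is a hypothetical helper, and reflinks are left out because the Python standard library has no portable call for them.

```python
import os
import shutil
import tempfile
from pathlib import Path


def link_from_cache(cached: Path, workspace: Path) -> str:
    """Place `cached` at `workspace`, preferring a hardlink over a copy.

    Returns the strategy that was used. Hypothetical helper for illustration.
    """
    # Remove any previous file or dangling link at the destination.
    if workspace.exists() or workspace.is_symlink():
        workspace.unlink()
    try:
        # A hardlink shares the same data on disk, so no bytes are duplicated.
        os.link(cached, workspace)
        return "hardlink"
    except OSError:
        # Hardlinks can fail across filesystems or where unsupported; copy instead.
        shutil.copy2(cached, workspace)
        return "copy"


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    cached_file = root / "cache_entry"
    cached_file.write_bytes(b"large dataset bytes")
    print(link_from_cache(cached_file, root / "data.bin"))
```

A link avoids writing the data a second time, which is where the performance gain for large files comes from.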
For all the devs out there willing to contribute to DVC, here is a quick guide to contributing to the iterative/dvc repo:
Open a new issue
💻 Set up a dev environment
🍴 Fork iterative/dvc
🧪 Add tests and run them locally
⬆️ Submit a pull request @iterativeai
🧵 [1/7]
Open a new issue
Open a new issue in the issue tracker, whether it is a bug report or a feature request. github.com/iterative/dvc/…
🧵 [2/7]
🍴 Fork iterative/dvc
Fork iterative/dvc, then clone your fork to your local machine to start contributing.
🦉 Here are some cool commands you can try right now in the DVC command-line interface!
dvc dag
🧊 dvc freeze
dvc move
dvc metrics show
🧹 dvc gc
🧵 [1/7]
dvc dag
dvc dag gives a quick, simple visual representation of the pipeline stages up to a target stage. If the target is omitted, it shows the full project DAG.
🧵 [2/7]
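The "stages up to the target" idea can be sketched in Python as a walk over a dependency graph. This is only a toy model under assumptions: the stage names and the `upstream_stages` helper are hypothetical, and it does not reproduce how DVC builds or renders its DAG.

```python
from typing import Dict, List, Set


def upstream_stages(deps: Dict[str, List[str]], target: str) -> List[str]:
    """Return the target and every stage it depends on, dependencies first."""
    ordered: List[str] = []
    seen: Set[str] = set()

    def visit(stage: str) -> None:
        if stage in seen:
            return
        seen.add(stage)
        # Visit dependencies before recording the stage itself.
        for dep in deps.get(stage, []):
            visit(dep)
        ordered.append(stage)

    visit(target)
    return ordered


if __name__ == "__main__":
    # Hypothetical pipeline: each stage maps to the stages it depends on.
    pipeline = {
        "prepare": [],
        "featurize": ["prepare"],
        "train": ["featurize"],
        "evaluate": ["train", "featurize"],
    }
    print(" -> ".join(upstream_stages(pipeline, "train")))
```

With "train" as the target, "evaluate" is left out because nothing upstream of "train" depends on it.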
🧊 dvc freeze
dvc freeze freezes stages until dvc unfreeze is used on them. Frozen stages are never executed by dvc repro.
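The freeze behaviour can be modelled with a toy Python class. It is a deliberately simplified sketch: `ToyPipeline` and its stage names are hypothetical, and its `repro` method runs every non-frozen stage, ignoring the change detection that a real reproduction step would perform.

```python
from typing import Iterable, List, Set


class ToyPipeline:
    """Toy model of frozen stages; not DVC's implementation."""

    def __init__(self, stages: Iterable[str]) -> None:
        self.stages: List[str] = list(stages)
        self.frozen: Set[str] = set()

    def freeze(self, stage: str) -> None:
        # Mark a stage so that repro() skips it.
        self.frozen.add(stage)

    def unfreeze(self, stage: str) -> None:
        # Make the stage eligible for execution again.
        self.frozen.discard(stage)

    def repro(self) -> List[str]:
        # Return the stages that would be executed, in pipeline order.
        return [s for s in self.stages if s not in self.frozen]


if __name__ == "__main__":
    pipeline = ToyPipeline(["prepare", "train", "evaluate"])
    pipeline.freeze("train")
    print(pipeline.repro())
    pipeline.unfreeze("train")
    print(pipeline.repro())
```

Freezing is handy when a stage is expensive or its inputs are expected to stay fixed for a while.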