We cover diffusion models, the role of the "prior," and U-Nets for image generation.
Open-source datasets and models have kicked off a true explosion of activity in image/video generation. We're excited to see what related projects come out from our synchronous FSDL cohort!
The world of AI has never been more exciting than right now. It feels like we've crested a hill and can see a beautiful new landscape all around. There's a ton to build, and we're excited to help you do it!
Try out @LabelStudioHQ and see how the tasty Tensor sausage gets made from raw data chunks with our latest lab notebook and video!
However much you care about data, you should probably care more.
High-quality data is still a major differentiator for ML app quality.
And good understanding of the data is a major differentiator for ML engineer quality!
Throughout the labs, we've been building up a neural network capable of basic OCR. We've focused on model architectures, on training frameworks, on experiment management, and on software engineering infra.
• Disk speed and bandwidth vary a lot, so use NVMe SSDs
• Store binary data in standard formats like JPGs
• Store metadata and text as JSON or Parquet
• Databases are the best tool for deep work with structured data
• When it comes to data warehouses and lakes, there's a lot of jargon worth knowing
• The basic takeaway is that data lakes are great if you need to aggregate different data sources at scale
• @SnowflakeDB and @databricks are the leading solutions
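The metadata advice above can be sketched with nothing but the standard library. This is a minimal example, assuming a hypothetical per-image metadata record for an OCR dataset (the field names are made up for illustration):

```python
import json
from pathlib import Path

# Hypothetical metadata for one OCR training image.
metadata = {
    "image_file": "page_0001.jpg",  # the binary data stays in a standard format (JPG)
    "label": "hello world",
    "width": 1024,
    "height": 768,
}

# Write the metadata next to the image as human-readable JSON.
path = Path("page_0001.json")
path.write_text(json.dumps(metadata, indent=2))

# Reading it back is just as simple.
loaded = json.loads(path.read_text())
print(loaded["label"])  # hello world
```

At larger scale, the same records can be written as Parquet for columnar reads, but plain JSON keeps small datasets easy to inspect and diff.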
FSDL Lecture 3: Troubleshooting & Testing is now live!
We cover:
• how to design software tests
• recommended tooling for testing and code quality assurance
• how to test ML systems, the easy and the hard way
• how to debug neural networks
As always, our recommendations are specific and actionable. We recommend testing docstring code with doctests and quick-and-dirty notebook testing with nbformat.
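Doctests keep the examples in your docstrings honest by executing them. Here's a minimal sketch, using a hypothetical box-overlap metric (not from the FSDL codebase):

```python
import doctest


def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes.

    >>> iou((0, 0, 2, 2), (1, 1, 3, 3))
    0.14285714285714285
    >>> iou((0, 0, 1, 1), (2, 2, 3, 3))
    0.0
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap in each dimension, clamped at zero for disjoint boxes.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0


if __name__ == "__main__":
    # Runs every example in the docstrings and reports failures.
    doctest.testmod(verbose=False)
```

Running the module (or `python -m doctest yourmodule.py`) checks the docstring examples on every change, so the docs can't silently drift from the code.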
We share a perspective on testing from @nelhage: test suites are like classifiers, classifying code updates as "acceptable" or "unacceptable".
In ML, we design classifiers to trade off precision and recall, and test suites deserve the same treatment.
In the latest FSDL lab notebook and video, we walk through why experiment management is so important for building awesome ML-powered products and how you can do it with @weights_biases.
In the first half of the course, we survey how to train models, ingest and store data, and put models in production.
Or, as @sergeykarayev puts it in lecture 2: "Development, Data, Deployment".
We're covering development now.
Because applied deep learning is still fairly new, model development is a messy iterative process that's more akin to experimentation than engineering.
FSDL Lecture 2: Development Infrastructure & Tooling is now live!
We cover what you need to know about:
• software engineering
• deep learning frameworks
• distributed training
• GPUs (cloud and on-prem)
• experiment management
We aim to be specific and make actionable recommendations.
For example, our answer to the modules-vs-notebooks debate is: write code in modules, and import it into notebooks 😃
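The modules-plus-notebooks workflow above can be sketched in a few lines. This is an illustrative toy, assuming a hypothetical module file and metric (neither is from the FSDL codebase):

```python
import importlib
import pathlib
import sys

# In practice this file lives in your repo, e.g. something like text_recognizer/metrics.py.
# Here we create it on the fly just so the sketch is self-contained.
pathlib.Path("metrics_demo.py").write_text(
    "def character_error_rate(pred: str, target: str) -> float:\n"
    "    '''Toy CER: fraction of mismatched characters (illustrative only).'''\n"
    "    mismatches = sum(p != t for p, t in zip(pred, target))\n"
    "    mismatches += abs(len(pred) - len(target))\n"
    "    return mismatches / max(len(target), 1)\n"
)

# In a notebook cell you would simply write: import metrics_demo
sys.path.insert(0, ".")
metrics_demo = importlib.import_module("metrics_demo")

print(metrics_demo.character_error_rate("hell0", "hello"))  # 0.2
```

The payoff: the logic lives in a version-controlled, testable module, while the notebook stays a thin layer for exploration and visualization.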
We think PyTorch is the clear pragmatic choice in 2022, but TensorFlow is also great, and JAX could be an excellent choice if you're going off the beaten path.