Zoom chat is unthreaded and hard to react to. Asking questions live is chaos. Instead, students post questions in a Slack channel.
Instructors can answer them directly in Slack, or summarize and answer aloud at a break in the lecture.
3/ @loom for lecture recording. Super simple, no uploads needed, and puts a nice face bubble in the corner.
4/ @GoogleColab for environment management. It's amazing how much work it saves to have developer environments out of the box, and the GPU support is essential for a deep learning class.
5/ @gradescope for assignment grading. Students can view and submit their work remotely, and it's way easier for us to grade via Gradescope than, e.g., by PDF. We can even automate big chunks of it.
6/ What does your stack look like? Anything else we should take a look at?
• • •
Let's talk about setting up our Python/CUDA environment!
Our goals:
- Easily specify exact Python and CUDA versions
- Humans should not be responsible for finding mutually-compatible package versions
- Production and dev requirements should be separate
1/N
Here's a good way to achieve these goals:
- Use `conda` to install Python/CUDA as specified in `environment.yml`
- Use `pip-tools` to lock in mutually compatible versions from `requirements/prod.in` and `requirements/dev.in`
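The two steps above can be sketched as a small Python helper that only builds the shell commands, so you can inspect them before running anything. This is a rough sketch, not the thread's actual setup script: the helper name `setup_commands` is hypothetical, and it assumes `pip-compile` writes its default output (`prod.txt` / `dev.txt`) next to each `.in` file.

```python
import shlex


def setup_commands(env_file="environment.yml", req_dir="requirements"):
    """Return the shell commands for the conda + pip-tools workflow (hypothetical helper)."""
    prod_in = f"{req_dir}/prod.in"
    dev_in = f"{req_dir}/dev.in"
    # Assumption: pip-compile writes its output next to the input, with a .txt suffix.
    prod_txt = f"{req_dir}/prod.txt"
    dev_txt = f"{req_dir}/dev.txt"
    return [
        # conda installs the Python/CUDA versions pinned in environment.yml
        ["conda", "env", "update", "--prune", "-f", env_file],
        # pip-compile resolves mutually compatible pins for each requirements file
        ["pip-compile", prod_in],
        ["pip-compile", dev_in],
        # pip-sync makes the active environment match the compiled pins exactly
        ["pip-sync", prod_txt, dev_txt],
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for cmd in setup_commands():
        print(shlex.join(cmd))
```

Keeping `prod.in` and `dev.in` separate means a production image can install only `prod.txt`, while a dev machine syncs both.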
dagster describes itself as a "data orchestrator for machine learning, analytics, and ETL"
Let's break that down 👇
2/ When you work with real-world data, your pipelines can get complex.
E.g., to train a language model on Twitter, you might:
- Download data
- Strip out offensive tweets
- Preprocess the data
- Fit models
- Summarize training performance
- Deploy the best model to production
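To make the steps above concrete, here is a toy sketch of that pipeline as plain Python functions chained by hand. Everything in it is a made-up placeholder (the function names, the blocklist, the tweets, and the scores); it uses no orchestrator API and only shows how each step feeds the next.

```python
from __future__ import annotations

# Hypothetical blocklist used by the filtering step.
BLOCKLIST = {"badword"}


def download_data() -> list[str]:
    # Placeholder for pulling tweets from an API or a data dump.
    return ["Hello world", "a badword tweet", "Deep learning is fun"]


def strip_offensive(tweets: list[str]) -> list[str]:
    # Drop any tweet that contains a blocklisted token.
    return [t for t in tweets if not BLOCKLIST & set(t.lower().split())]


def preprocess(tweets: list[str]) -> list[list[str]]:
    # Lowercase and split on whitespace.
    return [t.lower().split() for t in tweets]


def fit_models(corpus: list[list[str]]) -> dict[str, float]:
    # Placeholder "training": the scores are invented and derived from vocabulary size.
    vocab_size = len({tok for doc in corpus for tok in doc})
    return {"small": vocab_size / 10, "large": vocab_size / 5}


def summarize(scores: dict[str, float]) -> str:
    # Report the name of the best-scoring model (higher is better here).
    return max(scores, key=scores.get)


def deploy(model_name: str) -> str:
    # Placeholder for pushing the chosen model to production.
    return f"deployed {model_name}"


def run_pipeline() -> str:
    tweets = strip_offensive(download_data())
    scores = fit_models(preprocess(tweets))
    return deploy(summarize(scores))


if __name__ == "__main__":
    print(run_pipeline())
```

Every call in `run_pipeline` is a dependency that someone has to track, rerun, and debug by hand, which is the part an orchestrator is meant to take over.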
3/ In production settings, pipelines can be even more complicated.
All well and good, but doing those steps manually every time you update your model is painful, resource-intensive, and hard to scale.
And what happens if you have hundreds of these pipelines you need to manage?