Working on a real-world data science project feels quite different from tinkering on my hobby projects before.
7 things that are new to me on my first ML job 👇
1) Git. Working together with other developers is so much easier when you're confident in Git. I still struggle sometimes, but it gets easier over time 😃
2) DVC (Data Version Control) 🗃️. It's basically Git for large files (training data, model artifacts, ...).
It makes your work reproducible by storing those files in remote storage, e.g. a cloud bucket. Together with the Git repo, both your code and your data are tracked.
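To make "Git for large files" a bit more concrete, here's a rough sketch of the underlying idea. This is NOT DVC's actual API or file layout, just the concept, using hypothetical `track`/`checkout` helpers and a made-up `.ptr` pointer format: hash the big file, copy it into a content-addressed cache, and commit only a tiny pointer file to Git.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _cache_path(cache_dir: Path, md5: str) -> Path:
    # Content-addressed: the file's location is derived from its hash
    return cache_dir / md5[:2] / md5[2:]


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into the cache and write a small pointer file next to it.

    Hypothetical format: the pointer file is what you'd commit to Git,
    the cache directory is what you'd sync to remote storage.
    """
    md5 = file_md5(data_file)
    target = _cache_path(cache_dir, md5)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is only stored once
        shutil.copy2(data_file, target)
    pointer = data_file.with_name(data_file.name + ".ptr")
    pointer.write_text(json.dumps({"path": data_file.name, "md5": md5}))
    return pointer


def checkout(pointer: Path, cache_dir: Path) -> Path:
    """Restore the data file referenced by a pointer file from the cache."""
    meta = json.loads(pointer.read_text())
    restored = pointer.with_name(meta["path"])
    shutil.copy2(_cache_path(cache_dir, meta["md5"]), restored)
    return restored
```

With DVC itself you'd just run `dvc add` and `dvc push` and let the tool handle hashing, caching and uploading.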
3) Clean Python code 🐍. I'm paying much more attention now to things like...
• Type Annotations
• Variable and function names (no more xs and ys...)
• Docstrings
• PEP8 conventions
• Tests
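A small, made-up example of what that checklist can look like in practice (Python 3.9+; the function and test are purely illustrative): type annotations, descriptive names instead of xs and ys, a docstring, PEP8 formatting, and a pytest-style test.

```python
from statistics import mean


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Return the mean absolute error between targets and predictions.

    Raises:
        ValueError: if the inputs are empty or have different lengths.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    return mean(abs(target - prediction) for target, prediction in zip(y_true, y_pred))


def test_mean_absolute_error() -> None:
    # pytest picks up plain functions named test_* and runs their asserts
    assert mean_absolute_error([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]) == 1.0
```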
4) Deadlines ⏰. On Kaggle, you can spend days tweaking a neural net to gain another 0.1% of performance.
In a company, there are usually dozens of things to do, and you have to learn to get something done quickly and leave stuff imperfect.
5) Zoom meetings 📽️. You'll probably laugh at me for this one, but I wasn't used to doing (business) video calls at all.
I find them stressful at times, with the slight time delay and the missing subtle reactions of your audience. I'm slowly getting used to them.
6) Virtual Machines ☁️. Whether you use GCloud, Azure or AWS, you will have to get comfortable with SSH keys (and the arbitrary terminology each provider uses 🙄)
7) An IDE 👨‍💻. While I'm still fond of Jupyter notebooks and a fan of #nbdev, working on an existing code base is best done in an IDE. I opted for @code and so far I'm very happy with the choice: it often makes life easier (and has built-in support for notebooks!)
I could continue this list for a while... But I'd also like to emphasize that you don't have to know each and every thing before you apply to a job!
You can learn many things on the job, and probably more effectively than by learning everything on your own 🙂