If you're a Data Scientist who wants to be a better developer or builder, here's a thread on how to do it. There's so much bad advice out there, and I hope this helps clear things up. 1/8 #DataScience #MachineLearning #Programming
1. Spend a year coding as part of a team. Have people review your code, and review theirs in turn. This will help you unlearn many bad habits. You'll also get exposure to different styles and best practices. 2/8 #DataScience #MachineLearning #Programming
2. Build traditional software engineering type projects. Services and Web Apps are great because you'll learn fundamental coding skills.
You'll also learn how to build from user needs and all the perils of partial requirements.
This is like a final exam. You'll have to read other people's code and fix issues. Implementing new feature requests is the next step. It'll lead to even more rigorous code reviews. 7/8 #DataScience #MachineLearning #Programming
This isn't standard advice, but you'll grow your engineering capabilities this way. Other advice typically reinforces bad habits. 8/8 #DataScience #MachineLearning #Programming
Most Data Strategies are missing a critical component: a Data Monetization Catalog. It isn't difficult to build. Here's my process: 1/8 #DataScience #MachineLearning #Data #Strategy
The process starts with the question: what use cases is this data used for? Use cases have business value, so there's a straight-line connection from data to value. 2/8 #DataScience #MachineLearning #Data #Strategy
I walk clients through this exercise, and it reveals excellent insights because data catalogs and dictionaries are connected to technical use cases but rarely to business use cases.
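Here's a minimal sketch of what a catalog entry could look like in Python. This is my illustration of the idea, not a standard format: the class names, fields, dataset names, and dollar figures are all hypothetical placeholders. It links each dataset to the business use cases it feeds, then surfaces datasets with no business use case attached.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UseCase:
    """A business use case that consumes a dataset (hypothetical structure)."""
    name: str
    annual_value_usd: float  # placeholder estimate, supplied by the business


@dataclass
class CatalogEntry:
    """One dataset and the business use cases it supports."""
    dataset: str
    use_cases: List[UseCase] = field(default_factory=list)

    def total_value(self) -> float:
        # Sum the estimated value of every use case tied to this dataset.
        return sum(uc.annual_value_usd for uc in self.use_cases)


def unmonetized(entries: List[CatalogEntry]) -> List[str]:
    """Datasets with no business use case attached."""
    return [e.dataset for e in entries if not e.use_cases]


def rank_by_value(entries: List[CatalogEntry]) -> List[CatalogEntry]:
    """Datasets ordered from highest to lowest estimated value."""
    return sorted(entries, key=lambda e: e.total_value(), reverse=True)


def example_catalog() -> List[CatalogEntry]:
    # Made-up names and numbers, for illustration only.
    return [
        CatalogEntry(
            "customer_orders",
            [UseCase("churn prediction", 150_000.0),
             UseCase("demand forecasting", 50_000.0)],
        ),
        CatalogEntry("support_tickets", [UseCase("ticket routing", 40_000.0)]),
        CatalogEntry("web_clickstream_raw"),  # no business use case yet
    ]


if __name__ == "__main__":
    catalog = example_catalog()
    for entry in rank_by_value(catalog):
        print(entry.dataset, entry.total_value())
    print("No business use case:", unmonetized(catalog))
```

Sorting by total value gives a rough priority order, and the unmonetized list shows where the catalog has data connected to technical use cases only.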
Data Science introduces a new model or architecture weekly, and it can be tough to keep up. Here are some of the basics and recent releases with resources to help you quickly understand each one.
1/15 #DataScience #MachineLearning #DeepLearning
Let's start with DALL·E 2. Here's a Python implementation. Sometimes the easiest way to learn about a model is to use it.
Google recently released an overview of PaLM. It's one of a growing list of large-scale language models improving on the capabilities of earlier models like GPT-3. Deep learning is going big.
The Data Science learning path today is different than it was 3 years ago and looks nothing like it did 7 years ago. This thread has the main layers and example resources covering the basics, assuming you've got basic math covered.
1/18 #DataScience #MachineLearning
1. Research Methods. We do a lot of research and experimentation now. Data Scientists used to be model-centric, but that's changed because our work must meet higher reliability requirements. I wrote an intro post: vinvashishta.substack.com/p/a-basic-intr…
2/18 #DataScience #MachineLearning
2. Causal Inference. Data Science has taken a hard turn towards causal inference, again to meet increasing model reliability requirements. An education on CI always starts with Pearl. ftp.cs.ucla.edu/pub/stat_ser/r…
3/18 #DataScience #MachineLearning
Companies need a technology turnaround right now, and that's a huge opportunity for mid-level to senior Data Scientists and leaders.
Playing a key role in a turnaround is a career maker. I've been part of 4, and here are some lessons learned.
1/12 #DataScience #MachineLearning #Strategy
Companies only change when they've been through enough pain. That low point and the months immediately after it are where Data Scientists can put forward ideas that will gain traction. Don't push the elephant. Let it lead.
2/12 #DataScience #MachineLearning #Strategy
Companies usually plan without enough information to plan well. Data Scientists can help the business understand the possibilities created by our work. That knowledge is critical from the earliest stages.
3/12 #DataScience #MachineLearning #Strategy
The only way Data Science continues to move forward is with an equal focus on research and applications. To generate value, models must perform reliably in production over several years.
Even for mid-level Data Scientists, total compensation starts at $250K+ and goes up to $400K. ML Engineers can cost even more depending on their breadth of platform/architecture knowledge and ability to deploy models at scale.