Vin Vashishta
Apr 23 8 tweets 6 min read
If you're a Data Scientist who wants to be a better developer or builder, here's a thread on how to do it. There's so much bad advice out there, and I hope this helps clear things up.
1/8
#DataScience #MachineLearning #Programming
1. Spend a year coding as part of a team. Have people review your code, and review theirs in turn. This will help you unlearn many bad habits. You'll also get exposure to different styles and best practices.
2/8
2. Build traditional software engineering projects. Services and web apps are great because they teach fundamental coding skills.

You'll have to Google a lot, which is a software engineering superpower.
3/8
When you have to go to external sources to learn, you figure out how to separate quality advice from amateur advice.

Quality sources give you background on why you build in a certain way, not just how to copy/paste code.
4/8
3. Learn coding best practices and dive into why we use them. Best practices and coding styles exist so that we write maintainable code.

Someone else will likely take over your project or refactor it for production.
5/8
High-quality code makes that easier, so learning consistent style and best practices is critical.
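
To make that concrete, here's a minimal sketch in Python. The function names, the threshold, and the sample data are all hypothetical and chosen only for illustration. Both versions return the same result, but only the second tells the next person what it is for.

```python
# Terse version: it works, but the intent is hard to recover later.
def f(d, t=0.5):
    return [k for k, v in d.items() if sum(v) / len(v) > t]


# Maintainable version: same behavior, with names, types, and a docstring.
DEFAULT_THRESHOLD = 0.5  # hypothetical cutoff, chosen for illustration


def features_above_threshold(
    scores_by_feature: dict[str, list[float]],
    threshold: float = DEFAULT_THRESHOLD,
) -> list[str]:
    """Return the names of features whose mean score exceeds the threshold."""
    return [
        name
        for name, scores in scores_by_feature.items()
        if sum(scores) / len(scores) > threshold
    ]


# Hypothetical sample data for a quick comparison of the two versions.
sample_scores = {"age": [0.9, 0.7], "zip_code": [0.1, 0.2]}
selected = features_above_threshold(sample_scores)
```

A reviewer or a future maintainer can read the second version without asking what `d`, `t`, or `f` mean, and that is the whole point of a consistent style.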

4. (Optional) Join an open source project as a contributor. Find something you use and make contributions.
6/8
This is like a final exam. You'll have to read other people's code and fix issues. Implementing new feature requests is the next step. It'll lead to even more rigorous code reviews.
7/8
You'll also learn how to build from user needs and all the perils of partial requirements.

This isn't standard advice, but you'll grow your engineering capabilities this way. Other advice typically reinforces bad habits.
8/8
