If your data contains all the features, and only the features, that impact the system/behaviors you’re trying to model
AND
represents the full problem space, then the function your model learns will perform well in production. 3/7 #DataScience #MachineLearning #DeepLearning
That never happens. Bad data is more than missing values and terrible formatting. There are missing features and features you don’t need. Large parts of your problem space are missing/underrepresented. 4/7 #DataScience #MachineLearning #DeepLearning
You can have a perfect representation of production data that is also a terrible representation of the function you’re trying to learn. Your model testing metrics will look amazing, and your model will fail in production. 5/7 #DataScience #MachineLearning #DeepLearning
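A minimal sketch of one way this plays out, under made-up assumptions: the "true" behavior is a sine curve, the collected data covers only a narrow slice of the problem space, and the held-out test set shares that gap. The names `true_function`, `x_train`, `x_test`, and `x_full` are hypothetical and exist only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def true_function(x):
    # Hypothetical stand-in for the real behavior the model should learn.
    return np.sin(x)


# The collected data covers only a narrow slice of the problem space.
x_train = rng.uniform(0.0, 1.0, 200)
x_test = rng.uniform(0.0, 1.0, 200)  # held out, but from the same narrow slice
x_full = rng.uniform(0.0, 6.0, 200)  # the full problem space

# Fit a straight line to the training data (least squares).
slope, intercept = np.polyfit(x_train, true_function(x_train), deg=1)


def mse(x):
    # Mean squared error of the fitted line against the true behavior.
    predictions = slope * x + intercept
    return float(np.mean((predictions - true_function(x)) ** 2))


test_mse = mse(x_test)
full_mse = mse(x_full)

print(f"held-out test MSE:      {test_mse:.5f}")
print(f"full problem space MSE: {full_mse:.5f}")
```

The held-out error is tiny because the test set has the same blind spot as the training set. The error over the full range is far larger, which is the gap between amazing test metrics and production failure.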
Validation happens through experimentation. The first rounds of experimentation reject most models, even those with great accuracy metrics.
The iterative experimental process is where reliable models come from. Experiments tell us how our models will perform in production and what to do to improve performance.
Take existing data, train models, and compare them (multiple single models and ensembles) to the EXISTING customer retention process’s performance. This is your first experiment! Why haven’t we used a test dataset?
3/12 #DataScience #MachineLearning #DeepLearning
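A rough sketch of that first experiment, assuming accuracy as the comparison metric and a simple majority vote as the ensemble. Both are assumptions for illustration, not something the thread specifies. All data and names below (`churned`, `existing_process`, `model_a`, and so on) are made up.

```python
def accuracy(predictions, actual):
    # Share of customers where the prediction matches what happened.
    return sum(p == a for p, a in zip(predictions, actual)) / len(actual)


def majority_vote(*prediction_lists):
    # Ensemble: flag a customer when more than half of the models flag them.
    return [int(sum(votes) * 2 > len(votes)) for votes in zip(*prediction_lists)]


# Made-up historical outcomes: 1 = customer churned, 0 = customer stayed.
churned = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]

# Made-up flags from the EXISTING retention process (1 = flagged as at risk).
existing_process = [1, 0, 1, 0, 1, 0, 0, 0, 1, 1]

# Made-up predictions from three hypothetical single models.
candidates = {
    "model_a": [1, 0, 0, 1, 0, 0, 1, 0, 1, 1],
    "model_b": [1, 1, 0, 1, 1, 0, 0, 0, 0, 1],
    "model_c": [0, 0, 1, 0, 1, 0, 0, 1, 0, 1],
}
candidates["ensemble_abc"] = majority_vote(*candidates.values())

baseline = accuracy(existing_process, churned)
results = {name: accuracy(preds, churned) for name, preds in candidates.items()}
beats_baseline = sorted(name for name, score in results.items() if score > baseline)

print(f"existing process: {baseline:.2f}")
for name, score in results.items():
    print(f"{name}: {score:.2f}")
print("beats the existing process:", beats_baseline)
```

The point of the comparison is the baseline: a candidate only matters if it does better than what the business already does today.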
A competent Data Scientist wants focused, intelligent, high-level interview questions. We are comfortable with interpreting, synthesizing, and providing our perspective. It’s a core capability. 2/7 #DataScience #MachineLearning #jobinterview
In an interview, that comes through in our answers to well-thought-out questions. We are looking for an opportunity to showcase our experience. We want to talk about what we’ve built and the results we’ve produced. 3/7 #DataScience #MachineLearning #jobinterview
Great decisions start with great questions. Do you want a simple framework to focus your thoughts and come up with the right questions? 1/7
Level 1: Information Gathering. Ask broad, open questions about the events. I need enough of the facts before starting to formulate high-quality questions. 2/7
Level 2: Impacts. Ask questions about the immediate and longer-term implications. I need to know everyone’s perspective on the event(s). I am asking questions like: 3/7
Why do businesses need Data and AI Strategies? Data and models are novel asset classes. Every new technology has unique advantages over what came before it.
Cloud, data, analytics, models, etc., are monetized differently.
Each needs a customized strategy that leverages the technology’s unique strengths and informs decision-making about the technology across the business. 2/3 #DataScience #MachineLearning #ArtificialIntelligence
One vision must align the business’s application of all technologies to create and deliver value to customers in new ways.
Sundar Pichai said, “Reward effort, not outcomes.” He is driving an innovative culture, and those initiatives don’t always pan out. The business must be willing to take risks that do not result in revenue. 1/7 #DataScience #Innovation
Focusing on outcomes makes the business too conservative, and innovation will fail to thrive. Data science requires research to innovate, so our field needs that mindset to succeed. 2/7 #DataScience #Innovation
Data science innovation requires a high risk tolerance and a reward structure that backs up the business’s culture. 3/7 #DataScience #Innovation
First, never give your salary expectation to a recruiter unless you're working with them to find a role amongst several options. Why?
Companies will go higher for an exceptional candidate, but the recruiter will screen you out of the process.
2/11 #DataScience #Career #Salary
Start VERY high and follow up with, "And where does that fall in this role's salary band?" If you're making $150K now, ask, "I am looking for $250K. Where does that fall in this role's salary band?"