It's fun to speculate about what direction DS as a profession is going, but it's also instructive to dig into how it grew into its modern form. This paper's an example of that, a snapshot from a few years before the term itself was coined circa 2008 projecteuclid.org/download/pdf_1…
The author Leo Breiman (who you may know from such greatest hits as bagging and random forests) talks about going from stats academia to working in industry as a statistical consultant. When he eventually returned to academia, he experienced a sort of reverse culture shock
He frames academia and industry as different cultures of statistical modeling. Both share the goal of understanding the relationship between some input variables X and an output variable Y. The true relationship is unknown, a black box, but statisticians in both camps aim to approximate it
The paper describes data modeling (academic stats) culture as assuming that the black box is fundamentally orderly, stochastic and parametric. The other culture (algorithmic modeling) isn't terribly interested in what's in the black box, focusing instead on predictive accuracy
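To make the contrast concrete, here's a minimal sketch in Python. None of it comes from the paper: the toy data, the function names (`fit_line`, `knn_predict`), and the choice of least squares and k-nearest neighbors are my own assumptions, picked as simple stand-ins for each culture.

```python
import random

def fit_line(xs, ys):
    """Data modeling: assume y = a + b*x + noise and estimate a, b by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def knn_predict(xs, ys, x_new, k=3):
    """Algorithmic modeling: no assumed form, just average y over the k nearest x."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical toy data: the "black box" here is y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 + rng.gauss(0, 0.1) for x in xs]

intercept, slope = fit_line(xs, ys)
# The data model hands back parameters you can interpret...
print(f"data model: y ~ {intercept:.2f} + {slope:.2f} * x")
# ...while the algorithmic model only hands back a prediction.
print(f"k-NN prediction at x=2.0: {knn_predict(xs, ys, 2.0):.2f}")
```

The first function commits to a story about what's inside the box and estimates its parameters. The second never forms an opinion about the box at all.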
After spending time in industry, Breiman finds himself more sympathetic to algorithmic modeling culture, and he gets into plenty of formal and technical reasons why. I'm not gonna get into them in a tweet thread, but I recommend checking out the paper if you're interested in them
Some of his reflections from working in industry resonate strongly with me:

* Focus on finding a good solution—that’s what consultants get paid for
* Live with the data before you plunge into modeling
Some kind of caught me off guard to hear from a statistician in industry:

* Search for a model that gives a good solution, either algorithmic or data
* Predictive accuracy on test sets is the criterion for how good the model is
The first of those two is a little less surprising for me, given that "a good solution" could mean a lot of different things, but the second both makes a lot of sense to me and is weird to think about.
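For anyone who hasn't done it in a while, "predictive accuracy on test sets" boils down to something like the sketch below: hold out data the model never saw, then score predictions on it. The 80/20 split, the mean squared error metric, and the deliberately naive predict-the-training-mean model are my assumptions for illustration, not Breiman's.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out the last test_fraction of them."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]

def mean_squared_error(y_true, y_pred):
    """Average squared gap between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical (x, y) pairs standing in for real data.
rows = [(x, 3 * x + 2) for x in range(20)]
train, test = train_test_split(rows)

# Deliberately naive model: always predict the training-set mean of y.
train_mean = sum(y for _, y in train) / len(train)
test_mse = mean_squared_error([y for _, y in test], [train_mean] * len(test))
print(f"held-out MSE of the mean-only baseline: {test_mse:.1f}")
```

Any candidate model, parametric or not, gets judged by the same held-out number, which is what makes the criterion indifferent to what's inside the black box.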
That's probably because of the type of DS I am. I've done some forecasting and productionized precious few ML models, but I've mostly modeled to do what Breiman describes as "extracting information about how nature is associating response variables to input variables"
I perceive caring about accuracy first as ML eng territory, and I perceive MLE as stemming directly from computer science. This is sort of silly of me, given that "data scientist" has been an overloaded junk title for most of the time I've held it (and if I'm honest still is)
Specialization within the DS world is still emerging, and while some of the boundaries between types of DS roles have gotten sharper in the last few years, they're all still coming from the same lineage
"Terabytes of data are pouring into computers from many sources, both scientific, and commercial, and there is a need to analyze and understand the data," Breiman says, like a prophet foretelling HBR articles to come
And reflecting on his work in the 90's (!!!), he was already seeing how cross-functional this line of work can be: "there has been a noticeable move toward statistical work on real world problems and reaching out by statisticians toward collaborative work with other disciplines"
The problems we're solving with data today are greater in scale and complexity, which means we're getting the luxury of focusing on narrower subsets of those problems. Now we ask for analysts, scientists, MLEs, analytics engineers, etc. instead of just DSes or statisticians

Thread by Katie Bauer

