Bojan Tunguz

@tunguz

14 Dec, 4 tweets, 1 min read

I posted this back in January:

I've worked for 4 different tech companies in various Data Science roles. For my day job I have never ever had to deal with text, audio, video, or image data. 1/4

Based on the informal conversations I've had with other data scientists, this seems to be the case for the vast majority of them. 2/4

Almost a year later this remains largely true: for the *core job* related DS/ML work, I have still not used any of the aforementioned data. However, for work-related/affiliated *research* I have worked with lots of text data. 3/4

Text is slowly gaining prominence in *conjunction* with relational data in the day jobs of many data scientists. However, tabular data remains the cornerstone of data science in most work environments. 4/4

• • •

More from @tunguz

Bojan Tunguz

@tunguz

16 Oct

1/ After a year of work, our paper on mRNA Degradation is finally out!

paper: arxiv.org/abs/2110.07531
code: github.com/eternagame/Kag…

2/ A year ago I was approached with a unique and exciting opportunity: I was asked to help set up a Kaggle Open Vaccine competition, where the goal would be to come up with a Machine Learning model for the stability of RNA molecules.

3/ This is of pressing importance for the development of mRNA vaccines. The task seemed a bit daunting, since I had no prior experience with RNA or biophysics, but I wanted to help out any way I could.

Bojan Tunguz

@tunguz

18 Dec 20

One of the unfortunate consequences of Kaggle's inability to host tabular data competitions any more will be that the fine art of feature engineering will slowly fade away. Feature engineering is rarely, if ever, covered in ML courses and textbooks. 1/

There is very little formal research on it, especially on how to come up with domain-specific nontrivial features. These features are often far more important for all aspects of the modeling pipeline than improved algorithms. 2/

I certainly would have never realized any of this were it not for tabular Kaggle competitions. There, over many years, a community treasure trove of incredible tricks and insights had accumulated. Most of them unique. 3/
