Bojan Tunguz
Mar 10 · 4 tweets · 1 min read
OK, here is my honest take on when to use which approach/technique with a given dataset. These are my rules of thumb, and the caveats could fill the entire internet.
1. Up to a few hundred data points, use stats.
2. For a few hundred to a few thousand, use linear/logistic regression. 1/
3. Between a few thousand and about 10,000, it's anyone's guess. Gradient boosters generally do well here, with other "classical" algorithms (SVM, for instance) sometimes shining. 2/
4. Many thousands to about a billion data points is where gradient boosted trees rule. If you need just one algorithm, go with this. You'll never go wrong. 3/
5. If you have several billion data points, or many times that amount, check out neural nets. They have the capacity to absorb those kinds of datasets easily. 4/
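
The five rules above can be read as a simple lookup from dataset size to a family of methods. Here is a minimal Python sketch of that reading. The function name `suggest_approach` and the exact numeric cutoffs (500 for "a few hundred", 5,000 for "a few thousand") are assumptions made for illustration; the thread gives only fuzzy ranges, and it leaves the gap between "about a billion" and "several billion" open, which this sketch assigns to neural nets.

```python
# Illustrative cutoffs only: the thread gives fuzzy ranges, not exact numbers.
FEW_HUNDRED = 500            # assumed stand-in for "a few hundred"
FEW_THOUSAND = 5_000         # assumed stand-in for "a few thousand"
ABOUT_TEN_THOUSAND = 10_000  # "about 10,000" in the thread
ABOUT_A_BILLION = 1_000_000_000  # "about a billion" in the thread


def suggest_approach(n_points: int) -> str:
    """Return the family of methods the rules of thumb point to.

    This is a hypothetical helper, not something from the thread itself.
    """
    if n_points < 0:
        raise ValueError("n_points must be non-negative")
    if n_points <= FEW_HUNDRED:
        return "stats"
    if n_points <= FEW_THOUSAND:
        return "linear/logistic regression"
    if n_points <= ABOUT_TEN_THOUSAND:
        # Rule 3: no clear winner; boosters usually do well, SVMs sometimes shine.
        return "anyone's guess (gradient boosters, SVM, other classical algorithms)"
    if n_points <= ABOUT_A_BILLION:
        return "gradient boosted trees"
    # Assumption: everything past about a billion is treated as neural-net territory.
    return "neural nets"


if __name__ == "__main__":
    for n in (200, 2_000, 8_000, 1_000_000, 5_000_000_000):
        print(f"{n:>13,} data points -> {suggest_approach(n)}")
```

Treat the output as a starting point for experiments rather than a verdict; as the thread says, the caveats are many.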


