Love the "data science maturity levels" in @Patterns_CP

Interesting way to contextualize research at a glance (reminds me a bit of @justsaysinmice)

Full list in thread:
1) Concept

Basic principles of a new data science output observed and reported (e.g., statement of principles, dataset, new algorithm, new theoretical concept, theoretical system infrastructure)

2) Proof-of-concept

Data science output has been formulated, implemented, and tested for one domain/problem (e.g., dataset with rich domain-specific metadata, algorithm coded up as software, principles with expanded guidance on how to implement them)

3) Development/pre-production

Data science output has been rolled out/validated across multiple domains/problems

4) Production

Data science output is validated, understood, and regularly used for multiple domains/problems (e.g., operational data-sharing service across institutes/countries, ML algorithm to tag images, shared data infrastructure to manage access to compute/archive resources)

5) Mainstream

Data science output is well understood and (nearly) universally adopted (e.g., the Internet, citation of articles using DOIs)

More info about the levels + rationale here!
cell.com/patterns/dsml


More from @AlexTamkin

8 Dec
DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning

SSL is a promising technology, but current methods are field-specific. Can we find general algorithms that can be applied to any domain?

🌐: dabs.stanford.edu
📄: arxiv.org/abs/2111.12062

🧵👇 #NeurIPS2021

1/
Self-supervised learning (SSL) algorithms can drastically reduce the need for labeling by pretraining on unlabeled data

But designing SSL methods is hard and can require lots of domain-specific intuition and trial and error

2/
We designed DABS to drive progress in domain-agnostic SSL

Our benchmark addresses three core modeling components in SSL algorithms:

(1) architectures
(2) pretraining objectives
(3) transfer methods

3/
11 Jan
Some takeaways from @openai's impressive recent progress, including GPT-3, CLIP, and DALL·E:

[THREAD]

👇1/
1) The raw power of dataset design.

These models aren't radically new in their architecture or training algorithm

Instead, their impressive quality is largely due to careful training at scale of existing models on large, diverse datasets that OpenAI designed and collected.

2/
Why does diverse data matter? Robustness.

Can't generalize out-of-domain? You might be able to make most things in-domain by training on the internet

But this power comes w/ a price: the internet has some extremely dark corners (and these datasets have been kept private)

3/