Jesper Dr.amsch
Sep 3, 2022
Are you a scientist applying ML?

I wrote a tutorial with ready-to-use notebooks to make your life easier!

Let's focus on 3 aspects:
• More Citations
• Easier Review
• Better Collaboration

Let's see how:
First things first:

This was a @EuroSciPy tutorial in 2022.

In the future, a talk recording will be available. Until then, here's the gist:

1. Model Evaluation
2. Benchmarking
3. Model Sharing
4. Testing
5. Interpretability
6. Ablation

#euroscipy
github.com/JesperDramsch/…
📐 Model Evaluation

In science, we want to describe the world.

Overfitting gets in the way of this.

With real-world data, there are many ways to overfit, even if we use a random split into training, validation, and test sets!

Save yourself the pain!

github.com/JesperDramsch/…
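One classic way to overfit despite a split is to let the test data influence a preprocessing step. The sketch below is an illustration, not the tutorial's notebook. It assumes scikit-learn and uses pure-noise synthetic data, so any accuracy above chance is an artifact. Selecting features on the full dataset before cross-validation leaks label information. Wrapping the selection in a pipeline keeps it inside each training fold.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Pure noise: by construction there is NO relationship between X and y.
X = rng.normal(size=(100, 5000))
y = rng.integers(0, 2, size=100)

# Leaky: pick the "best" features using ALL labels, then cross-validate.
# The test folds have already influenced which features were kept.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5).mean()

# Honest: the selection step is refit inside every training fold.
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
honest = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky CV accuracy:  {leaky:.2f}")   # expected to look far better than chance
print(f"honest CV accuracy: {honest:.2f}")  # expected to hover around chance (0.5)
```

The same pipeline pattern applies to scalers, imputers, and any other step that learns from data.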
A machine learning model that isn't evaluated correctly is not a scientific result.

This leads to desk rejections, tons of extra work, or, in the worst case, retractions and becoming the "bad example".

This is especially common with:
• Time Data
• Spatial Data

More?
dramsch.net/books
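With time and spatial data, a random split lets strongly correlated neighbours land on both sides of the split. A minimal sketch, assuming scikit-learn's `TimeSeriesSplit` and `GroupKFold`. The 12 samples and the four site labels are made up for illustration:

```python
import numpy as np
from sklearn.model_selection import GroupKFold, TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # 12 samples, assumed to be sorted by time

# Time data: every validation fold lies strictly after its training data.
time_folds = list(TimeSeriesSplit(n_splits=3).split(X))
for train_idx, test_idx in time_folds:
    print("train:", train_idx.tolist(), "-> test:", test_idx.tolist())

# Spatial data: hypothetical site labels (4 sites, 3 samples each).
# GroupKFold never puts samples from the same site in both train and test.
sites = np.repeat(np.arange(4), 3)
site_folds = list(GroupKFold(n_splits=4).split(X, groups=sites))
for train_idx, test_idx in site_folds:
    print("held-out site(s):", sorted(set(sites[test_idx].tolist())))
```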
🔬 Benchmarking

Compare your models using the right metrics and benchmarks.

Here are great examples of benchmarks:
• DummyClassifiers
• Benchmark Datasets
• Domain Methods
• Linear Models
• Random Forests

Always ground your model in the reality of science!

github.com/JesperDramsch/…
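A minimal baseline comparison, assuming scikit-learn and a synthetic stand-in dataset (swap in your real data or a benchmark dataset). A domain method would be one more entry in the dictionary. It isn't shown here because it depends entirely on your field.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption for illustration).
X, y = make_classification(n_samples=600, n_features=15, n_informative=5, random_state=42)

baselines = {
    "dummy (most frequent)": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

results = {}
for name, model in baselines.items():
    # Same data, same folds, same metric for every candidate.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name:>22}: {scores.mean():.3f} ± {scores.std():.3f}")

# Your own model (not shown) has to beat these numbers to be interesting.
```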
Proper benchmarks make stronger papers!

Metrics on their own don't always paint a full picture.

Use benchmarks to tell a story of "how well your model should be doing" and disarm many of Reviewer 2's comments before they're even written down.
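To see how a metric on its own can mislead, consider a hypothetical dataset with 95% negatives. In this sketch, which assumes scikit-learn, a `DummyClassifier` that always predicts the majority class gets an impressive-looking accuracy. Only comparing against a baseline-aware metric such as balanced accuracy reveals that it is at chance level.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Hypothetical imbalanced labels: 95 negatives, 5 positives.
y_true = np.array([0] * 95 + [1] * 5)
X = np.zeros((len(y_true), 1))  # features are irrelevant to this baseline

dummy = DummyClassifier(strategy="most_frequent").fit(X, y_true)
y_pred = dummy.predict(X)  # always predicts class 0

acc = accuracy_score(y_true, y_pred)           # 0.95: looks great...
bal = balanced_accuracy_score(y_true, y_pred)  # 0.50: ...but it is pure chance
print(f"accuracy: {acc:.2f}, balanced accuracy: {bal:.2f}")
```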
🤝 Model Sharing

Sharing models is great for reproducibility and collaboration.

Export your trained models and fix your random seeds, so the results in your paper submission can be reproduced.

Share your dependencies in a requirements.txt or env.yml so other researchers can use & cite your work!

github.com/JesperDramsch/…
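A sketch of seeding, exporting, and recording dependencies, assuming scikit-learn with `joblib` for serialization. The seed value, file location, and package list are placeholders. Deep learning frameworks have their own seeding functions, which are not covered here.

```python
import importlib.metadata
import os
import random
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

SEED = 42  # hypothetical value: report whichever seed your paper actually used
random.seed(SEED)     # Python's built-in RNG
np.random.seed(SEED)  # NumPy's legacy global RNG

X, y = make_classification(n_samples=200, random_state=SEED)
# Passing the seed explicitly is more robust than relying on global state.
model = RandomForestClassifier(random_state=SEED).fit(X, y)

# Export the fitted model so collaborators can reload it without retraining.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")  # hypothetical location
joblib.dump(model, path)
reloaded = joblib.load(path)

# Pin exact versions, e.g. as lines for a requirements.txt.
for pkg in ("numpy", "scikit-learn", "joblib"):
    print(f"{pkg}=={importlib.metadata.version(pkg)}")
```

Models saved this way are generally only safe to load with the same scikit-learn version that created them, which is one more reason to pin versions in your `requirements.txt` or `env.yml`.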
Good code is easy to use and cite!

Use these tools:
• flake8 for linting
• black for formatting

Write docstrings and build your docs from them!
(VS Code has a fantastic extension called autoDocstring.)

Provide a Docker container for ultimate reproducibility.

Your peers will thank you.
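Here is what a docstring can look like in the NumPy docstring convention, one common style that Sphinx can render through its `napoleon` extension. The function itself is a hypothetical example:

```python
import numpy as np


def min_max_scale(values) -> np.ndarray:
    """Rescale an array linearly to the range [0, 1].

    Parameters
    ----------
    values : array_like
        Input measurements. Must contain at least two distinct values.

    Returns
    -------
    np.ndarray
        Array of the same shape, with the minimum mapped to 0 and the maximum to 1.

    Raises
    ------
    ValueError
        If all values are identical, because the range would be zero.
    """
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        raise ValueError("cannot scale an array with zero range")
    return (values - values.min()) / span
```

`help(min_max_scale)` prints this text in any Python session, so the documentation travels with the code.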
⚗️ Testing

I know code testing in science is hard.

Here are some ways to make it incredibly easy:
• Doctests for small examples
• Data Tests for important samples
• Deterministic tests for methods

github.com/JesperDramsch/…
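Here is a sketch of all three kinds of tests in one file, assuming pytest-style `test_*` functions. The unit conversion, reference samples, and model are hypothetical stand-ins for your own code:

```python
import doctest

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


# Doctest: small, executable examples that double as documentation.
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to Kelvin.

    >>> celsius_to_kelvin(0.0)
    273.15
    >>> celsius_to_kelvin(-273.15)
    0.0
    """
    return celsius + 273.15


# Data test: hypothetical reference samples whose correct outputs are known.
REFERENCE_SAMPLES = {0.0: 273.15, 100.0: 373.15, -40.0: 233.15}


def test_reference_samples():
    for celsius, kelvin in REFERENCE_SAMPLES.items():
        assert np.isclose(celsius_to_kelvin(celsius), kelvin)


# Deterministic test: with a fixed seed, refitting must give identical output.
def test_model_is_deterministic():
    X, y = make_regression(n_samples=50, n_features=3, random_state=0)
    first = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y).predict(X)
    second = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y).predict(X)
    np.testing.assert_array_equal(first, second)


if __name__ == "__main__":
    doctest.testmod()
    test_reference_samples()
    test_model_is_deterministic()
    print("all checks passed")
```

With pytest, `pytest --doctest-modules` collects both the `test_*` functions and the docstring examples in one run.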
You can make life 1000 times easier for yourself and your collaborators!

Use input validation.

Pandera is a nice little tool that lets you define what your input data should look like. Think:
• Data Ranges
• Data Types
• Category Names

It's honestly a game changer, and it's easy!
🧠 Interpretability

This is a great communication tool for papers and meetings with domain scientists!

No one cares about your mean squared error!

How does the prediction change when you vary your input values?!

Which features are important?!

github.com/JesperDramsch/…
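Both questions can be explored with model-agnostic tools. In this sketch, which assumes scikit-learn, `permutation_importance` answers "which features are important?", and a hand-rolled partial dependence loop answers "how does the prediction change?". The data is synthetic with only two informative features, so you can check the answer:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: only 2 of the 4 features actually drive the target.
X, y, coef = make_regression(
    n_samples=400, n_features=4, n_informative=2, coef=True, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# "Which features are important?" Shuffle one feature at a time on held-out
# data and measure how much the score drops.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} ± {result.importances_std[i]:.3f}")

# "How does the prediction change?" A hand-rolled partial dependence: set the
# most important feature to each grid value for every sample, then average.
top = int(result.importances_mean.argmax())
grid = np.linspace(X_test[:, top].min(), X_test[:, top].max(), 5)
pd_curve = []
for value in grid:
    X_mod = X_test.copy()
    X_mod[:, top] = value
    pd_curve.append(model.predict(X_mod).mean())
    print(f"x{top} = {value:+.2f} -> mean prediction {pd_curve[-1]:.1f}")
```

For plots, scikit-learn also offers `PartialDependenceDisplay.from_estimator` in `sklearn.inspection`.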
✂️ Ablation Studies

You know it. I know it.

Data science means trying a lot of things and finding what works.
It's iterative!

Use ablation studies to switch off components of your solution one at a time and measure their effect on the final score!

This kind of care looks great in a paper!

github.com/JesperDramsch/…
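A minimal ablation sketch, assuming a scikit-learn `Pipeline`. Each step is switched off in turn by setting it to `"passthrough"`, and each (hypothetical) feature group is dropped in turn, re-running the same cross-validation every time:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20, n_informative=6, random_state=0)

# The "full solution": a hypothetical three-step pipeline.
full = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(k=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])


def cv_score(model, features):
    """Mean 5-fold cross-validated accuracy on the given feature matrix."""
    return cross_val_score(model, features, y, cv=5).mean()


results = {"full pipeline": cv_score(full, X)}

# Component ablation: switch off one step at a time.
for step in ("scale", "select"):
    ablated = clone(full).set_params(**{step: "passthrough"})
    results[f"without {step}"] = cv_score(ablated, X)

# Feature-group ablation: drop one hypothetical group of columns at a time.
feature_groups = {"group A": np.arange(0, 10), "group B": np.arange(10, 20)}
for name, cols in feature_groups.items():
    keep = np.setdiff1d(np.arange(X.shape[1]), cols)
    results[f"without {name}"] = cv_score(clone(full), X[:, keep])

for setting, score in results.items():
    print(f"{setting:>16}: {score:.3f}")
```

Reporting the full table of scores, not just the best one, is what makes an ablation convincing.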
Creating this tutorial was supported by the Software Sustainability Institute (@SoftwareSaved).

You made it all the way down, so you might make a great SSI Fellow and get £3,000 for this stuff too!

In doubt? Read "would I even fit in?!":
software.ac.uk/blog/2022-08-0…
