https://twitter.com/ari_seff/status/1289284199157923840

https://twitter.com/cHHillee/status/1401622477600563204

More from @_ScottCondron

Scott Condron

@_ScottCondron

Jun 29, 2023

https://twitter.com/eugeneyan/status/1652463942822957056

Language UIs are dead, bring back clicks

Jokes aside, I've gone down a bit of a rabbit hole trying to build a tag-based document explorer in under 200 lines of Python

1. keyword extraction
2. LLM turns keywords into tags
3. tag docs with LLM
4. document browsing on a weave board https://t.co/bGW1XVN3iu
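
A rough sketch of how the four steps could fit together. Everything named here is an assumption for illustration: `extract_keywords` is a plain word-frequency stand-in (not yake), `keywords_to_tags` and `tag_document` are hypothetical callables where the LLM calls would go, and the returned dictionary only stands in for whatever the weave board actually browses.

```python
from __future__ import annotations

import re
from collections import Counter, defaultdict
from typing import Callable

STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with", "is", "we"}


def extract_keywords(text: str, top: int = 5) -> list[str]:
    """Step 1 stand-in: most frequent non-stopword tokens (NOT yake)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top)]


def build_tag_index(
    docs: dict[str, str],
    keywords_to_tags: Callable[[list[str]], list[str]],
    tag_document: Callable[[str, list[str]], list[str]],
) -> dict[str, list[str]]:
    """Run steps 1-3 and return a tag -> document ids index for browsing (step 4)."""
    # Step 1: collect keywords across all documents.
    keywords = sorted({kw for text in docs.values() for kw in extract_keywords(text)})
    # Step 2: hypothetical LLM call that turns raw keywords into a short tag list.
    tags = keywords_to_tags(keywords)
    # Step 3: hypothetical LLM call that picks the tags that apply to each document.
    index: dict[str, list[str]] = defaultdict(list)
    for doc_id, text in docs.items():
        for tag in tag_document(text, tags):
            index[tag].append(doc_id)
    return dict(index)


# Deterministic stubs so the sketch runs without any LLM.
def fake_keywords_to_tags(keywords: list[str]) -> list[str]:
    return ["vision", "language"]


def fake_tag_document(text: str, tags: list[str]) -> list[str]:
    return [tag for tag in tags if tag in text.lower()]


docs = {
    "paper-1": "A vision transformer for image classification",
    "paper-2": "Scaling language models with more data",
    "paper-3": "Joint vision and language pretraining",
}
index = build_tag_index(docs, fake_keywords_to_tags, fake_tag_document)
print(index)
```

Passing the LLM steps in as callables keeps the sketch runnable offline and makes it easy to swap in a real model later.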

mini tutorial to browse research papers (I'll put the code in the weave repo soon)

- Grab arXiv papers
- Extract keywords using yake; it's free and easy to use, and its keywords seed our tags

yake's keywords can be a bit funny and are not that useful on their own for tagging/categorizing content

So we ask an LLM to create categories using these keywords

Scott Condron

@_ScottCondron

Jul 15, 2022

Hyperparameter Search for @huggingface transformers models 🐝🤗

For the @weights_biases blogathon, @matteopilotto24 created this blog post showcasing how to run hyperparameter sweeps on HF transformers models using W&B Sweeps.

Blog post: wandb.me/hf-sweeps

1/7
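
For context, a W&B sweep is typically described by a config dictionary and launched with `wandb.sweep` and `wandb.agent`. The sketch below is not taken from the blog post: the metric name, hyperparameter names, ranges, project name, and the `launch_sweep` helper are all illustrative assumptions.

```python
# Illustrative sweep definition; every name and range below is an assumption.
sweep_config = {
    "method": "random",  # W&B also supports "grid" and "bayes"
    "metric": {"name": "eval/loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-5,
            "max": 1e-3,
        },
        "per_device_train_batch_size": {"values": [8, 16, 32]},
        "weight_decay": {"values": [0.0, 0.01, 0.1]},
    },
}


def launch_sweep(train_fn, project="hf-sweeps-demo", count=10):
    """Hypothetical helper: register the sweep, then run `count` trials of train_fn."""
    import wandb  # imported lazily so the config can be inspected without wandb installed

    sweep_id = wandb.sweep(sweep=sweep_config, project=project)
    wandb.agent(sweep_id, function=train_fn, count=count)
    return sweep_id
```

Each trial's `train_fn` would read its hyperparameters from `wandb.config` and pass them on to the HF training setup.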

The plot above shows how each of your experiments performed on the task.

It shows the relationship of the different hyperparameters to the metric you care about.

W&B Sweeps automatically generates this plot as well as this parameter importance plot below:
2/7

First, he adds the W&B HF integration by logging into his account and setting some environment variables that tell HF to use W&B to track his experiments.

3/7
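
A minimal sketch of what that setup can look like. Which variables the blog post actually sets isn't stated here, so this is an assumption: it uses `WANDB_PROJECT` and `WANDB_WATCH`, two variables the HF W&B callback reads, and the project name and `configure_wandb_for_hf` helper are made up for illustration.

```python
from __future__ import annotations

import os


def configure_wandb_for_hf(project: str, watch: str = "false") -> dict[str, str]:
    """Hypothetical helper: set the environment variables before building the Trainer."""
    os.environ["WANDB_PROJECT"] = project  # which W&B project the runs land in
    os.environ["WANDB_WATCH"] = watch      # "false", "gradients", or "all"
    return {key: os.environ[key] for key in ("WANDB_PROJECT", "WANDB_WATCH")}


settings = configure_wandb_for_hf("hf-sweeps-demo")
print(settings)

# Logging in happens separately, e.g. `wandb login` in a shell or wandb.login() in Python.
# The Trainer is then pointed at W&B with TrainingArguments(..., report_to="wandb").
```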

Scott Condron

@_ScottCondron

Nov 21, 2021

Challenges in building Software 2.0

What are the different challenges with running and productionizing machine learning projects compared to traditional software development?

Firstly, technical debt:

1/7

Technical debt is a way of framing the cost of taking shortcuts with your software development.

It’s quicker in the short term but you’ll have to pay off that tech debt in the future to make your code more robust, tested, documented, etc.

ML has lots of “hidden” tech debt
2/7

Bad code is only a small part of the possible tech debt within ML systems.

Adding data and hyperparameters as dependencies of the system causes a bunch of subtle, hard-to-debug issues.

This paper talks about these issues in more detail:
papers.nips.cc/paper/2015/has…
3/7
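
One way to make those extra dependencies visible is to record a fingerprint of each one per run, so you can tell what changed between two runs. This is only a sketch under my own assumptions (the manifest fields and helper names are invented here, not something from the paper):

```python
from __future__ import annotations

import hashlib
import json


def fingerprint(obj) -> str:
    """Stable short hash of any JSON-serializable object (key order does not matter)."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_manifest(code_version: str, data_rows: list, hyperparams: dict) -> dict[str, str]:
    """Record the three things a trained model depends on: code, data, hyperparameters."""
    return {
        "code": code_version,
        "data": fingerprint(data_rows),
        "hyperparams": fingerprint(hyperparams),
    }


def changed_dependencies(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List which dependencies differ between two runs."""
    return [key for key in ("code", "data", "hyperparams") if old[key] != new[key]]


baseline = run_manifest("abc123", [[1, 0], [2, 1]], {"lr": 1e-3, "batch_size": 32})
retrain = run_manifest("abc123", [[1, 0], [2, 1], [3, 1]], {"batch_size": 32, "lr": 1e-3})

# Same code and hyperparameters, but one extra data row.
print(changed_dependencies(baseline, retrain))
```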

Scott Condron

@_ScottCondron

Nov 9, 2021

Learn Math for Machine Learning with me 🙋‍♂️

The Math4ML series by @charles_irl on the @weights_biases YouTube channel is for those who want to understand the math behind training models.

I stumble through his lessons' exercises on camera so we can all be guided by him when needed

https://twitter.com/charles_irl/status/1457840021772259332

Before each of these livecoding sessions, I was tasked with watching the associated Math4ML lesson. I then joined @charles_irl and worked through the autograded exercise notebooks with him.

2/4

https://twitter.com/charles_irl/status/1457840023944921092

This was a humbling experience and a lot of fun. Charles has clearly crafted these lessons with a lot of love over many years so it was a real joy to help in a small way. He’s also a great teacher 👨🏼‍🏫 (sorry @charles_irl, there’s no green-haired teacher emoji).

3/4

https://twitter.com/charles_irl/status/1457840027220602881

Scott Condron

@_ScottCondron

Oct 27, 2021

What is data lineage and why is it important when building ML systems?

From @chipro’s new book, Designing Machine Learning Systems:
1/5

Data lineage is the process of keeping track of the origin of your data and tracking versions of it over time.

This is important if your data changes and you want to keep track of which model was trained using which data and how the model performance is affected.
2/5

You could track data versions yourself, but it'll likely be as error-prone as "model_latest_latest_actual_latest_2021.pth" is when tracking models.

@weights_biases Artifacts is one way you can track the data you used to train your models with a few lines of code.
3/5
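
Roughly what those few lines can look like. Treat this as a sketch: the project name, artifact name, job types, and the helper functions are assumptions for illustration, and the training step itself is left out.

```python
def artifact_ref(name: str, alias: str = "latest") -> str:
    """Artifacts are referenced as "name:alias", e.g. "training-data:latest" or "training-data:v3"."""
    return f"{name}:{alias}"


def log_dataset(path: str, name: str = "training-data", project: str = "lineage-demo") -> None:
    """Upload a data file as a new version of a dataset artifact."""
    import wandb  # lazy import so the helpers above work without wandb installed

    run = wandb.init(project=project, job_type="upload-data")
    artifact = wandb.Artifact(name=name, type="dataset")
    artifact.add_file(path)
    run.log_artifact(artifact)  # logged again with changed contents, this becomes a new version
    run.finish()


def train_with_lineage(name: str = "training-data", project: str = "lineage-demo") -> str:
    """Declare which data version a training run consumed, then fetch it."""
    import wandb

    run = wandb.init(project=project, job_type="train")
    artifact = run.use_artifact(artifact_ref(name))  # links this run to the data version
    data_dir = artifact.download()
    # ... train on the files in data_dir ...
    run.finish()
    return data_dir
```

Because the training run calls `use_artifact`, the link between model and data version is recorded without any file-naming conventions.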

Scott Condron

@_ScottCondron

Aug 19, 2021

New blog post!🕺🥳

This is about a recent paper I enjoyed called "Omnimatte: How to Detect Objects and Their Effects."

I love writing these kinds of summary blog posts because I get to learn about such cool papers.

wandb.ai/_scott/omnimat…
1/5

The model in this paper learns to associate one or more objects to the effects they have on their environment (shadows, reflections, etc.) for a given video and rough segmentation masks of each object. This enables video effects like "background replacement".
2/5
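
To see why this enables background replacement: assuming the model yields an RGBA layer per object (the object plus its shadows and reflections, with an alpha channel), those layers can be composited over any new background with the standard "over" operation. The sketch below uses plain nested lists and made-up pixel values; it is not the paper's code.

```python
def composite_over(layer_rgba, background_rgb):
    """Blend one RGBA pixel over one RGB pixel: out = alpha * fg + (1 - alpha) * bg."""
    r, g, b, alpha = layer_rgba
    bg_r, bg_g, bg_b = background_rgb
    return (
        alpha * r + (1 - alpha) * bg_r,
        alpha * g + (1 - alpha) * bg_g,
        alpha * b + (1 - alpha) * bg_b,
    )


def replace_background(layers, new_background):
    """Composite RGBA layers (ordered back to front) over a new RGB background image."""
    out = [list(row) for row in new_background]
    for layer in layers:
        for y, row in enumerate(layer):
            for x, pixel in enumerate(row):
                out[y][x] = composite_over(pixel, out[y][x])
    return out


# 1x2 toy frame: an opaque red object pixel next to a half-transparent black shadow pixel.
object_layer = [[(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 0.0, 0.5)]]
white_background = [[(1.0, 1.0, 1.0), (1.0, 1.0, 1.0)]]

frame = replace_background([object_layer], white_background)
print(frame)
```

The half-transparent shadow pixel darkens whatever background it lands on, which is why the effects need to travel with the object layer.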