@mihagazvoda @abhi1thakur @rushworth_a I think he's talking more about labeling a dataset for a proper use-case. The drawdata tool is meant more for educational purposes.

That said, I have done a lot of work on a suite of tools that totally help me label in my day-to-day work.

A thread!
@mihagazvoda @abhi1thakur @rushworth_a For starters, if you'd like to label ... possibly the lowest-barrier-to-entry method is to check out pigeon for classification tasks. I didn't make the tool, but it's a nice Jupyter widget that's quick to get started with.

calmcode.io/pigeon/introdu…
@mihagazvoda @abhi1thakur @rushworth_a But you typically need more. For example: are you 100% sure that your labels are correct? Check out labelerrors.com; bad labels are totally a common thing.

To aid here, I've made a small library with many tricks to find bad examples.

github.com/koaning/doubtl…
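
To make the idea of "finding bad examples" concrete, here is a minimal sketch of one common heuristic: flag an example when a trained classifier disagrees with its label or gives that label a low probability. This is an illustration only and does not use the library's API; the function name `rank_doubtful`, the `threshold` parameter, and the toy data are all assumptions made for this example.

```python
def rank_doubtful(labels, probas, threshold=0.5):
    """Return indices of examples worth re-checking, most doubtful first.

    labels: class indices assigned by annotators.
    probas: per-class probabilities from any trained classifier.
    (Hypothetical helper for illustration, not part of any library.)
    """
    doubts = []
    for i, (label, p) in enumerate(zip(labels, probas)):
        predicted = max(range(len(p)), key=p.__getitem__)
        confidence_in_label = p[label]
        # Two simple reasons for doubt: the model disagrees with the label,
        # or it assigns the given label a low probability.
        if predicted != label or confidence_in_label < threshold:
            doubts.append((confidence_in_label, i))
    # Lowest confidence in the given label comes first.
    return [i for _, i in sorted(doubts)]


# Toy data: four examples, two classes.
labels = [0, 1, 1, 0]
probas = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.45, 0.55]]
print(rank_doubtful(labels, probas))  # → [1, 3]
```

The flagged indices are candidates for a second look by a human, not confirmed errors; the model itself can be wrong.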
@mihagazvoda @abhi1thakur @rushworth_a Other times you may have a huge unlabelled dataset. Maybe there are a few examples in that huge dataset that you'd like to get more samples of. In that case ... simsity might help!

Tutorial here:
koaning.github.io/simsity/quicks…

GitHub here:
github.com/koaning/simsity
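
As a rough sketch of the underlying idea (find more samples that look like a few examples you already have), the snippet below ranks vectors by cosine similarity to a query. It deliberately does not use simsity's API; the helper names `cosine` and `most_similar` and the toy vectors are assumptions for illustration, and a real setup would first turn each item into a vector with some encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, vectors, k=2):
    """Return the indices of the k vectors most similar to the query."""
    scored = sorted(
        enumerate(vectors),
        key=lambda pair: cosine(query, pair[1]),
        reverse=True,
    )
    return [i for i, _ in scored[:k]]


# Toy "embeddings" for four unlabelled items.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(most_similar([1.0, 0.05], vectors, k=2))  # → [0, 1]
```

This brute-force scan is fine for small data; for a huge dataset an approximate nearest-neighbour index would be the usual swap-in.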
@mihagazvoda @abhi1thakur @rushworth_a Finally, there are some tricks you could do with human-learn that can help as well. Part of labeling is a UI problem at times.

github.com/koaning/human-…
@mihagazvoda @abhi1thakur @rushworth_a In particular, you could try to do some "bulk labelling".

It's not a perfect trick (at all), but it may help you figure out clusters relatively quickly. It's explained in more detail here:

@mihagazvoda @abhi1thakur @rushworth_a I later updated the UI for the trick, explained here:

@mihagazvoda @abhi1thakur @rushworth_a In general, if folks have never labelled data themselves before, please do so! I have so many anecdotes on how it saved the day.

In particular, I'd recommend that everyone give this gem a glance:
explosion.ai/blog/supervise…
@mihagazvoda @abhi1thakur @rushworth_a With so many bad labels in benchmarks, it's getting too easy to be optimal on paper but broken in reality.

Keep Current with Vincent D. Warmerdam

More from @fishnets88

12 Jan
Tonight I'll speak at @PyData Slovenia! If I'm not mistaken you're still able to join and I'll be there to share some tricks in NLP. If folks have questions up front, ask 'em here and I'll try to prepare a demo.

meetup.com/PyData-Sloveni…
Stuff I hope to discuss: letter embeddings, bytepair embeddings, word embeddings, subword embeddings and floret embeddings.

I'll also discuss why you may not need any of them 😅 but there are some interesting parts about 'em.
If there's time I'll also give a demonstration of why you should remain skeptical of fairness mitigation techniques. It's for sure a noble effort of research, don't get me wrong, but there's ample evidence that they're not very effective.
17 May 20
People sometimes ask me if I have advice on how to learn all these data science tools. Here goes: read the documentation cover to cover.

1/n
I was backpacking when I was teaching myself Python. At some point I knew that I was going to spend 24 hours on a boat. I would have power but no wifi.
So I used this app called SiteSucker to pull in all of the pandas documentation and I started reading it. I took plenty of breaks to enjoy the view but I got a lot of reading done.