@mihagazvoda@abhi1thakur@rushworth_a I think he's talking more about labeling a dataset for a proper use-case. The drawdata tool is meant more for educational purposes.
That said, I have done a lot of work on a suite of tools that totally help me label in my day-to-day work.
A thread!
@mihagazvoda@abhi1thakur@rushworth_a For starters, if you'd like to label, possibly the lowest-barrier-to-entry method is to check out pigeon for classification tasks. I didn't make the tool, but it's a nice Jupyter widget that's quick to get started with.
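To make the workflow concrete: pigeon renders label buttons in a notebook, but what it boils down to is a loop that shows an example and records your choice. This is *not* pigeon's actual API — just a hypothetical plain-Python sketch of that loop, where the `choose` callback stands in for the button click:

```python
# Hypothetical sketch of a classification-labeling loop
# (pigeon does this with a Jupyter widget; here the `choose`
# callback plays the role of the button click).

def annotate(examples, options, choose):
    """Collect (example, label) pairs for a list of examples."""
    labels = []
    for example in examples:
        label = choose(example, options)  # in pigeon: you click a button
        labels.append((example, label))
    return labels

# Simulate a labeler that tags short texts as "positive" if they
# contain an exclamation mark (purely for demonstration).
pairs = annotate(
    ["great product!", "meh.", "love it!"],
    options=["positive", "negative"],
    choose=lambda ex, opts: "positive" if "!" in ex else "negative",
)
```

The nice part of this shape is that the UI and the bookkeeping are separate: swap the callback for a real widget and the rest stays the same.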
@mihagazvoda@abhi1thakur@rushworth_a Other times you may have a huge unlabelled dataset. Maybe there are a few examples in that huge dataset that you'd like to get more samples of. In that case, simsity might help!
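The idea behind this trick (hedged sketch, not simsity's real API): index the unlabelled pool, then query with one interesting example to surface lookalikes worth labeling next. Here the encoder is a crude bag-of-words stand-in:

```python
# Nearest-neighbour candidate retrieval, the idea behind simsity:
# query a big unlabelled pool with one example to find lookalikes.
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words counts as a crude stand-in for a real encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def query(pool, example, n=2):
    """Return the n pool items most similar to `example`."""
    q = vectorize(example)
    return sorted(pool, key=lambda t: cosine(q, vectorize(t)), reverse=True)[:n]

pool = [
    "refund my order please",
    "where is my package",
    "the app keeps crashing",
    "i want my money back",
]
candidates = query(pool, "please refund my money", n=2)
# candidates now holds the two refund-related tickets
```

In practice you'd swap the bag-of-words encoder for an embedding model and the sort for an approximate nearest-neighbour index, but the labeling loop is the same: label one, query, label the neighbours.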
@mihagazvoda@abhi1thakur@rushworth_a Finally, there are some tricks you could do with human-learn that can help as well. Part of labeling is a UI problem at times.
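The human-learn trick, roughly, is that a hand-written rule *is* a model: human-learn wraps such functions into scikit-learn-compatible estimators. This is just the plain-Python version of the idea (keywords and labels here are made up for illustration):

```python
# A hand-written rule used as a pre-labeler. Anything the rule
# isn't confident about gets None and goes to a human annotator.

def rule_based_labeler(ticket):
    """Pre-label support tickets with obvious keyword rules."""
    text = ticket.lower()
    if "refund" in text or "money back" in text:
        return "billing"
    if "crash" in text or "error" in text:
        return "bug"
    return None  # not confident: leave it for a human

tickets = ["App crashes on startup", "Please refund me", "How do I export data?"]
labels = [rule_based_labeler(t) for t in tickets]
```

This is where labeling becomes a UI problem: the rules handle the obvious cases in bulk, so the human time goes to the examples that actually need judgment.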
@mihagazvoda@abhi1thakur@rushworth_a In general, if folks have never labelled data themselves before, please do so! I have so many anecdotes about how it saved the day.
Tonight I'll speak at @PyData Slovenia! If I'm not mistaken you're still able to join, and I'll be there to share some tricks in NLP. If folks have questions upfront, ask 'em here and I'll try to prepare a demo.
Stuff I hope to discuss: letter embeddings, bytepair embeddings, word embeddings, subword embeddings and floret embeddings.
I'll also discuss why you may not need any of them 😅 but there are some interesting parts about 'em.
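For a taste of the subword/floret-style trick: represent a word as the sum of hashed character n-gram vectors, so even unseen words get a usable embedding. This is a toy sketch with random (but deterministic) vectors per hash bucket — real systems learn these values:

```python
# Toy fastText/floret-style subword embedding: hash character
# trigrams into buckets and sum the bucket vectors. Vector values
# are pseudo-random here; in a real model they are learned.
import hashlib
import random

DIM = 8        # embedding size (toy-sized)
BUCKETS = 1000 # number of hash buckets

def ngrams(word, n=3):
    padded = f"<{word}>"  # boundary markers, as fastText does
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def bucket_vector(bucket):
    """Deterministic pseudo-random vector per hash bucket."""
    rng = random.Random(bucket)
    return [rng.uniform(-1, 1) for _ in range(DIM)]

def embed(word):
    vec = [0.0] * DIM
    for g in ngrams(word):
        b = int(hashlib.md5(g.encode()).hexdigest(), 16) % BUCKETS
        for i, v in enumerate(bucket_vector(b)):
            vec[i] += v
    return vec

# "labeling" and "labelling" share most trigrams, so their
# embeddings end up close even if neither word was ever seen.
a, b = embed("labeling"), embed("labelling")
```

The hashing is what keeps the vocabulary bounded: any string maps into the same fixed set of buckets, which is also the core of floret's memory savings.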
If there's time I'll also give a demonstration of why you should remain skeptical of fairness mitigation techniques. It's for sure a noble effort of research, don't get me wrong, but there's ample evidence that they're not very effective.
People sometimes ask me if I have advice on how to learn all these data science tools. Here goes: read the documentation cover to cover.
1/n
I was backpacking while I was teaching myself Python. At some point I knew I was going to spend 24 hours on a boat with no wifi. I would have power, but no wifi.
So I used an app called SiteSucker to pull in all of the pandas documentation, and I started reading it. I took plenty of breaks to enjoy the view, but I got a lot of reading done.