hunter
phd student at @MIT_CSAIL working in self/weak supervision, nlp with @David_Sontag. he/him
Nov 29, 2022 · 7 tweets · 3 min read
Simple trick to improve weak supervision: prune your training data! Our embedding-aware pruning method can boost the accuracy of weak supervision pipelines by up to 19%, and it takes only a few lines of code!

Come by #NeurIPS22 poster 611 today at 4pm to hear more, or read on 🧵

Most existing weak supervision setups (Snorkel, etc.) use all the weakly-labeled data to train a classifier. But there's an intuitive tradeoff between the coverage and the accuracy of the weak labels. If we cover *less* training data w/ higher accuracy, do we get a more accurate model?
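
The thread doesn't show the code, but here's a minimal sketch of one way embedding-aware pruning can work: score each weakly-labeled point by how well its weak label agrees with its nearest neighbors in a pretrained embedding space, then keep only the highest-agreement points. The function name, the k-NN agreement score, and the keep_frac parameter are my illustration of the idea, not necessarily the paper's exact selection statistic.

```python
# Sketch: embedding-aware pruning of weakly-labeled training data.
# Assumes `embeddings` are pretrained representations (e.g. from a
# foundation model) and `weak_labels` are the labels a weak supervision
# pipeline assigned to the covered points.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def prune_by_neighbor_agreement(embeddings, weak_labels, k=10, keep_frac=0.5):
    """Keep the keep_frac of points whose weak labels best agree with
    their k nearest neighbors' weak labels in embedding space."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    _, idx = nn.kneighbors(embeddings)           # idx[:, 0] is the point itself
    neighbor_labels = weak_labels[idx[:, 1:]]    # (n, k) labels of the neighbors
    agreement = (neighbor_labels == weak_labels[:, None]).mean(axis=1)
    n_keep = int(keep_frac * len(weak_labels))
    keep = np.argsort(-agreement)[:n_keep]       # indices of most-consistent points
    return keep

# Usage: train the end classifier only on the pruned subset.
# kept = prune_by_neighbor_agreement(X_emb, y_weak, k=10, keep_frac=0.5)
# clf.fit(X[kept], y_weak[kept])
```

The design choice matches the thread's framing: deliberately trading coverage (fewer training points) for accuracy (weak labels that are locally consistent in embedding space).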