phd student at @MIT_CSAIL working in self/weak supervision, nlp with @David_Sontag. he/him
Nov 29, 2022 • 7 tweets • 3 min read
Simple trick to improve weak supervision: prune your training data! Our embedding-aware pruning method can boost the accuracy of weak supervision pipelines by up to 19%, and it takes only a few lines of code!
Come by #NeurIPS22 poster 611 today at 4pm to hear more, or read on 🧵
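For a rough sense of what "a few lines of code" means, here is a minimal sketch of one way embedding-aware pruning could be implemented. The scoring rule (keep points whose embedding-space neighbors agree with their weak label), and the names `prune_by_embedding_agreement`, `keep_frac`, and `k` are illustrative assumptions, not necessarily the exact method from the paper:

```python
# Minimal sketch of embedding-aware pruning (illustrative, not the paper's exact rule).
# Assumes X_emb is an (n, d) array of embeddings and y_weak is an (n,) array of
# weak labels produced by a label model.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def prune_by_embedding_agreement(X_emb, y_weak, keep_frac=0.5, k=10):
    """Keep the keep_frac of points whose k nearest neighbors in embedding
    space most often share the point's weak label."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_emb)
    _, idx = nn.kneighbors(X_emb)                  # idx[:, 0] is the point itself
    neighbor_labels = y_weak[idx[:, 1:]]           # (n, k) weak labels of neighbors
    agreement = (neighbor_labels == y_weak[:, None]).mean(axis=1)
    n_keep = int(keep_frac * len(y_weak))
    return np.argsort(-agreement)[:n_keep]         # indices of the most consistent points

# Usage: train the end classifier only on the pruned subset.
# keep = prune_by_embedding_agreement(X_emb, y_weak, keep_frac=0.5)
# clf.fit(X_emb[keep], y_weak[keep])
```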
Most existing weak supervision setups (Snorkel, etc.) use all the weakly-labeled data to train a classifier. But there's an intuitive tradeoff between coverage and accuracy of the weak labels. If we cover *less* training data w/ higher accuracy, do we get a more accurate model?
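One concrete way to act on that tradeoff, sketched under assumptions: keep only the points where the label model is confident before training the end classifier. Here `probs` stands in for whatever (n, n_classes) probability matrix your label model produces, and the 0.9 threshold and `train_on_confident_subset` name are hypothetical choices for illustration:

```python
# Sketch of trading coverage for accuracy: train on a smaller, higher-confidence
# subset of the weakly-labeled data instead of all of it.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_on_confident_subset(X, probs, threshold=0.9):
    conf = probs.max(axis=1)            # label-model confidence per point
    y_weak = probs.argmax(axis=1)       # hard weak labels
    keep = conf >= threshold            # smaller but (hopefully) cleaner training set
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X[keep], y_weak[keep])
    return clf, keep.mean()             # classifier + fraction of data retained
```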