PhD candidate @Princeton, working on robust machine learning. Former intern @MSFTResearch, @facebookai
Apr 25, 2022 • 4 tweets • 2 min read
After extreme procrastination, I finally finished the blog post on "why we should use synthetic dataset in ML".
In other words, can we use all these amazing #dalle2 synthetic images to improve generalization?
vsehwag.github.io/blog/2022/4/sy…
Training on synthetic + real data tends to show an inflection point. As the quality of synthetic data improves, the effect goes from "degrading performance" -> "no benefit at all" -> "finally a benefit in generalization". Diffusion models cross the inflection point on most datasets.
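For anyone who wants to try the synthetic + real setup, here is a minimal sketch of the mixing step only. It is not the code from the blog post: `mix_datasets` and `synthetic_fraction` are hypothetical names, and it assumes both datasets are plain Python lists of samples.

```python
import random

def mix_datasets(real, synthetic, synthetic_fraction, seed=0):
    """Hypothetical helper: keep every real sample and add enough synthetic
    samples that they make up roughly `synthetic_fraction` of the result."""
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_syn / (n_real + n_syn) = synthetic_fraction for n_syn.
    n_syn = round(len(real) * synthetic_fraction / (1.0 - synthetic_fraction))
    # Cannot draw more synthetic samples than are available.
    n_syn = min(n_syn, len(synthetic))
    mixed = [(x, "real") for x in real]
    mixed += [(x, "synthetic") for x in rng.sample(synthetic, n_syn)]
    rng.shuffle(mixed)
    return mixed
```

Sweeping `synthetic_fraction` (and the generator used to produce `synthetic`) while tracking test accuracy on real data is one way to see which side of the inflection point a given generator lands on.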