Vikash Sehwag
PhD candidate @Princeton, working on robust machine learning. Former intern @MSFTResearch, @facebookai
Apr 25, 2022
After extreme procrastination, I finally finished the blog post on "why we should use synthetic dataset in ML".

In other words, can we use all these amazing #dalle2 synthetic images to improve generalization?

vsehwag.github.io/blog/2022/4/sy…

Training on synthetic+real data tends to show an inflection point. As the quality of the synthetic data improves, the effect of adding it goes from "degrading performance" -> "no benefit at all" -> "finally a benefit in generalization". Diffusion models cross the inflection point on most datasets.