Latest Twitter Threads by @VSehwag_ on Thread Reader App

Apr 25, 2022 • 4 tweets • 2 min read

After extreme procrastination, I finally finished the blog post on "why we should use synthetic dataset in ML".

In other words, can we use all these amazing #dalle2 synthetic images to improve generalization?

vsehwag.github.io/blog/2022/4/sy…

Training on synthetic + real data tends to show an inflection point. As the quality of the synthetic data improves, the effect goes from "degrading performance" -> "no benefit at all" -> "finally a benefit in generalization". Diffusion models cross the inflection point on most datasets.
