Text-to-SQL models should let non-experts easily query databases. But annotating examples to train them requires expertise: labeling natural-language utterances with SQL queries.
Can we train good enough models without any expert annotations?
2/5
Instead of gold SQL, we train text-to-SQL models on weak supervision: (1) answers and (2) question decompositions (annotated, or predicted by a model) ⛏️
3/5
Using the database, question decomposition, and answer, we automatically synthesize the corresponding SQL query. The synthesis is well captured by mapping rules from decomposition steps to SQL.
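To make the idea concrete, here's a minimal Python sketch of decomposition-to-SQL mapping rules. The step types and rule grammar below are simplified illustrations, not the paper's actual rule set:

```python
# Illustrative mapping rules: each decomposition step type becomes a
# SQL fragment that wraps the previous step's query as a subquery.
# (Step names and grammar are hypothetical, for demonstration only.)

def select_step(table):
    # "return <table>" -> scan all rows of a table
    return f"SELECT * FROM {table}"

def filter_step(subquery, condition):
    # "return #i where <condition>" -> add a WHERE clause on a prior step
    return f"SELECT * FROM ({subquery}) AS t WHERE {condition}"

def aggregate_step(subquery, func, column):
    # "return <func> of #i" -> apply an aggregate over a prior step
    return f"SELECT {func}({column}) FROM ({subquery}) AS t"

# Decomposition of "How many flights from Denver are there?":
#   1. return flights
#   2. return #1 from Denver
#   3. return count of #2
step1 = select_step("flights")
step2 = filter_step(step1, "origin = 'Denver'")
step3 = aggregate_step(step2, "COUNT", "*")
print(step3)
```

The annotated answer then acts as a check on the synthesized query, e.g. keeping only candidates whose execution result matches it.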
4/5
We test on five text-to-SQL benchmarks: (1) weakly supervised models reach ~94% of the accuracy of models trained on gold SQL; (2) even models trained with few or zero in-domain decompositions still reach ~90% of the fully supervised ones.