OK people, I did the thing. #ChatGPT can hallucinate relational databases. With full credit to Jonas Degrave's creative prompt for hallucinating a Linux prompt.
I had to push it harder to generate some long-tail real-world data, but it got there more or less. In this case, I was looking for high-end handmade trumpets. Took a few tries.
These are real trumpet manufacturers!
And finally we're getting the kind of stuff I was looking for. David Monette is the most famous independent trumpet craftsman. Warburton is another. Kanstul and Edwards also arguably qualify in some fashion. The rest are bigger brands. This is some long-tail data!
The gendered prompt was intentional, actually -- brass instrument manufacturing is a male-dominated field and I figured I'd play to the empirical biases to get the data I was looking for.
It's not so good at composing complicated SQL queries on this database though.
If I write a (buggy) version of my query it produces something that looks sensible (tho the prices are about 4x too low). The query has both logic and syntax errors, but ChatGPT figured out more or less what I meant.
Prices are ~40x too low. (Man, I am so unreliable, I don't think I'll ever achieve GI.)
@WhoWillRickWill @sarahcat21 @nikitonsky @adityagp 1. Data languages like SQL and Pandas have a limited set of type constructors (relations and dataframes respectively). This can make mapping general-purpose types into and out of these languages difficult.
@WhoWillRickWill @sarahcat21 @nikitonsky @adityagp 2. Language-agnostic data languages like SQL tend to have their own atomic type systems with some idiosyncrasies (e.g. numeric, timestamp) and special cross-type concerns (e.g. NULL) that can be inconsistent with programming languages, again making mapping difficult.
@WhoWillRickWill @sarahcat21 @nikitonsky @adityagp 3. Programming styles for data languages (declarative logic for SQL, functional programming for Pandas) push some programmers outside their comfort zone. Also, it's tricky for programmers (and compilers!) to decide what to "push into" data language expressions for efficiency.
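To make points 2 and 3 concrete, here is a rough sketch using Python's built-in sqlite3 module. The `trumpets` table, the maker names, and the prices are all made up for illustration (nothing here comes from the ChatGPT session above). It shows NULL comparison behaving differently from Python's None, and the same filter written once in Python and once "pushed into" SQL.

```python
import sqlite3

# Hypothetical in-memory table; names and prices are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trumpets (maker TEXT, price REAL)")
conn.executemany(
    "INSERT INTO trumpets VALUES (?, ?)",
    [("maker_a", 100.0), ("maker_b", None), ("maker_c", 300.0)],
)

# Point 2: SQL uses three-valued logic, so NULL = NULL evaluates to NULL
# (returned to Python as None), while Python's None == None is simply True.
sql_null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]
py_none_eq = (None == None)

# Point 3, option A: fetch every row and filter in Python.
# The explicit None check is needed; `None > 150` raises TypeError in Python 3.
rows = conn.execute("SELECT maker, price FROM trumpets").fetchall()
in_python = [maker for maker, price in rows if price is not None and price > 150]

# Point 3, option B: push the filter into SQL.
# Rows with a NULL price drop out because NULL > 150 is not true.
in_sql = [
    maker
    for (maker,) in conn.execute("SELECT maker FROM trumpets WHERE price > 150")
]

print(sql_null_eq, py_none_eq, in_python, in_sql)
```

Both versions return the same rows here, but only option B lets the database skip non-matching rows before anything crosses into Python, and the two disagree on how a missing price has to be handled.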
@siobhcroo @alvinkcheung @mbpmilano Next, I gave a keynote at #POPL last week going over the foundations of this work. Super appreciative to that community for the opportunity. It's an hour talk, posted here: /3