Super excited to talk tomorrow (July 30th, 3pm Pacific) at Abralin ao Vivo about joint work with Yuan Yang. I'll be presenting a long-running project on language acquisition that tackles language learnability questions with Bayesian program learning tools.
Our project studies how program learning tools can acquire natural language structures from positive evidence alone. We show that learners can *construct* grammatical devices that generate finite-state, context-free, and context-sensitive languages to explain the data they see.
Our interest in this started with @AmyPerfors's study showing that learners could discover that language is context-free from just a few minutes of child-directed speech. cse.iitk.ac.in/users/cs671/20…
Amy et al. compared different grammars and showed that context-free grammars provided the best explanation of child-directed speech, meaning that children could discover that language is (roughly) context-free by comparing hypotheses to find a simple theory of the data they hear.
Our work merges this general idea with inductive program learning ideas from @NickJChater and Vitanyi, who showed that statistical inference over Turing machines solves classic Gold-style learnability problems that long dominated language learning theory. homepages.cwi.nl/~paulv/papers/…
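Roughly, the scoring works like this. Here's a minimal sketch in Python, in the spirit of Perfors et al. and Chater & Vitanyi but NOT the paper's code; the bit-length prior, the i.i.d. positive-example likelihood, and the function names are all illustrative assumptions:

```python
# Minimal sketch of Bayesian scoring over grammars/programs (illustrative, not the paper's code).
# A "hypothesis" is anything that assigns probabilities to strings.
import math

def log_prior(description_length_bits):
    # Simplicity prior: shorter programs are a priori more probable,
    # roughly P(h) proportional to 2^(-description length in bits).
    return -description_length_bits * math.log(2)

def log_likelihood(corpus, string_prob):
    # Positive-evidence likelihood: the probability the hypothesis assigns
    # to generating each observed string (no negative evidence needed).
    total = 0.0
    for s in corpus:
        p = string_prob(s)
        if p == 0.0:
            return float("-inf")  # hypothesis cannot generate this string
        total += math.log(p)
    return total

def log_posterior(corpus, description_length_bits, string_prob):
    # Unnormalized log posterior: simplicity prior plus fit to the positive data.
    return log_prior(description_length_bits) + log_likelihood(corpus, string_prob)
```

The tension that drives learning: an over-general hypothesis wastes probability on strings that never occur and pays in likelihood, while an over-specific one pays in the prior for its extra description length.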
We show that now-standard methods from program-based concept learning models can figure out what computation is generating the data the learner sees, using only positive evidence. The hypothesis space can be all computations. We look at learning a lot of formal languages...
All learned from positive evidence only. Same for a simple toy grammar of English (that is, the learning model constructs a program that generates the same strings as this grammar).
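To get a feel for why positive evidence alone can be enough, here's a toy example of my own (not the paper's actual model or hypotheses): the "size principle" means a tighter context-free hypothesis beats an over-general regular one as consistent data accumulates.

```python
# Toy illustration with made-up, bounded hypotheses (NOT the paper's code):
# H_cf generates {a^n b^n : 1 <= n <= N}; H_reg generates all a^i b^j with
# 1 <= i, j <= N. Both generate their strings uniformly.
import math

N = 10
cf_language  = {"a" * n + "b" * n for n in range(1, N + 1)}                           # 10 strings
reg_language = {"a" * i + "b" * j for i in range(1, N + 1) for j in range(1, N + 1)}  # 100 strings

def log_likelihood(corpus, language):
    # Size principle: each observed string costs log(1/|language|),
    # so smaller (tighter) extensions fit consistent data better.
    if any(s not in language for s in corpus):
        return float("-inf")
    return -len(corpus) * math.log(len(language))

corpus = ["ab", "aabb", "aaabbb"] * 3  # positive examples of a^n b^n only
print(log_likelihood(corpus, cf_language))   # about -20.7: tighter hypothesis wins
print(log_likelihood(corpus, reg_language))  # about -41.4: pays for over-generality
```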
And the languages used in a bunch of artificial language learning experiments.
This kind of program learning inevitably shows complex, structured patterns of generalization to data that hasn't been seen. So that kind of generalization isn't something unexpected, fancy, or special about human language.
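A tiny, purely illustrative continuation of the toy example above (again, not the paper's model): once the learner settles on the a^n b^n hypothesis, it generalizes to strings it has never seen and rejects superficially similar ones.

```python
# Illustrative only: generalization falls out of the hypothesis the learner built.
N = 20
cf_language = {"a" * n + "b" * n for n in range(1, N + 1)}
print("aaaabbbb" in cf_language)  # True: never observed, but predicted
print("aaabb" in cf_language)     # False: never observed, and rejected
```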
Implementing domain-general learning schemes like this helps to clarify a common confusion: the existence of domain-specific representations doesn't mean they result from domain-specific learning systems. So many times I've heard "Well structure X is only found in language so..."
But *the* thing that domain general learning systems do is create domain specific representations. That's what happens in math, reading, or driving. They all get domain specific representations from domain general mechanisms.
Isn't it harder to build in an infinite space of hypotheses like this model uses, rather than UG? Why build in many possible representations when we could build in just one? Here's my favorite part of the (upcoming) paper
So, you can learn many of the formal structures that have been argued to be necessary for language, using simple, domain-general tools. Maybe this will help to connect language acquisition to algorithm and concept learning found elsewhere in cognitive science.
It is an amazing time to work in the cognitive science of language. Here are a few remarkable recent results, many of which highlight ways in which the critiques of LLMs (especially from generative linguistics!) have totally fallen to pieces.
One claim was that LLMs can't be right because they learn "impossible languages." This was never really justified, and now @JulieKallini and collaborators show it's probably not true:
One claim was that LLMs can't be on the right track because they "require" large data sets. Progress has been remarkable on learning with developmentally-plausible data sets. Amazing comparisons spearheaded by @a_stadt and colleagues:
Yes, ChatGPT is amazing and impressive. No, @OpenAI has not come close to addressing the problem of bias. Filters appear to be bypassed with simple tricks, and they only superficially mask the underlying bias.
Yeah, yeah, quantum mechanics and relativity are counterintuitive because we didn’t evolve to deal with stuff on those scales.
But more ordinary things like numbers, geometry, and procedures are also baffling. Here’s a little 🧵 on weird truths in math.
My favorite example – the Banach-Tarski paradox – shows how you can cut a sphere into a few pieces (well, sets) and then re-assemble the pieces into TWO IDENTICAL copies of the sphere you started with.
It sounds so implausible that people often think they've misunderstood. But it's true -- chop the sphere into a few "pieces" and reassemble them into two spheres *identical* (equal size, equal shape) to the one you started with.
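For anyone who wants the precise claim, here is the standard textbook statement (it relies on the axiom of choice; nothing below is specific to this thread):

```latex
% Standard statement of the Banach--Tarski theorem (requires the axiom of choice).
\textbf{Theorem (Banach--Tarski).}
A solid ball $B \subset \mathbb{R}^3$ can be partitioned into finitely many
disjoint sets $A_1, \dots, A_n$ (five pieces suffice) such that, applying
suitable rotations and translations to the pieces, they can be reassembled
into \emph{two} balls, each congruent to $B$. The pieces are non-measurable
sets, which is why no contradiction with volume (measure theory) arises.
```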
Everyone seems to think it's absurd that large language models (or something similar) could show anything like human intelligence and meaning. But it doesn’t seem so crazy to me. Here's a dissenting 🧵 from cognitive science.
The news, to start, is that this week software engineer @cajundiscordian was placed on leave for violating Google's confidentiality policies, after publicly claiming that a language model was "sentient" nytimes.com/2022/06/12/tec…
Lemoine has clarified that his claim about the model’s sentience was based on “religious beliefs.” Still, his conversation with the model is really worth reading: cajundiscordian.medium.com/is-lamda-senti…