Today on the blog I’ve started a new mini-series called “Language, Statistics, & Category Theory” to describe some ideas my collaborators and I share in a recent paper on mathematical structure in language. Part 1 is now live! math3ma.com/blog/language-…
We open with the idea that language is algebraic: you can “multiply” words together (by concatenation) to get new expressions:
red × firetruck = red firetruck
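To make the “multiplication” concrete, here is a minimal Python sketch. It assumes expressions are modeled as tuples of words; the names `Expression`, `UNIT`, `multiply`, and `show` are hypothetical, chosen just for illustration. Concatenation is associative and the empty expression acts as a unit, so expressions form a monoid.

```python
# Assumption for illustration: an expression is a tuple of words.
Expression = tuple[str, ...]

UNIT: Expression = ()  # the empty expression is the identity for "multiplication"


def multiply(x: Expression, y: Expression) -> Expression:
    """Multiply two expressions by concatenating them."""
    return x + y


def show(x: Expression) -> str:
    """Render an expression as ordinary text."""
    return " ".join(x)


red, firetruck, engine = ("red",), ("firetruck",), ("engine",)
print(show(multiply(red, firetruck)))  # red firetruck

# Monoid laws: multiplication is associative and UNIT is a two-sided identity.
assert multiply(multiply(red, firetruck), engine) == multiply(red, multiply(firetruck, engine))
assert multiply(UNIT, red) == red == multiply(red, UNIT)
```

Word order matters, so this multiplication is not commutative: “firetruck red” is a different expression from “red firetruck.”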
I mentioned this idea in a “promo video” I made for my PhD thesis last year:
Now, thinking algebraically, consider this famous quote by linguist John Firth: “You shall know a word by the company it keeps.” If you’re an algebraist, you may try to formalize this by identifying the meaning of a word, like “red,” with the principal ideal associated to it:
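Concretely, if expressions form a monoid under concatenation, the (two-sided) principal ideal generated by “red” is the set of all products x × red × y, i.e., every expression that contains “red.” Here is a small sketch, assuming expressions are tuples of words; `in_principal_ideal` and the sample expressions are hypothetical. The actual ideal is infinite, so the code only tests membership for a few examples.

```python
Expression = tuple[str, ...]


def in_principal_ideal(expr: Expression, generator: Expression) -> bool:
    """Return True if expr = x . generator . y for some (possibly empty) x, y,
    i.e. if generator appears as a contiguous run of words inside expr."""
    k = len(generator)
    return any(expr[i:i + k] == generator for i in range(len(expr) - k + 1))


# A small, made-up sample of expressions (the real ideal is infinite).
sample = [
    ("red",),
    ("red", "firetruck"),
    ("the", "red", "idea"),
    ("blue", "firetruck"),
]
red_ideal = [e for e in sample if in_principal_ideal(e, ("red",))]
print(red_ideal)  # [('red',), ('red', 'firetruck'), ('the', 'red', 'idea')]
```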
Algebra is nice, but it isn’t the full story. There’s also statistics! “Red firetruck” occurs more frequently than “red idea,” and this contributes to the meaning of the word “red.” So, how can we marry these structures? It turns out category theory is a nice setting for this.
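One simple way to make “occurs more frequently” concrete is to count which words follow “red” in a corpus. The sketch below uses a tiny made-up corpus (not real data) and a plain bigram estimate; `bigram_counts` is a hypothetical helper, and this is a generic illustration rather than the construction from the paper.

```python
from collections import Counter

# A made-up toy corpus, for illustration only (not real data).
corpus = [
    "the red firetruck raced past",
    "a red firetruck was parked outside",
    "she painted the red firetruck",
    "that was a red idea",
]


def bigram_counts(sentences):
    """Count adjacent word pairs across all sentences."""
    counts = Counter()
    for s in sentences:
        words = s.split()
        counts.update(zip(words, words[1:]))
    return counts


counts = bigram_counts(corpus)
# Relative frequency of each word that immediately follows "red".
after_red = {w2: c for (w1, w2), c in counts.items() if w1 == "red"}
total = sum(after_red.values())
probs = {w: c / total for w, c in after_red.items()}
print(probs)  # {'firetruck': 0.75, 'idea': 0.25}
```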
To start, language is a category! Objects are strings of words, and arrows indicate when one string is contained in another. This category is a bit like syntax: it tells us “what goes with what.”
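Here is a minimal sketch of this category, assuming “contained” means that one string appears as a contiguous run of words inside another. The helpers `contained_in` and `hom` are hypothetical. Since there is at most one arrow between any two expressions, the category is a preorder, and the only axioms to check are identities (every string contains itself) and composition (containment is transitive).

```python
from itertools import product

Expression = tuple[str, ...]


def contained_in(x: Expression, y: Expression) -> bool:
    """True if x occurs as a contiguous run of words inside y."""
    k = len(x)
    return any(y[i:i + k] == x for i in range(len(y) - k + 1))


def hom(x: Expression, y: Expression) -> set:
    """Hom-set from x to y: a single arrow if x is contained in y, otherwise empty."""
    return {(x, y)} if contained_in(x, y) else set()


objects = [("red",), ("firetruck",), ("red", "firetruck"), ("a", "red", "firetruck")]

# Identities: every expression is contained in itself.
assert all(hom(x, x) for x in objects)

# Composition: if x -> y and y -> z, then x -> z.
for x, y, z in product(objects, repeat=3):
    if hom(x, y) and hom(y, z):
        assert hom(x, z)
```

Associativity of composition comes for free here, because each hom-set has at most one element.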
What can we do with this category? Well, we might ask what perspective category theory’s most important theorem, the Yoneda lemma, brings. Informally, this theorem states that a mathematical object is determined, up to isomorphism, by its network of relationships with other objects. math3ma.com/blog/the-yoned…
So in the spirit of the Yoneda lemma, the meaning of a word like “red” is contained in the network of ways that "red" fits into all other expressions in English. Sounds a bit like John Firth's quote, no?
The passage from “red” to “the network of ways ‘red’ fits into language” is described formally in category theory as a functor (the Yoneda embedding). It takes us from a syntax category of language to the category of “copresheaves” on it (functors from it to the category of sets), where semantics may lie. Sounds fancy, but the idea is simple!
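A rough sketch of this functor on a finite sample, assuming arrows are given by contiguous containment as in the syntax category described earlier. It sends an expression x to the copresheaf C(x, −), which assigns to each expression y the hom-set C(x, y). The helpers `representable` and `support` and the sample objects are hypothetical, and only a handful of objects are used since the real category is infinite.

```python
Expression = tuple[str, ...]


def contained_in(x: Expression, y: Expression) -> bool:
    """True if x occurs as a contiguous run of words inside y."""
    k = len(x)
    return any(y[i:i + k] == x for i in range(len(y) - k + 1))


objects = [("red",), ("idea",), ("red", "idea"), ("red", "firetruck"), ("a", "red", "firetruck")]


def representable(x: Expression) -> dict:
    """The copresheaf C(x, -) restricted to the sample objects:
    each y maps to its hom-set, one arrow if x is contained in y, else none."""
    return {y: ({(x, y)} if contained_in(x, y) else set()) for y in objects}


def support(x: Expression) -> set:
    """The sample objects y for which C(x, y) is nonempty."""
    return {y for y, arrows in representable(x).items() if arrows}


print(sorted(support(("red",))))
# [('a', 'red', 'firetruck'), ('red',), ('red', 'firetruck'), ('red', 'idea')]

# Contravariance: "red" -> "red firetruck", and anything containing
# "red firetruck" also contains "red".
assert support(("red", "firetruck")) <= support(("red",))
```

Restricted to these sample objects, C(red, −) is nonempty exactly on the expressions containing “red,” the same set picked out by the principal ideal in the algebraic picture.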
Okay, why do all this work? It turns out there are advantages to thinking category-theoretically, rather than merely algebraically, including a principled way to incorporate statistics. I’ll explain more in Part 2. Stay tuned! (Or, read the preprint! arxiv.org/abs/2106.07890)
• • •
I’m happy to share a new paper with @MStoudenmire and John Terilla (arxiv.org/abs/1910.07425). In it, we present a tensor network generative model and a deterministic training algorithm, and we estimate the generalization error, all with the clarity of linear algebra. Now on the blog! math3ma.com/blog/modeling-…
I’ll explain some of the ideas here, sticking to a “lite” version. (Check out the paper for the full version!)
If you’ve been following the posts on Math3ma for the past 6-or-so months, you’ll be delighted to know the content is all related. But more on that later…
Alright, before jumping in, let’s warm up with a question:
• • •
Every probability distribution can be viewed as a quantum state & vice versa. There's a nice mathematical dictionary between the two worlds! So, what *is* a quantum state? And what's the dictionary? "A First Look at Quantum Probability, Part 2" is here! math3ma.com/blog/a-first-l…
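Here is a minimal sketch of one common way to set up this dictionary on a finite set, in plain Python: a distribution p gives the unit vector ψ with amplitudes √p(x), and hence a density operator ρ = |ψ⟩⟨ψ|; conversely, the diagonal of a density operator is a probability distribution. The distribution `p` is hypothetical, and the post itself may phrase the dictionary differently.

```python
import math

# Hypothetical probability distribution on the finite set {0, 1, 2}.
p = [0.5, 0.25, 0.25]

# Classical -> quantum: the unit vector psi with amplitudes sqrt(p(x)),
# and its density operator rho = |psi><psi|. Amplitudes are real, so no
# complex conjugation is needed; rho is positive semidefinite by construction.
psi = [math.sqrt(px) for px in p]
rho = [[a * b for b in psi] for a in psi]

n = len(p)
trace = sum(rho[i][i] for i in range(n))
is_symmetric = all(rho[i][j] == rho[j][i] for i in range(n) for j in range(n))

# Quantum -> classical: the diagonal of rho is a probability distribution (here, p again).
diagonal = [rho[i][i] for i in range(n)]

print(round(trace, 12), is_symmetric)  # 1.0 True
print([round(d, 12) for d in diagonal])  # [0.5, 0.25, 0.25]
```

Another standard choice is the diagonal density operator with p(x) on the diagonal; it also has trace 1 and recovers p from its diagonal.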
I’ll share a few of the ideas here, picking up where we left off in Part 1:
• • •
Hello friends! I’m excited to share with you the start of a mini-series on quantum probability theory. It's a *first* look at the subject, so the only prerequisites are linear algebra and basic probability. Part 1 is now on Math3ma! math3ma.com/blog/a-first-l…
Part 1 motivates the mini-series by reflecting on a thought from the world of (classical) probability theory:
*Marginal probability doesn’t have memory.*
What do I mean?
From a joint probability distribution on a product of two sets, you can get marginal probabilities by summing over, or “integrating out,” one of the variables. But marginalizing loses information—it doesn’t remember what was summed away!
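A small example of this forgetfulness: two different joint distributions on {a, b} × {0, 1} can have exactly the same marginals. The distributions below are made up for illustration, and `marginals` is a hypothetical helper.

```python
from collections import defaultdict


def marginals(joint):
    """Sum out each variable of a joint distribution given as {(x, y): prob}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), prob in joint.items():
        px[x] += prob  # integrate out y
        py[y] += prob  # integrate out x
    return dict(px), dict(py)


# Two different joint distributions (made up for illustration).
independent = {("a", 0): 0.25, ("a", 1): 0.25, ("b", 0): 0.25, ("b", 1): 0.25}
correlated = {("a", 0): 0.5, ("a", 1): 0.0, ("b", 0): 0.0, ("b", 1): 0.5}

print(marginals(independent))  # ({'a': 0.5, 'b': 0.5}, {0: 0.5, 1: 0.5})
print(marginals(correlated))   # ({'a': 0.5, 'b': 0.5}, {0: 0.5, 1: 0.5})
```

From the marginals alone, there is no way to tell whether the two variables were independent or perfectly correlated.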