I think one of the conclusions we should draw from the tremendous success of LLMs is how much of human knowledge and society exists at very low levels of Kolmogorov complexity.
We are entering an era where the minimal representation of a human cultural artifact... (1/12)
...will be, generically, an LLM prompt. And those prompts will be, generically, orders of magnitude more compact than the artifacts themselves. The great success of coding agents, for instance, indicates that the source code of most software artifacts is orders of... (2/12)
...magnitude more bloated than the truly minimal algorithmic representation required to specify that software artifact unambiguously. Likewise for much of human writing, research, communication. By being such efficient decompressors of algorithmic information, LLMs have... (3/12)
...betrayed the horrifying extent of our own verbosity. Part of that verbosity doubtless arises from the limitations of our formal representation languages (such as programming languages). But part of it also seems inherent, likely as a means of human error-correction. (4/12)
When the intended decompressor is very lossy (like a human mind), overspecifying the representation with lots of synonyms and syntactic sugar seems prudent. When the intended decompressor is closer to perfectly lossless (as LLMs are rapidly becoming), it makes less sense. (5/12)
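One crude way to see the verbosity claim numerically (my own toy sketch, not from the thread) is to use an off-the-shelf lossless compressor as a stand-in for Kolmogorov complexity. True Kolmogorov complexity is uncomputable, so zlib only gives a loose upper bound, but the asymmetry still shows up: a redundant, overspecified description carries far less information per byte than a compact prompt-style one.

```python
import zlib

# Crude proxy for algorithmic information content: the length of a
# zlib-compressed encoding. (Kolmogorov complexity is uncomputable,
# so this is only a loose upper bound.)
def approx_info(text: str) -> int:
    return len(zlib.compress(text.encode("utf-8"), 9))

# A deliberately verbose, redundant specification of a tiny artifact...
verbose = ("The function shall accept a list of numbers. The function shall "
           "add up all of the numbers in the list. The function shall then "
           "return the sum of all of the numbers in the list.")
# ...and a compact, prompt-style specification of the same artifact.
compact = "Return the sum of a list of numbers."

# Information per byte: the redundant version scores much lower --
# most of its length is error-correcting overspecification aimed at
# a lossy human decompressor.
ratio_verbose = approx_info(verbose) / len(verbose)
ratio_compact = approx_info(compact) / len(compact)
print(f"{ratio_verbose:.2f} vs {ratio_compact:.2f}")
```

The example strings are hypothetical, chosen only to make the redundancy visible; any synonym-heavy specification behaves the same way under a dictionary-based compressor.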
Mathematics and physics represent interesting test cases. The process of axiomatization in mathematics is a form of algorithmic compression: all the true theorems are always "contained" in the representation of the axioms and the rules of inference, but the process of... (6/12)
...decompressing this representation can be arbitrarily difficult. Yet the details of how the decompression (theorem-proving) and compression (reverse mathematics) processes happen are, in some sense, the true objects of mathematical interest. Likewise with physics. (7/12)
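The "axioms plus inference rules as a compressed representation" idea can be sketched in a few lines (a toy of my own, not from the thread): take a handful of Horn-clause rules as the "axioms" and let forward chaining play the role of decompression, mechanically unfolding everything the compressed representation contains.

```python
# Toy "axiomatic compression": one axiom and four inference rules of
# the form (premises -> conclusion) implicitly contain every derivable
# theorem; forward chaining is the decompression step.
axioms = {"p"}
rules = [({"p"}, "q"), ({"p", "q"}, "r"), ({"r"}, "s"), ({"q", "s"}, "t")]

theorems = set(axioms)
changed = True
while changed:  # apply rules until no new theorem appears
    changed = False
    for premises, conclusion in rules:
        if premises <= theorems and conclusion not in theorems:
            theorems.add(conclusion)
            changed = True

print(sorted(theorems))  # prints: ['p', 'q', 'r', 's', 't']
```

Here decompression terminates quickly; in real proof systems the search for a given theorem can be arbitrarily long, which is exactly the thread's point.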
One might, if one were sufficiently naive, claim that physics is about finding minimal algorithmic compressions of the physical universe. Yet again, the details of (de)compression are ultimately what matter. Merely finding a minimal representation of the universe... (8/12)
...wouldn't "solve physics", any more than discovering the ZFC axioms "solved mathematics". [If one believes, as I do, that the universe can ultimately be modeled in computational terms, then in a sense this representation already exists: it's a universal Turing machine.] (9/12)
LLMs are remarkably effective decompressors of algorithmic information, and their success in theorem-proving and software development is a testament to that. Their capabilities in compression currently seem less clear. Yet discovering minimal representations,... (10/12)
...be they witty aphorisms or bon mots (at which present LLMs are uniformly awful), or the compressed axiomatic representations that characterize mathematical beauty (at which present LLMs are largely untested), constitutes one of the hallmarks of deep human intelligence. (11/12)
So I think it's becoming increasingly clear that efficiency and losslessness, assessed separately for compression and for decompression, give us four potential axes along which we can begin to parameterize the space of possible (intelligent) minds.
But what are the others? (12/12)
What if you could guarantee (using a mix of formal verification and PDE theory) that a neural network would *always* give you the correct answer, even when making inferences arbitrarily far away from the training data?
Introducing BEACONS. arXiv link below. (1/15)
Link: arxiv.org/abs/2602.14853
In physics, we often want to use neural networks to infer new solutions to systems of PDEs. But outside of the spatiotemporal ranges on which they're trained, even physics-informed neural networks (PINNs) struggle to extrapolate correctly. (2/15)
Back in the 90s, excellent work was done by Mhaskar, Pinkus and others on *quantitative* versions of the fêted Universal Approximation Theorems for neural networks: how accurately can a shallow neural network with N hidden neurons approximate a d-dimensional function? (3/15)
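A minimal numerical sketch of the flavor of those quantitative results (my own illustration, using a random-feature least-squares fit rather than the actual constructions in the Mhaskar/Pinkus line of work): fit a 1-d target with a shallow ReLU network of width N, training only the outer layer, and watch the sup-norm error fall as N grows.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 400)
target = np.sin(x)

def shallow_fit_error(n_hidden: int) -> float:
    # Random inner weights/biases, ReLU activations; only the outer
    # layer is fit, by least squares. (This random-feature setup is a
    # simplification -- the classical theorems optimize all weights.)
    w = rng.normal(size=n_hidden)
    b = rng.uniform(-np.pi, np.pi, size=n_hidden)
    features = np.maximum(0.0, np.outer(x, w) + b)  # shape (400, N)
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return float(np.max(np.abs(features @ coef - target)))

errors = {n: shallow_fit_error(n) for n in (5, 20, 80)}
print(errors)  # sup-norm error typically shrinks as the width N grows
```

The quantitative theorems make the N-vs-accuracy tradeoff precise as a function of the dimension d and the smoothness of the target; this sketch only shows the qualitative trend on a fixed 1-d example.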
I remember this moment very vividly. Apologies in advance for the arrogant and self-indulgent personal anecdote...
I originally went to university with the intention of becoming a pure mathematician, and for the first couple of years this dream seemed to be pretty safe. (1/10)
I generally came top (or near top) in exams, sat in on graduate-level courses, tackled some research problems, published some papers. I'd convinced myself that I could understand any mathematical structure if I just wrote down the rules and stared at them for a bit. (2/10)
Then, in third year, I sat in on a graduate algebraic number theory course. I'd never had a particularly great interest/intuition for number theory, but I had a pretty good facility for rings/modules/etc., and as the course went on I found myself leaning more and more… (3/10)
The real reason for building it, beyond it just being a fun algorithmic puzzle, was because my then-collaborator @stephen_wolfram and I wanted to perform an empirical investigation: to systematically enumerate possible axiom systems, and see what theorems were true. (2/15)
We constructed an enumeration scheme, set the theorem-prover running, and waited. Unsurprisingly, most axiom systems we encountered were completely barren. But occasionally we'd see one we recognized: group theory shows up around no. 30,000, Boolean logic at ~50,000, etc. (3/15)
The tensor product ⊗ is, conceptually, the most general (binary) operation that behaves "how a product should behave". In practice, this means that the order of brackets shouldn't matter, i.e. X⊗(Y⊗Z) should be the same as (X⊗Y)⊗Z, for any objects X, Y and Z... (2/9)
...and also that there should be some "do nothing" object I, such that X⊗I and I⊗X should both be the same as just X, for any object X. Any operation obeying these rules (including just ordinary multiplication of numbers) is therefore a kind of "tensor product". (3/9)
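These two laws can be checked concretely (my own sanity-check sketch, not from the thread) using the Kronecker product of matrices, which is one standard realization of the tensor product; the 1x1 identity matrix plays the role of the "do nothing" object I.

```python
import numpy as np

# The Kronecker product is one concrete "tensor product". Check the two
# defining laws numerically for particular random matrices -- a sanity
# check on examples, not a proof of the general laws.
rng = np.random.default_rng(1)
A, B, C = (rng.normal(size=(2, 2)) for _ in range(3))
I = np.eye(1)  # the 1x1 identity is the "do nothing" object

# Associativity: X (Y Z) should equal (X Y) Z.
assoc = np.allclose(np.kron(A, np.kron(B, C)), np.kron(np.kron(A, B), C))
# Unit laws: X I and I X should both equal X.
unit = np.allclose(np.kron(A, I), A) and np.allclose(np.kron(I, A), A)
print(assoc, unit)  # prints: True True
```

Ordinary multiplication of numbers is the special case where A, B, C are 1x1 matrices.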
Topologically, of course, a straw has 1 hole: it's homeomorphic to a punctured disk. But intuitively it has 2: one at the top and one at the bottom. And this answer lies at the heart of the most rigorous axiomatization of quantum field theory. (1/20)
In this intuitive picture, the two "holes" of the straw are 1-dimensional circles, and they're connected by a 2-dimensional cylinder (the straw itself). Mathematically, this relationship is called a "cobordism". Two n-dimensional manifolds are "cobordant" if they form... (2/20)
...the boundary of some n+1-dimensional manifold (like the two circles forming the boundary of the cylinder). And cobordisms give one a natural framework for thinking about time evolution in physics. Suppose we have two moments in time: t1 and t2, with t1 < t2. (3/20)
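The picture being described is, in sketch, the functorial (Atiyah-style) axiomatization of topological QFT; this summary is mine, not part of the thread:

```latex
% A topological QFT assigns linear-algebraic data to topological data:
%   an n-manifold \Sigma                    \mapsto  a vector space  Z(\Sigma)
%   a cobordism   M : \Sigma_1 \to \Sigma_2 \mapsto  a linear map
%                                           Z(M) : Z(\Sigma_1) \to Z(\Sigma_2)
% Gluing cobordisms end-to-end corresponds to composing the maps:
\[
  Z(M_2 \circ M_1) \;=\; Z(M_2)\, Z(M_1),
\]
% so "time evolution from t_1 to t_2" is the linear map assigned to the
% cobordism interpolating between the spatial slices at t_1 and t_2.
```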
The desiccated "Theorem, Lemma, Proof, Corollary,..." presentational style is staggeringly counterproductive, if one's objective is actually communicating the underlying mathematical intuitions and thought processes behind a result. In reality, the process is more like... (1/4)
"First, I tried <standard method>, but it failed for <enlightening reason>, so I investigated whether I could exploit this fact to find <counterexample> with <property>, but all objects obtained through this technique ended up having <interesting property> in common.... (2/4)
...So I tried relaxing <axiom> to see whether <related property> could be removed, and this led me to realize that <intermediate lemma> is actually crucial to the structure of <related object>..." Etc. You occasionally get these insights from (very good) mathematical talks. (3/4)