I glimpsed during my work in Ian Witten's lab in the 1990s that one can extract arbitrary semantics from purely statistical text models. I just never considered this to be very interesting, because I thought AGI would result from intelligent self-improvement, not mere imitation.
I expected AI would become smarter than us before it knew very much. Current frontier models are idiot savant AI: less intelligent than competent humans, but they memorized basically everything. The models are now improving through reasoning, reinforcement learning and tool use.
My bias against the big data approach to AGI is fueled by how I observe my own learning. I always try to avoid memorization and focus on re-deriving, using, improving and integrating what I read. Memorizers do worse than curious explorers and experimenters.
When I was young, my opposition to the unjust and destructive wars of Kissinger, the Bushes and the Clintons, and to the crude, consumerist American culture, led me to wish for Germany to leave NATO and for Europe to disengage from US dominance.
Regardless of which politicians we elected, they would have to spend time in the US, go through the Council on Foreign Relations, and be part of a cabal called "Atlantikbrücke" before they would be allowed to play roles at the federal level. Germany is a vassal of the US empire.
The US could force us to bomb Serbia against international law, subvert the beginnings of Russian democracy under Gorbachev and Yeltsin, and make us accept glyphosate, antibiotics and growth hormones in our food. Echelon and XKeyscore spied on European citizens, leaders and industry.
To understand the self, it's necessary to deconstruct our experience of subjective identity, and our experience of what it means for an object to be identical with itself. Identity is a representation, not a reality. Some represented objects have an identity, others do not.
Things without identity are patterns. For instance, a pattern of six dots on the side of a die indicates the number six. We don't expect the pattern to be specific to the die; we only expect it to match. Everything that is more specific (e.g. irregularities in the dots) is noise and irrelevant to the pattern itself.
Conversely, we expect the physical manifestation of the die to be concrete, a specific thing with an identity. That identity is a biography that ties the die, in a completely particular way, into the history of the physical universe. Identities imply biographies.
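A loose programming analogy, a sketch of my own rather than anything stated above: the pattern view corresponds to value equality (two things match), while the identity view corresponds to object identity (one particular thing with its own history). The Face class and its pips field are purely illustrative.

```python
# Sketch: "pattern" as value equality vs. "identity" as object identity.
from dataclasses import dataclass

@dataclass
class Face:
    pips: int  # the dot pattern shown on a die face

face_a = Face(pips=6)   # face of one die
face_b = Face(pips=6)   # face of a different die

# Pattern: we only ask whether the two faces match; anything more
# specific (irregularities in the dots) is noise we ignore.
assert face_a == face_b          # value equality: same pattern

# Identity: the two dice remain distinct objects, each with its own
# biography, even though their patterns coincide.
assert face_a is not face_b      # object identity: different things
```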
I don't think that the scaling hypothesis gets recognized enough for how radical it is. For decades, AI sought some kind of master algorithm of intelligence. The scaling hypothesis says that there is none: intelligence is the ability to use more compute on more data.
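To make that concrete, here is the form in which scaling is usually quantified; this is my illustration, borrowing the Chinchilla-style parametric fit, not something stated in the text:

$$L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

Here $L$ is the model's loss, $N$ the parameter count, $D$ the number of training tokens, and $E, A, B, \alpha, \beta$ are empirically fitted constants. Nothing in the formula names a master algorithm; loss falls simply because $N$ and $D$ grow.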
If the scaling hypothesis is correct, it means that intelligence is not an algorithm, but a grab bag of many specialized skills, which can be acquired using any good enough learning procedure. Current AI researchers have strong disagreements about the limits of scaling.
It is not easy to make a compelling argument against scaling: if foundation models are worse than humans at, e.g., reasoning, will the problems go away if we train them on more reasoning examples? Perhaps there is a master algorithm for creative reasoning hidden in the data?
Nonconstructive mathematics amounts to the claim that mathematics can do more than computation. In other words, there is a mathematical universe in which you might build a machine that can traverse an infinite graph in a finite number of steps, that can split a 3D ball into two 3D balls of the same size, and that can simulate Hilbert's hotel.
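The ball-doubling example is the Banach–Tarski decomposition. For reference, this is my addition stating the standard theorem: the closed unit ball $B \subset \mathbb{R}^3$ can be partitioned into finitely many pieces that rigid motions reassemble into two copies of $B$:

$$B = \bigsqcup_{i=1}^{n} P_i, \qquad B = \bigcup_{i=1}^{k} g_i P_i, \qquad B = \bigcup_{i=k+1}^{n} g_i P_i$$

where each $g_i$ is an isometry of $\mathbb{R}^3$. The proof requires the axiom of choice, the pieces are non-measurable, and no algorithm can exhibit them, which is exactly the sense in which this mathematics exceeds computation.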
Physicists who import their math and modeling library from nonconstructive mathematicians have to deal with a language that describes phenomena that are impossible in physics as if they were valid occurrences, and allows the construction of objects that could never be real.
If we understand physics as the attempt to discover the language of the universe, the compositional operations that lead to the emergence and interaction of elementary particles, the supervenience of spacetime, and the observable macrostates, shouldn't it use a constructive encoding?
I currently think that consciousness may be a learning algorithm for a self-organizing information processing system, a colonizing pattern that entrains itself on a brain once it is discovered early in development. This is an unusual hypothesis and thus may well be wrong.
Perhaps we can think of consciousness in terms similar to the principle of government in human society. A primitive government bullies people based on a personal reputation system and does not scale beyond the tribal level. Scalable governments require "recursive bullying".
Recursive governments colonize by imposing an order that can harvest more energy than it costs to maintain the administration. Once the principle of recursive bullying is discovered, it spreads until it meets another government with a similarly efficient implementation.