In assembly theory (Sharma, Czégel, Lachmann, Kempes, Walker & Cronin, Nature 2023), the assembly index a of an object is the minimum number of recursive joining operations required to construct it from a basis set of elementary parts, where each intermediate is reusable once formed.
The framework was developed to distinguish biotic from abiotic matter: empirically, molecules with a ≳ 15 are not produced by undirected chemistry at detectable abundance, and their occurrence is treated as evidence of an underlying selection process, that is, a causal history capable of preserving and recombining intermediates.
The index is thus not a measure of static complexity but of contingent depth: the length of the shortest causal chain compatible with the object's existence.
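To make the definition concrete, here is a minimal sketch that computes the assembly index of a short string by brute force. It assumes strings as the objects, single characters as the basis set, and concatenation as the joining operation; the function name `assembly_index` is illustrative, not taken from the paper, and the search is exponential, so it only suits toy inputs.

```python
from itertools import product


def assembly_index(target: str) -> int:
    """Minimum number of pairwise joins needed to build `target` from its characters.

    Assumption: objects are strings, the basis is the set of single characters,
    and a join is concatenation. Every intermediate stays reusable once built.
    """
    if len(target) <= 1:
        return 0
    # Any intermediate that ends up inside the target must be one of its substrings.
    n = len(target)
    substrings = {target[i:j] for i in range(n) for j in range(i + 1, n + 1)}
    basis = frozenset(target)

    def search(pool: frozenset, steps_left: int) -> bool:
        if target in pool:
            return True
        if steps_left == 0:
            return False
        for left, right in product(pool, repeat=2):
            joined = left + right
            if joined in substrings and joined not in pool:
                if search(pool | {joined}, steps_left - 1):
                    return True
        return False

    # Iterative deepening: the first depth that succeeds is the minimum.
    depth = 1
    while not search(basis, depth):
        depth += 1
    return depth


print(assembly_index("ABABABAB"))  # 3: AB, then ABAB, then ABABABAB
print(assembly_index("ABCD"))      # 3: nothing can be reused
```

Both strings need three joins, but the first is twice as long: reuse of intermediates is what keeps the index low for repetitive objects.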
A thread 🧵⬇️
2 /
The construction extends naturally to algorithm-space.
Treat the space of learning systems as an assembly space whose elementary operations are formal primitives (differentiable composition, attention, value iteration, policy gradients, in-context conditioning) and whose objects are trainable architectures.
Under this mapping, contemporary frontier systems occupy a regime of high a, reached through an ordered trajectory: backpropagation (Rumelhart et al., 1986) → distributed representations → convolutional and recurrent inductive biases → the attention mechanism (Bahdanau et al., 2014) → the transformer (Vaswani et al., 2017) → neural scaling laws (Kaplan et al., 2020; Hoffmann et al., 2022) → RLHF (Christiano et al., 2017; Ouyang et al., 2022) → tool use and extended-context reasoning.
Each transition is conditionally near-zero-probability absent its predecessors. The trajectory is a constraint on reachability.
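The reachability constraint can be sketched as a prerequisite graph. The edges below are a hypothetical, abbreviated copy of the trajectory in this post (a simple chain, with some stages omitted); the names `PREREQUISITES` and `frontier` are illustrative only.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite graph: each key maps to the constructions it depends on.
# The chain is an abbreviated version of the trajectory above, not an empirical claim.
PREREQUISITES = {
    "backpropagation": set(),
    "distributed representations": {"backpropagation"},
    "attention": {"distributed representations"},
    "transformer": {"attention"},
    "scaling laws": {"transformer"},
    "RLHF": {"scaling laws"},
}


def frontier(realized, prerequisites=PREREQUISITES):
    """Constructions not yet realized whose prerequisites are all realized."""
    return {
        item
        for item, needs in prerequisites.items()
        if item not in realized and needs <= realized
    }


# One valid construction order (unique here because the graph is a chain).
ORDER = list(TopologicalSorter(PREREQUISITES).static_order())

print(frontier({"backpropagation"}))  # only the next step is reachable
print(ORDER)
```

With only backpropagation realized, the transformer is not on the frontier: in this toy model it becomes reachable only after every intermediate construction exists.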
3 /
A parallel assembly path governs the physical substrate. Programmable shading hardware was repurposed for general-purpose matrix arithmetic (CUDA, 2007), specialized into tensor cores, and embedded in high-bandwidth interconnect fabrics (NVLink, InfiniBand) capable of maintaining gradient synchrony across 10^4–10^5 accelerators.
Algorithmic and hardware paths are mutually gating: the transformer is computationally inert without dense matmul throughput, and dense matmul throughput is economically unjustified without an architecture that consumes it.
The joint assembly index of the algorithm-hardware pair is therefore strictly greater than either component considered in isolation, and capability gains are bounded by the slower of the two trajectories.
4 /
This reframes the scaling debate.
The relevant question is not whether AGI requires a single missing insight or additional compute applied to existing methods, but which prerequisite constructions on the joint trajectory remain unrealized.
Candidate gaps include online continual learning without catastrophic interference, a memory architecture supporting selective consolidation, and a credit-assignment mechanism over horizons exceeding current context windows.
Each is plausibly gated by primitives not yet isolated, and the gating structure implies that compute applied to existing primitives yields diminishing returns once the local subtree of the assembly graph is exhausted.
Step-skipping is not available; the order is a property of the space, not of the researcher.
5 /
A caveat on the underlying theory.
The status of the assembly index relative to algorithmic information theory remains disputed.
Abrahão, Hernández-Orozco, Kiani, Zenil and colleagues (PLOS Complex Systems 2024) argue that the index is approximated by LZ-class compression and reducible to Shannon entropy under appropriate normalization.
Kempes et al. (npj Complexity 2025) reply that the index quantifies causation under selection rather than minimum description length, and note that exact computation of a is NP-complete, placing it in a distinct complexity class from polynomial-time compression schemes.
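For readers unfamiliar with the compression side of the dispute, here is a toy LZ78-style parse. It is only an illustration of what an "LZ-class" scheme does (split the input into phrases that each extend a previously seen phrase); it is not the procedure used in either paper, and the function name `lz78_phrases` is an assumption of this sketch.

```python
def lz78_phrases(text):
    """Parse `text` into LZ78 phrases: each is the shortest chunk not seen before."""
    seen = set()
    phrases = []
    current = ""
    for ch in text:
        current += ch
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:  # trailing remainder that repeats an earlier phrase
        phrases.append(current)
    return phrases


print(lz78_phrases("ABABABAB"))  # ['A', 'B', 'AB', 'ABA', 'B']
print(lz78_phrases("ABCD"))      # ['A', 'B', 'C', 'D']
```

The parse runs in a single pass over the input, which is the polynomial-time behavior the reply contrasts with exact computation of a.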
For the present argument the analogy is robust to this dispute: under either interpretation, capability sits behind an ordered sequence of constructions whose order is not optional.
The methodological implication is to model AGI not as a threshold crossed along a single scaling axis, but as an object with a construction history, and to direct research effort toward identifying the rate-limiting prerequisites on the joint algorithm-substrate path.
6 /
Conclusion
The framing recasts AGI forecasting as a problem in identifying unrealized prerequisites on a joint algorithm–substrate assembly graph, rather than as extrapolation along a compute axis.
The order of constructions is a property of the space, not a research preference, and step-skipping is not available.
Whether one accepts the assembly index as causally distinct from algorithmic complexity or treats it as a useful re-parameterization, the methodological conclusion is invariant: capability is gated by ordered prerequisites, and the rate-limiting question is which primitives remain to be isolated.
References below ⬇️
7 /
References
Sharma, Czégel, Lachmann, Kempes, Walker & Cronin (2023). Assembly theory explains and quantifies selection and evolution. Nature 622, 321–328. doi.org/10.1038/s41586…
Kempes, Lachmann, Iannaccone, Fricke, Chowdhury, Walker & Cronin (2025). Assembly theory and its relationship with computational complexity. npj Complexity 2, 27. doi.org/10.1038/s44260…
Abrahão, Hernández-Orozco, Kiani, Tegnér & Zenil (2024). Assembly Theory is an approximation to algorithmic complexity based on LZ compression that does not explain selection or evolution. PLOS Complex Systems 1(1), e0000014. doi.org/10.1371/journa…
Rumelhart, Hinton & Williams (1986). Learning representations by back-propagating errors. Nature 323, 533–536. doi.org/10.1038/323533…
Bahdanau, Cho & Bengio (2014). Neural machine translation by jointly learning to align and translate. arXiv:1409.0473. arxiv.org/abs/1409.0473
An open-source, first-principles theoretical reconstruction of Claude Mythos, implemented in PyTorch.
The architecture instantiates a looped transformer with a Mixture-of-Experts (MoE) routing mechanism, enabling iterative depth via weight sharing and conditional computation across experts.
My implementation explores the hypothesis that recursive application of a fixed parameterized block, coupled with sparse expert activation, can yield improved efficiency–performance tradeoffs and emergent multi-step reasoning.
Learn more ⬇️🧵
2 /
I hypothesize that Mythos is a Recurrent-Depth Transformer (RDT), a class of looped transformer in which a fixed set of weights is applied iteratively across T loop steps within a single forward pass.
Crucially, reasoning occurs entirely in continuous latent space. There is no intermediate token emission between steps. This is structurally distinct from chain-of-thought and has been formally analyzed (Saunshi et al., 2025; COCONUT, 2024).
3 / 7
The recurrent block executes one shared TransformerBlock for up to T=16 loop iterations. At each step, the frozen encoded input e is re-injected via a stable LTI update rule: h_{t+1} = A·h_t + B·e + Transformer(h_t, e)
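A minimal sketch of the update rule, in plain Python rather than PyTorch so it runs without dependencies. It assumes scalar A and B (written `a` and `b`), a zero initial state, and a toy stand-in for the shared TransformerBlock; none of these choices come from the original post.

```python
import math


def recurrent_depth(e, block, a=0.5, b=0.5, steps=16):
    """Iterate h_{t+1} = a*h_t + b*e + block(h_t, e) with one shared block.

    Assumptions: `a` and `b` are scalars standing in for A and B, and the
    state starts at zero. With |a| < 1 the linear part is contractive.
    """
    h = [0.0] * len(e)
    for _ in range(steps):
        update = block(h, e)  # same block (same weights) at every step
        h = [a * h_i + b * e_i + u_i for h_i, e_i, u_i in zip(h, e, update)]
    return h


def toy_block(h, e):
    # Hypothetical stand-in for the shared TransformerBlock: bounded and elementwise.
    return [0.1 * math.tanh(h_i + e_i) for h_i, e_i in zip(h, e)]


def zero_block(h, e):
    # Switches the nonlinear term off to expose the pure linear recurrence.
    return [0.0] * len(h)


print(recurrent_depth([1.0, -2.0], toy_block))
print(recurrent_depth([1.0], zero_block))  # approaches b / (1 - a) * e = 1.0
```

The input e enters at every step, so the state is pulled back toward the encoded input at each iteration instead of drifting away from it.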
The FFN inside this block is a Mixture-of-Experts layer, following DeepSeekMoE's design: a large pool of fine-grained routed experts, with only a sparse top-K subset activated per token, alongside a small set of always-active shared experts that absorb common cross-domain patterns.
Critically, the router selects distinct expert subsets at each loop depth, meaning every iteration is not merely a repetition but a computationally distinct pass. MoE provides domain breadth; looping provides reasoning depth.
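The routing idea can be sketched for a single token in plain Python. This assumes scalar activations, toy experts that just scale their input, and gate weights equal to the softmax scores of the selected experts; `moe_forward`, `ROUTED`, and `SHARED` are hypothetical names, and real implementations differ in how they normalize and balance the gates.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [v / total for v in exps]


def moe_forward(x, router_logits, routed_experts, shared_experts, top_k=2):
    """Combine always-on shared experts with the top_k routed experts for one token."""
    probs = softmax(router_logits)
    # Indices of the top_k highest-scoring routed experts.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    out = sum(expert(x) for expert in shared_experts)  # shared experts always run
    out += sum(probs[i] * routed_experts[i](x) for i in chosen)  # sparse routed part
    return out, chosen


# Toy experts (assumption): expert k multiplies its input by k.
ROUTED = [lambda x, k=k: k * x for k in (1, 2, 3, 4)]
SHARED = [lambda x: x]

# Different router logits at two loop depths select different expert subsets.
_, depth_1 = moe_forward(1.0, [0.0, 1.0, 2.0, 3.0], ROUTED, SHARED)
_, depth_2 = moe_forward(1.0, [3.0, 2.0, 1.0, 0.0], ROUTED, SHARED)
print(depth_1, depth_2)  # [3, 2] [0, 1]
```

Because the hidden state changes between loop steps, the router sees different inputs at each depth, which is how the same shared block can activate different experts on each pass.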
A thread 🧵 on my vast array of essays on economics
Like, retweet, and share this with friends
1 /
A Theory on Value Creation
I wrote "A Theory on Value Creation" to bridge the gap between traditional economic models and the contemporary economic landscape, where innovation, networks, human capital, and technological advancements play pivotal roles in value creation. This paper formalizes a comprehensive framework for understanding how both tangible and intangible resources interact with technology and time to generate value. It integrates theoretical rigor with practical applications across microeconomic, macroeconomic, and sector-specific contexts.
This paper introduces a unique approach to economic modeling, where economic systems are conceptualized as intelligent neural networks. By treating economic agents—such as individuals, firms, and governments—as neurons in a neural network, this framework reveals how economies can learn, adapt, and self-organize over time. Through formal mathematical models and a series of theorems, this paper explains how market dynamics can be optimized, how economies recover from crises, and how policy interventions can guide systems toward stability.
Introducing Search Arena – The Ultimate Platform for Evaluating Search-Based Web Agents! 🕵️‍♂️🔍
Having reliable search tools is more critical than ever. But, finding the best search-based web agents can be challenging. That's why we built Search Arena.
There are countless search-based web agents available, but how do you know which one performs the best? The quality and efficiency of these agents can vary widely, making it tough to choose the right one for your needs. 😕
Search Arena is designed to rigorously evaluate and compare these agents using a variety of metrics. We ensure you can identify the most effective solutions to optimize your search capabilities.
Introducing the Python Documentation Generator Agent, an advanced tool designed to revolutionize the way we handle documentation. Learn how this agent can save thousands of hours by automating the documentation process for your Python projects.
Writing detailed and comprehensive documentation is a time-consuming task. Our agent simplifies this by automatically generating high-quality, multi-page professional documentation tailored to your code's unique structure and functionality.
With the Python Documentation Generator Agent, you can focus on what matters most: coding. The agent takes care of everything from providing class definitions and parameter descriptions to offering extensive usage examples and tips.