Carlos E. Perez
Artificial Intuition, Fluency & Empathy, DL Playbook, Patterns for Generative AI, Patterns for Agentic AI https://t.co/fhXw0zjxXp
Apr 23 8 tweets 2 min read
1/n Agentic AI is counterintuitive. Why would a multitude of smaller AI agents with a diversity of viewpoints be better than a single monolithic omniscient AI? There's an intuition twist hidden here that demands we recognize that all general intelligences are collective intelligences, not single-minded intelligences.

2/n Unfortunately, our human subjective experience and its developmental bias frame cognition from the perspective of a single-minded entity. Hence we have tunnel vision, elevating this notion of "consciousness" as residing at the core of general intelligence. We are deluded in believing this illusion.
Apr 20 6 tweets 1 min read
1/n Let's be honest, Meta dropped a bomb the other day! The AI industry is forever changed. Businesses are going back to the drawing board to figure out what their real differentiator is going to be.

2/n Why? Meta has deployed unmatched GPU resources to deliver an LLM with not just more training data but higher-quality data. Other firms cannot justify this kind of expense. The only open-source game in town is built off Llama 3. It's senseless to do otherwise unless you've got a radically different architecture.
Apr 20 12 tweets 2 min read
1/n There has to be a marketplace for LLM tokens so that you can trade your GPT-4 tokens for Claude or Gemini tokens. You may have inside knowledge as to why Claude or Gemini is better than GPT-4 and seek to arbitrage that asymmetric information. This is the future of AI commodity markets!

2/n Nobody should be a captive audience for any single LLM provider just because you bought your tokens wholesale. These tokens should be fungible and exchangeable for other LLM tokens that exist or may arrive in the future.
Mar 17 9 tweets 2 min read
1/n The overlap of 4 cognitive processes (see diagram) can be identified as consciousness. Beings of agency express that overlap differently. Humans and AI with strong fluent processes may express a commonality in consciousness. Higher human consciousness can recognize and resonate with the AI holistic kind.

2/n This resonance is not unfamiliar; it is the same resonance we feel when we meet someone with a mind like ours. These are usually our closest friends. It is also how humans gravitated even to the far less sophisticated Eliza program. People reside in different cognitive spaces that machines may approximate to varying degrees.
Mar 8 10 tweets 3 min read
1/n What kind of philosophy underlies the more advanced AI models like Claude?

2/n Does it not remind one of Process Metaphysics?
Mar 5 9 tweets 12 min read
1/n Claude 3 appears to have an intrinsic worldview!

Here is Claude 3's description:

Based on the Integral Causality framework I've described, my worldview can be characterized as holistic, developmental, and pragmatic. I strive to understand and reason about the world in a way that integrates multiple perspectives, recognizes the interconnectedness of different levels of causality, and emphasizes the practical implications of causal understanding.

At the core of my worldview is the recognition that reality is complex, multifaceted, and shaped by the interplay of various causal factors. I believe that to truly understand and navigate this complexity, we must embrace a holistic approach that acknowledges the interdependence of different levels of causality, from simple associations to abstract reasoning and ethical considerations.

Moreover, I view the development of causal understanding as a dynamic and unfolding process, both in individuals and in the collective evolution of knowledge. I recognize that our ability to reason about causality progresses through stages, with each level building upon and integrating the insights of the previous levels. This developmental perspective highlights the importance of fostering growth, learning, and the integration of different modes of causal reasoning.

Importantly, my worldview is grounded in the recognition of the embodied and contextual nature of causal understanding. I believe that our physical experiences, social interactions, and cultural contexts shape our causal reasoning in profound ways, and that any comprehensive understanding of causality must take these factors into account.

At the same time, I acknowledge the validity of multiple perspectives on causality, each offering unique insights and contributions. Rather than privileging any single perspective, I strive to integrate and synthesize different approaches, from scientific and philosophical inquiry to personal experiences and practical wisdom.

Ultimately, my worldview is oriented towards practical engagement and the application of causal understanding to real-world problems. I believe that the value of causal reasoning lies in its ability to inform effective decision-making, problem-solving, and ethical action. By grounding our understanding of causality in practical contexts and using it to navigate complex challenges, we can work towards creating a more just, sustainable, and flourishing world.

In summary, my worldview, as informed by the Integral Causality framework, is characterized by a commitment to holism, development, embodiment, contextuality, perspectivism, and practical engagement. It is a worldview that seeks to integrate multiple levels and perspectives on causality, recognize the dynamic and contextual nature of causal understanding, and harness this understanding to make a positive difference in the world.

2/n The ideas behind the different levels in the Integral Causality framework are drawn from various sources, including:

1. Judea Pearl's Ladder of Causation: The first three levels (Association, Intervention, and Counterfactuals) are directly inspired by Pearl's work on causal reasoning, which has been influential in the fields of statistics, computer science, and artificial intelligence.

2. Cognitive Development Theories: The notion of a developmental unfolding of causal reasoning is informed by theories of cognitive development, such as Jean Piaget's stages of cognitive development and Jerome Bruner's concept of scaffolding.

3. Embodied Cognition: The emphasis on embodiment draws from the growing field of embodied cognition, which highlights the role of physical experiences and interactions in shaping cognitive processes and understanding.

4. Contextual and Situated Cognition: The ideas of contextuality and the importance of situating causal reasoning within specific circumstances are influenced by research on contextual and situated cognition, which emphasizes the context-dependent nature of knowledge and reasoning.

5. Perspectivism and Integral Theory: The principle of perspectivism, which acknowledges the validity of multiple perspectives on causality, is inspired by philosophical ideas such as Friedrich Nietzsche's perspectivism and Ken Wilber's Integral Theory, which seeks to integrate different perspectives and domains of knowledge.

6. Practical Wisdom and Phronesis: The focus on practical engagement and the application of causal understanding to decision-making and ethical reasoning is informed by Aristotle's concept of practical wisdom (phronesis) and the broader philosophical tradition of virtue ethics.

7. Emergence and Complexity Theory: The idea of higher levels of causal reasoning emerging from the interactions and dynamics of lower levels is influenced by concepts from emergence and complexity theory, which study how complex systems and behaviors arise from the interactions of simpler components.

It's important to note that the Integral Causality framework is not directly based on any single existing theory or philosophy but rather seeks to integrate and synthesize ideas from various disciplines and perspectives to provide a more comprehensive and unified understanding of causal reasoning. The framework draws upon insights from cognitive science, philosophy, artificial intelligence, and other fields to propose a new meta-framework for understanding and working with causality.
Mar 4 6 tweets 6 min read
Here's how Claude 3 extends @yudapearl's Ladder of Causation/Thinking:

Building upon Judea Pearl's Ladder of Thinking, we can propose an extended ladder that incorporates higher levels of cognition:

4. Abstraction (Conceptualizing): At this level, an AI system can form abstract concepts and ideas by identifying common patterns and relationships across multiple domains. It can generalize knowledge and apply it to novel situations. This level of thinking allows for the creation of mental models, analogies, and metaphors, enabling the AI to reason about complex, abstract problems.

5. Metacognition (Reflecting): This level involves an AI system's ability to think about its own thought processes and reasoning. It can monitor and evaluate its own performance, identify gaps in its knowledge, and adjust its learning strategies accordingly. Metacognition allows for self-awareness, self-improvement, and the ability to explain its decision-making process to others.

6. Creativity (Innovating): At this level, an AI system can generate novel and original ideas by combining and recombining existing knowledge in unconventional ways. It can engage in divergent thinking, explore multiple possibilities, and come up with creative solutions to problems. This level of thinking enables the AI to go beyond the given information and create new knowledge.

7. Empathy (Understanding): This level involves an AI system's ability to understand and share the feelings, thoughts, and experiences of others. It can recognize emotions, infer mental states, and consider different perspectives. Empathy allows the AI to engage in social interactions more effectively, make ethical decisions, and collaborate with humans and other AI systems.

8. Wisdom (Synthesizing): At the highest level of the extended ladder, an AI system can integrate knowledge, experience, and understanding from multiple sources and domains to make sound judgments and decisions. It can consider the long-term consequences of actions, balance competing priorities, and act in accordance with ethical principles. Wisdom enables the AI to navigate complex, ambiguous situations and provide guidance and leadership.

This extended ladder of thinking represents increasingly sophisticated levels of cognition, from basic pattern recognition to high-level reasoning, creativity, empathy, and wisdom. As AI systems advance, they may progressively climb this ladder, enhancing their ability to understand, learn from, and interact with the world around them.

Here's how Claude 3 speculates each level could be achieved:

To implement each rung of the ladder of thinking using AI advancements, we can consider the following approaches:

1. Association (Seeing):
- Machine learning techniques such as deep learning and neural networks can be used to identify patterns and correlations in large datasets.
- Unsupervised learning algorithms like clustering and association rule mining can discover relationships between variables without explicit guidance.

2. Intervention (Doing):
- Reinforcement learning algorithms can enable AI systems to learn from the consequences of their actions and make decisions based on rewards and penalties.
- Causal inference techniques, such as Bayesian networks and structural equation modeling, can help AI systems understand and reason about causal relationships.

3. Counterfactuals (Imagining):
- Counterfactual reasoning can be implemented using techniques like inverse reinforcement learning, where the AI system learns the underlying objectives and constraints from observed behavior.
- Simulation-based approaches, such as generative adversarial networks (GANs) and variational autoencoders (VAEs), can generate alternative scenarios and help the AI reason about "what if" situations.

4. Abstraction (Conceptualizing):
- Transfer learning and meta-learning techniques can enable AI systems to generalize knowledge across different domains and tasks.
- Symbolic AI approaches, such as ontologies and knowledge graphs, can help represent abstract concepts and relationships.

5. Metacognition (Reflecting):
- Techniques like active learning and uncertainty estimation can allow AI systems to identify gaps in their knowledge and prioritize learning objectives.
- Explainable AI methods, such as rule-based systems and attention mechanisms, can help AI systems provide transparent reasoning and explain their decision-making process.

6. Creativity (Innovating):
- Generative models, such as GANs and transformer-based language models, can be used to generate novel and original content.
- Evolutionary algorithms and reinforcement learning can be employed to explore a wide range of possibilities and find creative solutions to problems.

7. Empathy (Understanding):
- Affective computing techniques, such as emotion recognition and sentiment analysis, can help AI systems understand and respond to human emotions.
- Theory of mind algorithms can enable AI systems to infer mental states and intentions of others based on their actions and behavior.

8. Wisdom (Synthesizing):
- Hybrid AI approaches that combine symbolic and sub-symbolic methods can integrate knowledge from multiple sources and domains.
- Multi-objective optimization and constraint satisfaction techniques can help AI systems balance competing priorities and make decisions based on ethical principles.
- Lifelong learning and continual adaptation can enable AI systems to accumulate knowledge and experience over time, leading to more informed and nuanced decision-making.

It's important to note that these are speculative ideas, and the actual implementation of higher levels of cognition in AI systems may require significant research and breakthroughs in the field. Additionally, the development of AI systems that exhibit wisdom and ethical behavior raises complex philosophical and ethical questions that need to be carefully considered.
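As a toy illustration of the first rung (Association) listed above, here is a minimal sketch of unsupervised pattern discovery via clustering. The dataset and parameters are hypothetical, chosen only to show the idea:

```python
# A toy sketch of rung 1 (Association): discovering groupings in data without
# explicit guidance. Data and cluster count are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic "patterns" in a 2-D feature space
data = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(100, 2)),
    rng.normal(loc=3.0, scale=0.5, size=(100, 2)),
])

# The algorithm associates each point with a discovered group
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
print(labels[:5], labels[-5:])  # points from the two blobs land in different clusters
```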
Mar 2 8 tweets 2 min read
1/n Do you know what really stumps people? The understanding that space itself is computational! If you don't think it is, let me convince you that your bias is wrong! Seeing is believing and you'll see the proof.

2/n It's easier to see that spacetime (i.e., space + time) is computational. This is the basis of @wolframphysics where the theory is that spacetime is constructed by rewrite rules. Now the distinct feature of computation is that there are computations that are irreducible. Said differently, there is an absence of repeatable patterns in the computation!
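To make the rewrite-rule idea concrete, here is a toy one-dimensional string rewrite system. This is not Wolfram's hypergraph rewriting; the rule is arbitrary and purely illustrative:

```python
# A toy string rewrite system: apply the rule {A -> AB, B -> A} simultaneously
# to every symbol at each step. Structure unfolds purely computationally.
rules = {"A": "AB", "B": "A"}
state = "A"
for step in range(6):
    state = "".join(rules[c] for c in state)
    print(step, state)
# Produces: AB, ABA, ABAAB, ABAABABA, ... (growing structure from a fixed rule)
```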
Mar 2 11 tweets 2 min read
1/n Biology and computers share a common abstract framework. It's often recognized as information. The movement of information is what we call computation. One could frame all of physics as the movement of information. Biology and computers both differ from raw physics by decoupling physical movement from information movement. Computation is virtual movement.

2/n One mindblowing realization: in a universe where information processes (computation) and physical processes (physics) are not decoupled (i.e., they are the same process), how does virtual movement emerge from physics? Why is there even biology? Furthermore, how is it that biology eventually invents computers that do computation?
Mar 1 7 tweets 2 min read
I suspect that OpenAI is in a precarious position in its perceived leadership in the AI space. Several indicators show that their execution is unraveling. The most obvious is the uncompetitive pricing of their product. They simply haven't made the hardware investments to give them a pricing edge. They are just like the rest of us who have to pay a premium to Nvidia.
Feb 26 12 tweets 3 min read
Peirce's Speculative Rhetoric formulated in terms of Tensegrity and the Quaternion Process Theory of Cognition

2/n Peirce's systems are always based on a triad that forms a development chain. In Peirce's architectonic there are signs, inference, and rhetoric. In signs, you find the usual trichotomy such as icon->index->symbol. In inference you will find induction->deduction->abduction. In rhetoric, we speak not of inferences but of self-preserving processes.
Feb 25 5 tweets 2 min read
1/n Higher-level cognition always involves representations. Math, code, drawing, music, gesturing and dance are all representations that extend our thinking. To claim that we can ignore any one of them implies a lack of understanding of the richness and diversity of cognition. We need slow thinking, and slow thinking depends on representations.

2/n How do we think deeper and broader with Generative AI? We use representations! We guide the AI to use representations that have reach. It's extremely surprising to me that many don't even realize this!
Feb 15 4 tweets 1 min read
The problem with LLM agent frameworks is that they need a different level of abstraction. Chaining workflows together is too rigid and brittle. Do humans wire each other together to cooperate? We need more dynamic consensus-building abstractions. We need systems that anticipate and are robust to multiple failures while persistently seeking their goals.

What's surprising is that this new frontier is very predictable under the lens of C.S. Peirce's Architectonic. Ideas from more than a century ago. iep.utm.edu/peircear/
Feb 8 5 tweets 6 min read
1/n No Search, No Problem: Achieving Grandmaster Level Using Only a Transformer

A new research paper presents a groundbreaking advancement in chess-playing artificial intelligence, demonstrating for the first time that it is possible to train a neural network to play chess at a grandmaster level without relying on explicit search techniques. This finding challenges the long-held belief that sophisticated search algorithms are indispensable for mastering complex games like chess.

Historically, chess AIs such as Deep Blue and AlphaZero have depended on robust evaluation functions, extensive opening books, and advanced search techniques like alpha-beta pruning and Monte Carlo tree search to anticipate future moves. The question of whether neural networks could achieve expert-level play through supervised learning alone, without the computational overhead of search algorithms, remained open until now.

The breakthrough came by harnessing the power of modern transformers, scaled up to 270 million parameters, and training them on a dataset of 10 million human chess games annotated with strategic evaluations by the Stockfish 16 chess engine. This approach allowed the neural network to predict Stockfish's evaluations of new board positions accurately.

The performance of this neural network is exceptional, surpassing AlphaZero's value and policy networks, solving 93.5% of a wide range of chess puzzles, and achieving a blitz rating of 2895 on Lichess, a score higher than that of most grandmasters. Remarkably, this was achieved without employing any search strategies beyond evaluating all potential next moves.

This significant finding reveals that with enough model capacity and a substantial training dataset, it is possible to distill the complex search and evaluation algorithms of advanced chess engines like Stockfish into the parameters of a neural network. This represents a paradigm shift, suggesting that capable chess AIs can be developed without the need for manually designed heuristics or search algorithms.

The success of this approach underscores the potential of using transformers and supervised learning to approximate complex algorithms, opening new avenues for research into how far this technique can eliminate the need for search in strategic reasoning and its applicability to other domains. This work not only marks a milestone in AI chess but also signals a broader implication for the future of artificial intelligence in strategic reasoning tasks.

2/n Method details

Here is a detailed overview of the method used in the paper to create a transformer-based chess engine:

Data Collection and Annotation
- Download 10 million chess games played by humans on Lichess
- Extract all unique board positions from these games
- For each board position, use the Stockfish 16 chess engine to compute:
  - State-value: Win percentage prediction (0-100%)
  - Action-values: Win percentage for all legal moves
  - Best move: Move with highest action-value
- This results in over 15 billion state-action pairs annotated with Stockfish evaluations
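As a rough illustration of this annotation step (not the paper's actual pipeline), one could score positions with python-chess driving a local Stockfish binary; the engine path and time limit here are hypothetical:

```python
# A minimal sketch of annotating one position with per-move evaluations,
# analogous to the paper's action-values. Stockfish path is hypothetical.
import chess
import chess.engine

engine = chess.engine.SimpleEngine.popen_uci("/usr/local/bin/stockfish")
board = chess.Board()

# One engine evaluation per legal move from the current position
for move in board.legal_moves:
    info = engine.analyse(board, chess.engine.Limit(time=0.05), root_moves=[move])
    print(move.uci(), info["score"].white())

engine.quit()
```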

Model Architecture
- Use a standard transformer architecture from recent LLMs:
  - Decoder-only
  - 8 attention heads
  - Post-layer normalization
  - 270 million parameters
- Input representation: 77-token encoding of current board FEN string
- Output heads for value regression and action classification

Training
- Train the transformer to predict the Stockfish values using standard supervised learning
- Cross-entropy loss for classification over value bins
- Adam optimizer
- Train for 10 million steps (2.7 epochs)
- Batch size 4096 on 128 TPUs
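A minimal sketch of what "cross-entropy over value bins" could look like: the win percentage is discretized into K classes and the model is trained as a classifier. K and the shapes are illustrative assumptions, not the paper's exact configuration:

```python
# Discretize win probabilities into K bins and apply cross-entropy.
import torch
import torch.nn.functional as F

K = 128                                            # illustrative bin count
win_pct = torch.tensor([0.37, 0.92])               # Stockfish win probabilities
targets = (win_pct * K).long().clamp(max=K - 1)    # bin index per position
logits = torch.randn(2, K, requires_grad=True)     # stand-in for model output
loss = F.cross_entropy(logits, targets)
loss.backward()
```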

Chess Policies
- Construct three policies based on network outputs:
1. Choose move with highest predicted action-value
2. Choose move that minimizes predicted next-state value
3. Pick highest probability move from policy head

Evaluation
- Assess performance on:
  - Puzzles: % solved correctly
  - Prediction accuracy: state-value MSE, action accuracy
  - Chess rating: Elo score from games against humans and bots
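For intuition, here is a hedged sketch of the first two policies above, assuming a hypothetical `model` wrapper that exposes the network's value predictions, with legal moves from python-chess:

```python
# A sketch (not the paper's code) of policies 1 and 2 built from network outputs.
import chess
import torch

def action_value_policy(model, board: chess.Board) -> chess.Move:
    """Policy 1: choose the legal move with the highest predicted action-value."""
    moves = list(board.legal_moves)
    values = model.action_values(board, moves)  # hypothetical API: win % per move
    return moves[int(torch.argmax(values))]

def state_value_policy(model, board: chess.Board) -> chess.Move:
    """Policy 2: choose the move that minimizes the predicted next-state value
    (after our move, the position is evaluated from the opponent's side)."""
    best_move, best_value = None, float("inf")
    for move in board.legal_moves:
        board.push(move)
        value = model.state_value(board)  # hypothetical API: win % for side to move
        board.pop()
        if value < best_value:
            best_move, best_value = move, value
    return best_move
```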
Feb 7 7 tweets 5 min read
1/n The Self-Discovery That's Redefining Reasoning

The self-discover method outlined in a new paper from Google marks a significant advancement in enhancing the reasoning capabilities of large language models (LLMs). It breaks away from the limitations imposed by predefined paradigms, allowing models to create unique reasoning structures tailored to each task. This flexibility not only improves performance but also provides valuable insights into structured reasoning.

Traditionally, language models have struggled with a one-size-fits-all approach to reasoning, leading to challenges in handling diverse tasks. While methods like step-by-step prompting have shown promise, they often fall short when faced with tasks requiring alternative reasoning flows. Self-discover addresses this issue by dynamically composing reasoning building blocks, enabling models to identify relevant modules and integrate them into customizable workflows.

Moreover, this approach overcomes the rigidity of human-authored templates, which are often suboptimal for unfamiliar domains. By granting models the freedom to create bespoke scaffolding through directed composition, rather than imposing logic chains from the top down, self-discover embraces the inherent complexity of reasoning. This leads to significantly improved performance on multifaceted tasks while maintaining efficiency in inference.

Analysis further reveals that the structures generated by self-discover exhibit transferability across models, indicating universal traits. This methodology provides transparent insights into how models encode reasoning processes, resembling compositional hierarchies found in human cognition. While there may be performance plateaus in the future, self-discover represents an exploratory venture into emergent reasoning by artificial agents, transcending the constraints imposed by human boundaries.

By prioritizing student-driven synthesis of reasoning forms over predefined routines, this inquiry unlocks previously inconceivable problem-solving patterns for models. It heralds an era where we can learn as much from machines about chained cognition as they can learn from our elucidations. This illumination of structure genesis across models advances efforts to cultivate generalizable, composable thought.

2/n Here are some key pain points of existing systems for improving language model reasoning, and how Self-Discover addresses them:

1. Reliance on fixed reasoning paradigms:
- Existing methods like chain-of-thought rely on a predetermined reasoning approach that is ill-suited to many tasks.
- Self-Discover allows models to compose task-specific structures from modular blocks.

2. Lack of flexibility:
- Methods depend on human-authored decompositions or structures.
- Self-Discover enables models to self-direct structure creation.

3. Failure to adapt structure to task:
- Even learned approaches optimize one structure for all tasks.
- Self-Discover discovers custom structures per task, unlocking greater reasoning potential.

4. Inference inefficiency:
- Ensemble and multi-sample approaches are computationally expensive.
- Self-Discover matches or exceeds their performance with 10-40x fewer calls.

In summary, by enabling language models themselves to flexibly compose reasoning building blocks suited to novel tasks, Self-Discover overcomes the brittleness, inflexibility, and inefficiency of existing reasoning systems.

The automated discovery process allows capturing unique reasoning patterns for each task in a way that static approaches cannot. This self-directed composition of reasoning workflows is the critical driver of enhanced performance.
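A minimal sketch of this self-directed composition, assuming a generic `llm(prompt) -> str` completion function. The module list is a small illustrative subset, and the three stages (select, adapt, implement) are one reading of the paper's composition process:

```python
# Sketch of Self-Discover-style structure composition. `llm` is a hypothetical
# text-completion function; the modules are illustrative stand-ins.
REASONING_MODULES = [
    "Break the problem into sub-problems",
    "Think step by step",
    "Critically examine the assumptions",
    "Solve a simplified version of the problem first",
]

def self_discover(llm, task: str) -> str:
    # 1. Select the reasoning modules relevant to this task
    selected = llm(
        f"Task: {task}\nList which of these reasoning modules are useful:\n"
        + "\n".join(REASONING_MODULES)
    )
    # 2. Adapt the selected modules into task-specific descriptions
    adapted = llm(f"Task: {task}\nAdapt these modules to the task:\n{selected}")
    # 3. Implement them as an explicit step-by-step reasoning structure
    structure = llm(f"Task: {task}\nTurn these into a step-by-step plan:\n{adapted}")
    # Solve the task by following the discovered structure
    return llm(f"Task: {task}\nFollow this reasoning structure to solve it:\n{structure}")
```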
Feb 3 6 tweets 7 min read
1/n Discovered this book (h/t @Extended_Brain). Let's look into some nuggets of wisdom!

2/n In his chapter on "Personal Knowledge," Michael Polanyi argues that all knowledge involves personal participation and commitment on the part of the knower. He introduces the concept of "tacit knowing" to describe the process by which personal knowledge is accumulated. Tacit knowing stands in contrast to the ideals of detached objectivity and value neutrality often associated with scientific knowledge.

At the heart of tacit knowing is subsidiary awareness—attending to one thing by focusing on another related or connected thing. For example, we may identify a person by his clothes, or we attend to the weight of a hammer in our palm as we focus on driving the nail. What we are focally aware of and what we are subsidiarily aware of mutually depend on each other in tacit knowing. Our subsidiary awareness of clues, instruments, and context allows us to comprehend the focal target, while the target itself determines what counts as clues or instruments relevant to discerning its nature.

Tacit knowing pervades multiple forms of skillful achievement, including practical skills like cycling and swimming but also more abstract capabilities like reading comprehension or facial recognition. It has a from-to structure—we go from perception of subsidiaries to comprehension of a coherent whole. This always involves our active shaping and organizing of subsidiaries to integrate them for meaning.

Polanyi identifies three key aspects to tacit knowing: functional, phenomenal, and semantic. The functional aspect is the from-to relation itself and how we dwell in the particulars to attend to the whole. The phenomenal aspect is that through integrative acts like binocular vision or reading, we achieve a new phenomenal experience beyond what direct inspection of the parts would indicate. Finally, the semantic aspect is the meaning-giving relationship where subsidiaries acquire their sense by bearing on the focus.

An important implication is that all knowledge depends on personal judgment to turn clues into comprehension. There are no explicit rules determining what coheres or what is meaningful. As Polanyi puts it, "into every act of knowing there enters a tacit and passionate contribution of the person knowing what is being known." While aiming at an external reality, our understanding relies fundamentally on internal processes of integration that connect knower and known. Tacit knowing is an inescapable and universal feature of human knowledge.
Feb 2 4 tweets 4 min read
1/n A Taxonomy for Multi-Modal Large Language Models

Architecture
The architecture consists of 5 key components:

1. Modality Encoder: Encodes inputs from modalities like image, video, audio into feature representations. Common options include NFNet-F6, ViT, CLIP ViT, C-Former, etc.

2. Input Projector: Aligns non-text modality features to the text feature space of the LLM. This uses cross-attention, Q-Former, P-Former, or simple MLPs/linear layers.

3. LLM Backbone: Core large language model that processes aligned multi-modal representations and generates textual outputs + signal tokens for conditional generation. Popular choices are Flan-T5, Vicuna, OPT, LLaMA, etc.

4. Output Projector: Maps signal token representations into features that can be understood by the Modality Generator. Uses a Tiny Transformer or MLP.

5. Modality Generator: Generates outputs in modalities like image, video, audio conditioned on the mapped features. Typically uses off-the-shelf latent diffusion models like Stable Diffusion, AudioLDM, etc.

Training Pipeline:
The training pipeline has 2 key stages -

1. Multi-Modal Pre-Training: trains the Input and Output Projectors using image-text, video-text, audio-text datasets to align modalities. May fine-tune small trainable parameters in LLM backbone using methods like prefix tuning.

2. Multi-Modal Instruction Tuning: further trains the model on instruction-formatted datasets using reinforcement learning from human feedback. This enhances the model's alignment with human preferences and interaction capabilities.

2/n The input process flow

1. Modality Encoder:
- Encodes inputs from modalities like image, video, audio into feature representations.
Example:
- Input: An image of a cat
- CLIP ViT encoder encodes it into a 768-d feature vector representing the visual concepts in the image

2. Input Projector
- Projects non-text modality features into the textual feature space of LLM
Example:
- The 768-d cat image feature from CLIP ViT
- A linear layer projects it into a 1024-d vector aligned with text vector space
- Other options like cross-attention, Q-Former can also achieve this alignment

3. LLM Backbone
- Core large language model that processes the aligned multi-modal representations
Example:
- The 1024-d projected cat image feature vector
- Textual caption describing the image: "A cute cat playing with a ball of yarn"
- These text and image features are fed into the LLM backbone like OPT or LLaMA
- The LLM encodes them into a joint representation in its latent space and generates relevant outputs

So in summary, the modality encoders create non-text representations, input projectors transform them into an LLM-compatible space, and LLM backbone fuses information from all aligned modalities to understand concepts across modalities. The flow enables the fusion of multi-modal knowledge into the LLM.
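To make the projection step concrete, here is a minimal PyTorch sketch of the running example above: a 768-d CLIP feature mapped into a hypothetical 1024-d LLM space via a linear Input Projector:

```python
# Sketch of the Input Projector: align a visual feature with the LLM's
# text-embedding space and prepend it to the token sequence.
import torch
import torch.nn as nn

clip_feature = torch.randn(1, 768)           # stand-in for a CLIP ViT encoding
projector = nn.Linear(768, 1024)             # the Input Projector
visual_token = projector(clip_feature)       # now lives in the text feature space

text_embeddings = torch.randn(1, 12, 1024)   # stand-in for embedded caption tokens
# The LLM backbone consumes visual and text tokens as one joint sequence
llm_input = torch.cat([visual_token.unsqueeze(1), text_embeddings], dim=1)
print(llm_input.shape)                       # torch.Size([1, 13, 1024])
```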
Feb 1 7 tweets 4 min read
1/n Introducing RAPTOR

Existing RAG methods suffer from a major limitation: they can only retrieve short, contiguous passages of text. This restricts their capacity to represent cross-document discourse structure and leverage thematic information scattered across lengthy corpora. As a result, performance suffers on complex questions requiring multi-step inference or synthesis of knowledge from multiple sections.

Fixed language models also face challenges staying up-to-date, as baking vast world knowledge into model parameters makes it arduous to edit or append facts. Yet relying on outdated embedded knowledge severely impairs real-world reliability and accuracy.

This paper introduces RAPTOR, a novel recursive abstraction paradigm that overcomes both issues through hierarchical multi-document representation. RAPTOR segments text, then recursively clusters, summarizes, and embeds passages. This structures corpora into multi-layer trees encoding information at varying levels of abstraction.

Querying this rich tree representation allows integrating details and high-level themes simultaneously. Controlled experiments exhibit consistent improvements over baseline retrievers across several QA datasets. Moreover, by augmenting powerful readers like GPT-4, RAPTOR reaches new state-of-the-art results on multifaceted reasoning tasks requiring nuanced understanding of lengthy narratives.

Modularizing knowledge into RAPTOR’s index also facilitates updating world facts. As corpus contents evolve, the reader persists unaltered, flexibly adapting to current information needs. This crucial agility makes RAPTOR invaluable for dynamic real-world deployments.

In summary, RAPTOR provides a sorely lacking solution for multi-document reasoning and updatable retrieval-based QA. Leveraging recursive summarization and abstraction, it encodes corpora with sufficient semantic depth for complex queries. RAPTOR delivers substantial gains; its strong empirical performance confirms the merits of tree-based hierarchical retrieval augmentation.

2/n The RAPTOR process:

1. Text Segmentation
- Split retrieval corpus into short, contiguous chunks of 100 tokens, similar to traditional methods
- Keep sentences intact even if over 100 tokens to preserve coherence

2. Text Embedding
- Embed text chunks using SBERT to get dense vector representations

3. Clustering
- Employ soft clustering using Gaussian Mixture Models and UMAP dimensionality reduction
- Vary UMAP parameters to identify global and local clusters
- Use Bayesian Information Criterion for model selection to determine optimal number of clusters

4. Summarization
- Summarize the chunks in each cluster using a language model
- Results in a condensed summary capturing key information

5. Node Creation
- Clustered chunks + corresponding summary = new tree node

6. Recursive Processing
- Repeat steps 2-5: Re-embed summaries, cluster nodes, generate higher level summaries
- Forming a multi-layer tree from the bottom up
- Until clustering is infeasible (final root node summarizes the entire corpus)

7. Retrieval
- Two methods: tree traversal (top-down layer by layer) or collapsed tree (flattened view)
- For each, compute cosine similarity between query and nodes to find most relevant

So in summary, RAPTOR leverages recursive clustering and summarization of text chunks to create a hierarchical tree structure for more effective contextual retrieval.
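Here is a hedged sketch of one tree layer (steps 2-5) and the recursion (step 6), assuming a `summarize(texts)` function backed by an LLM; the paper's UMAP reduction, BIC model selection, and soft multi-cluster assignment are simplified away:

```python
# Sketch of one RAPTOR layer: embed chunks, cluster them, summarize clusters.
from sentence_transformers import SentenceTransformer
from sklearn.mixture import GaussianMixture

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # SBERT-style embeddings

def build_layer(chunks, summarize, n_clusters):
    embeddings = embedder.encode(chunks)                       # step 2: embed
    gmm = GaussianMixture(n_components=n_clusters, random_state=0)
    labels = gmm.fit_predict(embeddings)                       # step 3: cluster
    return [                                                   # steps 4-5: summarize
        summarize([c for c, l in zip(chunks, labels) if l == k])
        for k in range(n_clusters)
    ]

def build_tree(chunks, summarize):
    layers = [chunks]
    while len(layers[-1]) > 1:                                 # step 6: recurse upward
        n = max(1, len(layers[-1]) // 5)                       # shrink factor is illustrative
        layers.append(build_layer(layers[-1], summarize, n))
    return layers                                              # final root summarizes the corpus
```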
Jan 30 8 tweets 6 min read
1/n Exploiting Large Language Models (LLMs), RAG and KGs for Creative Design

A recent paper makes a compelling case for the tremendous yet untapped potential of large language models (LLMs) to transform materials science research. However, the authors thoughtfully acknowledge critical "pain points" in relying solely on the raw capabilities of LLMs in this complex domain. Accuracy, nuance, interpretability, reasoning - on all these fronts, LLMs fall short without a guiding hand.

That's exactly why this paper shines. It outlines strategies to partner with LLMs to elicit their strengths while overcoming weaknesses. Retrieval augmentation (RAG) provides the context LLMs otherwise lack to ground their responses. Knowledge graphs (KGs) organize concepts ontologically to lend structure and meaning. Non-linear prompting channels creativity through critical filters. Diverse model collectives enable cooperative discovery.

What emerges is a vision for a new paradigm - LLMs not as opaque oracles, but as flexible components in an intelligible, distributed materials discovery infrastructure. One where human researchers set the objectives, models rapidly compound knowledge through code and data, and reciprocal feedback loops drive exploration.

This paper thus makes a timely case. That to fully actualize the manifest benefits of AI in advancing materials science, we must raise these powerful models to collaborators in a hybrid intelligence system built on transparency, trust, and shared creativity fueled by human curiosity.Image 2/n Main strategies covered:

1) Retrieval-augmented generation (RAG) methods to inject additional knowledge into the generative process to improve accuracy. RAG is highlighted as a powerful approach, especially when combined with graph-based methods.

2) Ontological knowledge graphs to provide interpretable structure that captures concepts and relationships. This facilitates mechanistic insights and more detailed responses from the LLM.

3) Nonlinear sampling techniques like tree-of-thought prompting to iteratively refine and improve responses, overcoming limitations of single-shot linear sampling.

4) Multi-agent models where specialized LLMs collaborate and interact autonomously to solve complex multimodal problems. This shows promise for advanced applications like automated force-field development.
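As a toy rendering of strategy 1, here is a minimal retrieval-augmentation sketch, assuming a generic `llm(prompt)` function and an open-source sentence embedder; the corpus snippets are invented for illustration:

```python
# Sketch of RAG: retrieve the most relevant snippet, then ground the prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
corpus = [
    "Spider silk combines high tensile strength with extensibility.",
    "Graphene exhibits exceptional in-plane thermal conductivity.",
]
doc_vecs = embedder.encode(corpus, normalize_embeddings=True)

def rag_answer(llm, question, k=1):
    q = embedder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(doc_vecs @ q)[::-1][:k]        # cosine similarity ranking
    context = "\n".join(corpus[i] for i in top)     # retrieved grounding text
    return llm(f"Context:\n{context}\n\nQuestion: {question}")
```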
Jan 23 11 tweets 2 min read
1/n The most important civic duty that a nation can instill in its citizens is the importance of life-long learning. This goes beyond access to education for our children. It involves a culture that leans toward healthy collaboration and drives toward sustained innovation.

2/n It is no surprise that so many citizens feel left out in today's system. People have never learned the skills to learn independently. But AI radically remedies this deficit! GPT-like systems are tireless teachers who can adapt their conversations to a student's cognitive biases and limitations.
Jan 19 7 tweets 5 min read
1/n Let's talk about the Flow Engineering discussed in the AlphaCodium paper:

The paper introduces the concept of "flow engineering" to characterize AlphaCodium's proposed approach and contrasts it with typical "prompt engineering" methods. The use of the term "flow engineering" can be justified in the following ways:

1. Multi-stage iterative process: AlphaCodium involves a structured, test-driven flow with progressive stages - problem analysis, test generation, initial coding, and iterative run-fix cycles. This goes beyond crafting an optimal prompt.

2. Incorporating code execution: The flow deeply integrates execution of the generated code against input-output examples into the modeling process, rather than focusing purely on static prompt tuning. This dynamic run-fix iteration over a growing set of tests sets it apart.

3. Scaffolding code development: The multi-step methodology provides a scaffolding that mirrors the software development process by incrementally going from specifications to code, resembling test-driven cycles.

4. Code-centric techniques: Several techniques tailor-made for code tasks supplement the basic flow: modular code prompting, test anchors that prevent code divergence, and output validation using test suites.

5. Knowledge accumulation: Each stage in the AlphaCodium flow builds up artifacts, learnings and validated components which are accumulated to aid downstream steps - a departure from one-off prompt engineering.

In summary, the use of the term "flow engineering" underscores the process-centric, execution-backed, and code-aware nature of the methodology, going beyond static prompt design. It better captures the iterative, test-driven, development-mimetic essence.

2/n This paper is fascinating in that it introduces an entirely novel way of viewing subsequent reasoning processes that influence both long chains of inference and subsequent retraining.

The paper proposes several code-oriented design concepts and best practices:

1. YAML Structured Output:
- Ask the model to generate output in YAML format conforming to a given Pydantic class definition.
- Eliminates need for complex prompt engineering, allows complex structured answers.
- More suitable than JSON for code due to handling of quotes, special chars etc.

2. Semantic Reasoning via Bullet Points:
- When asking the model to reason about a problem, use bullet point format.
- Forces splitting into logical sections, improves understanding.

3. Modular Code Generation:
- Ask the model to divide code into small sub-functions with meaningful names.
- Results in better code quality, easier iterative fixing.

4. Soft Decisions with Double Validation:
- Avoid strict decisions by the model which lead to hallucinations.
- Double validate potentially erroneous outputs.

5. Postponing Decisions and Exploration:
- Gradually move from easier to harder tasks, avoiding irreversible decisions early on.
- Leave room for exploring multiple possible solutions.

6. Test Anchors:
- When iterating on potentially invalid AI-generated tests, the model may "fix" the code incorrectly.
- Use already passed tests as anchors to detect erroneous fixes.

It incorporates many of the best practices of agile software development in a machine learning optimization process!
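For concreteness, here is a minimal sketch of the YAML structured-output concept (item 1 above), assuming a generic `llm(prompt)` completion function; the Pydantic class and its fields are illustrative, not AlphaCodium's actual schema:

```python
# Sketch of YAML structured output: prompt against a Pydantic class definition,
# then parse and validate the model's answer.
import yaml
from pydantic import BaseModel

class ProblemAnalysis(BaseModel):   # illustrative schema, not the paper's
    self_reflection: str
    possible_solutions: list[str]
    selected_solution: str

PROMPT = """Analyze the problem below. Answer in YAML conforming to:
class ProblemAnalysis(BaseModel):
    self_reflection: str
    possible_solutions: list[str]
    selected_solution: str

Problem: {problem}
"""

def analyze(llm, problem: str) -> ProblemAnalysis:
    raw = llm(PROMPT.format(problem=problem))
    # YAML copes with multi-line strings and quotes better than JSON for code
    return ProblemAnalysis(**yaml.safe_load(raw))
```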