Carlos E. Perez
Author: Artificial Intuition, Fluency & Empathy, DL Playbook, Patterns for Generative AI, Patterns for Agentic AI https://t.co/fhXw0zjxXp
Sep 15 7 tweets 3 min read
1/n Terence Tao, arguably the most gifted living mathematician, has tried GPT-o1, and this is his verdict: "However, this was an improvement over previous models, whose capability was closer to an actually incompetent graduate student. It may only take one or two further iterations of improved capability (and integration with other tools, such as computer algebra packages and proof assistants) until the level of "competent graduate student" is reached."

2/n Here, Tao attempts to use o1 to formulate the problem in Lean (a math theorem prover), placing the blame for the failure on o1's ignorance of Lean's latest capabilities. Here's the link: chatgpt.com/share/bb0b1cfa…
Aug 27 4 tweets 4 min read
1/n Why Even the Best LLMs Still Struggle with True Creative Writing

The rapid evolution of Large Language Models (LLMs) has fueled both excitement and apprehension. While their ability to mimic human language and generate coherent text is undeniable, a crucial question lingers: can AI truly be creative? The paper "Pron vs Prompt: Can LLMs Challenge World-Class Fiction Authors?" tackles this question head-on, exploring the nuanced realm of creative writing to assess whether LLMs can compete with the best human storytellers.

The paper identifies a key pain point in current AI research: the tendency to compare LLMs to average human writers. While exceeding average performance is notable, it doesn't address whether AI possesses the ingenuity and artistry of a master wordsmith. To bridge this gap, the researchers designed a unique experiment pitting GPT-4, a leading LLM, against Patricio Pron, an award-winning novelist. This head-to-head contest aimed to provide a definitive answer to whether AI can truly rival human creativity at its peak.

Previous research, while valuable, often focused on different aspects of AI and creative writing. Some explored human-AI collaboration, where AI tools assisted human writers, while others highlighted the limitations of LLMs in maintaining narrative coherence or generating truly original content. This paper distinguishes itself by focusing on autonomous LLM creative writing, directly comparing the output of GPT-4 to Pron's work without human intervention.

The experiment itself was elegantly designed. Both GPT-4 and Pron were tasked with generating movie titles and then writing synopses for all the titles generated. This ensured a symmetrical comparison, giving both contenders the same creative challenges. To evaluate the results, the researchers enlisted literary experts who used a rubric based on Boden's framework of creativity, assessing qualities like originality, attractiveness, and the distinct voice of the author.

The findings were revealing. Across all quality dimensions and in both English and Spanish, Patricio Pron consistently received significantly higher ratings. This suggests that while LLMs can produce grammatically correct and even engaging text, they still struggle to replicate the depth, nuance, and originality that characterize truly great creative writing.

Interestingly, the study also highlighted the importance of prompts in guiding LLM creativity. When GPT-4 wrote synopses based on titles provided by Pron, its performance, particularly in style and originality, significantly improved. This suggests that while LLMs may not yet be independent creative powerhouses, they can be valuable tools when guided by human ingenuity.

The study's findings offer a dose of reality amidst the hype surrounding AI. While LLMs have made impressive strides, they are not yet ready to replace human authors. The human spark of creativity, with its ability to weave compelling narratives, evoke emotions, and surprise readers with unexpected turns, remains a distinctly human trait. This is not to say that AI has no place in the creative process. As the study demonstrates, LLMs can be valuable partners, enhancing and augmenting human creativity. However, the role of the human author, with their unique perspective and mastery of language, remains secure, at least for now.

2/n Experiments and Noteworthy Results:

The paper conducts a two-stage experiment:

Stage 1: Title Generation:

Both GPT-4 and Patricio Pron were tasked with generating 30 movie titles each.

Stage 2: Synopsis Writing:

Both contenders wrote 600-word synopses for all 60 titles (their own and their opponent's).
GPT-4 was provided with a prompt that included information about Patricio Pron and emphasized the importance of creativity and literary value.

Evaluation:

Six literary experts (three for Spanish, three for English) assessed the synopses using a rubric based on Boden's framework of creativity, considering:
* Attractiveness
* Originality
* Creativity
* Critical Assessment
* Own Voice (recognizable style)

Noteworthy Results:
Human Superiority: Patricio Pron consistently received significantly higher ratings across all quality dimensions in both Spanish and English, indicating that GPT-4, even in its advanced form, is not yet a match for a top human author in creative writing.

Prompt's Influence: GPT-4 performed significantly better when writing synopses based on titles provided by Patricio Pron, particularly in terms of style and originality. This highlights the importance of prompts in guiding LLM creativity.

Language Gap: GPT-4's creative writing was found to be stronger in English than in Spanish, suggesting a potential language bias in training data.

Recognizable Style: While GPT-4 was not explicitly constrained in terms of style, expert assessors were able to identify its writing with increasing accuracy over time, indicating the presence of detectable patterns in its output.
Aug 25 4 tweets 6 min read
1/n How Agentic AI Can Learn Strategic Thinking Through Self-Improvement and Bi-Level Search

Large Language Models (LLMs) have demonstrated remarkable abilities in understanding and generating human-like text, but their capacity for strategic decision-making in complex environments has remained a challenge. This challenge is particularly evident in multi-agent games, where success hinges on anticipating and outmaneuvering opponents who are constantly adapting their own strategies. The "STRATEGIST" paper tackles this challenge head-on, proposing a novel framework that empowers LLMs to learn sophisticated strategic skills through a process of self-improvement and bi-level tree search.

Traditional approaches to LLM-based decision-making have often fallen short in these complex settings. Directly controlling actions with LLMs, while intuitive, becomes computationally infeasible as the number of possible actions explodes. Similarly, while LLM-based planning methods show promise, they often struggle to learn reusable strategies, instead focusing on planning at the individual action level. Reinforcement learning, while achieving superhuman performance in certain games, typically demands massive datasets and struggles to generalize across different domains.

STRATEGIST differentiates itself by focusing on the acquisition of high-level strategic skills rather than simply searching for the best action in every possible scenario. The framework centers around two key components:

High-Level Strategy Learning: Instead of directly selecting actions, the LLM learns to evaluate game states and generate effective dialogue strategies. This is achieved through:

Value Heuristics: The LLM learns functions that assess the favorability of different game states, allowing it to prioritize advantageous positions.
Dialogue Strategy Guides: Structured prompts guide the LLM in generating persuasive and strategically sound dialogue within the game, taking into account the social dynamics of the environment.

Low-Level Action Selection (MCTS):
To bridge the gap between strategic thinking and concrete actions, STRATEGIST employs Monte Carlo Tree Search (MCTS). This search method explores possible future game states, providing the LLM with more accurate estimates of state values and guiding it towards better immediate actions.
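
To make the bi-level idea concrete, here is a minimal Python sketch of MCTS guided by a learned value heuristic. The `game` interface and the `value_heuristic` callable are illustrative assumptions, not STRATEGIST's actual code:

```python
import math
import random

# Minimal MCTS guided by a learned value heuristic. The `game` object
# (is_terminal, legal_actions, next_state) and `value_heuristic` are
# stand-ins for STRATEGIST's components, not its real API.

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}            # action -> Node
        self.visits, self.total_value = 0, 0.0

def ucb(parent, child, c=1.4):
    if child.visits == 0:
        return float("inf")
    exploit = child.total_value / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore

def mcts(root_state, game, value_heuristic, n_simulations=200):
    root = Node(root_state)
    for _ in range(n_simulations):
        node = root
        # Selection: walk down the tree by UCB.
        while node.children:
            parent = node
            _, node = max(parent.children.items(),
                          key=lambda kv: ucb(parent, kv[1]))
        # Expansion: add one child per legal action.
        if not game.is_terminal(node.state):
            for a in game.legal_actions(node.state):
                node.children[a] = Node(game.next_state(node.state, a), node)
            node = random.choice(list(node.children.values()))
        # Evaluation: the LLM-learned value heuristic replaces a rollout.
        value = value_heuristic(node.state)
        # Backpropagation.
        while node is not None:
            node.visits += 1
            node.total_value += value
            node = node.parent
    # Most-visited root action is the chosen move.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```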

The learning process itself is driven by a continuous loop of self-play, reflection, and improvement. The LLM engages in simulated games, analyzes the outcomes to identify weaknesses in its strategies, and generates ideas for improvement. This reflective process is guided by examining key states where the LLM's predictions diverged from the actual simulation results. The most promising improvement ideas are then implemented, refining the LLM's value heuristics or dialogue guides.

The effectiveness of STRATEGIST is demonstrated through experiments on two distinct games: the strategic card game GOPS and the social deduction game Avalon. In both settings, STRATEGIST consistently outperforms baseline methods, showcasing the power of combining high-level strategy learning with low-level action planning. The results highlight the importance of both components, as removing either significantly diminishes performance.

The paper's findings offer compelling evidence for the potential of STRATEGIST to enhance LLM-based decision-making in complex, multi-agent environments. The framework's ability to learn generalizable strategic skills through self-improvement and search paves the way for LLMs to tackle increasingly sophisticated challenges in domains ranging from game playing to real-world strategic interactions. As LLMs continue to evolve, frameworks like STRATEGIST will be crucial in unlocking their full potential for strategic thinking and decision-making in our increasingly complex world.

2/n Comparison with Other Methods

Direct LLM Control (e.g., SayCan, ReAct): These approaches directly use LLMs to select actions in a given state by prompting them with the current context.
Contrast: STRATEGIST argues that this is inefficient for complex games due to the vast action space. Instead, it advocates for learning higher-level strategic skills that guide action selection.

LLM-based Planning (e.g., Tree of Thoughts): These methods use LLMs to generate and reason over possible action sequences, often using tree search algorithms.
Contrast: While STRATEGIST also uses tree search (MCTS), it primarily focuses on learning reusable strategic skills (value heuristics, dialogue guides) rather than planning at the individual action level.

Reinforcement Learning (RL) for Games (e.g., AlphaGo, AlphaZero): RL methods have achieved superhuman performance in games, but they typically require massive amounts of training data and are often domain-specific.
Contrast: STRATEGIST leverages LLMs' existing world knowledge and reasoning abilities to learn effective strategies with less data. It also aims for more generalizable skills that can transfer across similar game environments.
Aug 18 4 tweets 5 min read
1/n How Understanding Stateful Tools Advances Agentic AI

The rapid advancement of Large Language Models (LLMs) has ignited a wave of excitement and research into their potential for interacting with and manipulating the world around them. Imagine LLMs not just as eloquent conversationalists, but as capable agents, utilizing tools to complete tasks, answer questions, and even control physical systems. This exciting prospect, however, hinges on our ability to accurately evaluate and understand their tool-use capabilities. This is where existing benchmarks fall short, struggling to capture the nuances of real-world scenarios. The paper from Apple, "TOOLSANDBOX: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities," directly addresses this pain point, introducing a novel benchmark that pushes the boundaries of LLM evaluation.

Previous benchmarks, while valuable, often simplified the evaluation process. They primarily focused on stateless tools, neglecting the complexities of mutable world states. Single-turn interactions were the norm, failing to capture the dynamic back-and-forth of natural conversations. This is where TOOLSANDBOX diverges. It embraces the complexity of real-world tool use by incorporating stateful tools that interact with a dynamic world state. This allows researchers to assess an LLM's ability to understand, track, and manipulate this state to achieve its goals.
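
To make "stateful" concrete, here is a minimal Python sketch in which one tool call succeeds or fails depending on world state mutated by an earlier call. The WorldState fields and tool names are invented for illustration, not taken from the benchmark:

```python
from dataclasses import dataclass, field

@dataclass
class WorldState:
    cellular_on: bool = False
    messages: list = field(default_factory=list)

def enable_cellular(state: WorldState) -> str:
    """Stateful tool: mutates the shared world state."""
    state.cellular_on = True
    return "cellular enabled"

def send_message(state: WorldState, to: str, body: str) -> str:
    """State-dependent tool: fails unless a prior call changed the state."""
    if not state.cellular_on:
        return "error: no connectivity (enable cellular first)"
    state.messages.append((to, body))
    return f"sent to {to}"

state = WorldState()
print(send_message(state, "Ann", "hi"))  # fails: state dependency unmet
print(enable_cellular(state))
print(send_message(state, "Ann", "hi"))  # succeeds after the state change
```

An evaluator can then inspect the final world state, not just the dialogue, to judge whether the agent actually accomplished the task.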

Furthermore, TOOLSANDBOX moves beyond static, single-turn interactions by introducing an LLM-based user simulator. This simulator, enhanced by "Knowledge Boundary" and "Demonstration" prompting techniques, enables realistic, multi-turn conversations, pushing LLMs to comprehend implicit information and adapt to evolving dialogues. This on-policy evaluation, where the LLM's actions directly influence the interaction, provides a more accurate representation of its true capabilities.

The experiments conducted using TOOLSANDBOX yielded fascinating insights. While proprietary models like OpenAI's GPT-4 and Anthropic's Claude variants demonstrated impressive performance, highlighting their advanced reasoning and state-tracking abilities, open-source models lagged significantly. This performance gap underscores the ongoing challenges in developing truly capable open-source alternatives.

The experiments also revealed critical areas for improvement. LLMs, particularly open-source models, struggled with managing and reasoning about the world state and effectively utilizing information from tool responses. This highlights the need for further research in state management, tool representation, and information integration.

The introduction of TOOLSANDBOX marks a significant step forward in LLM evaluation. By embracing statefulness, conversation, and interactivity, it provides a more realistic and comprehensive assessment of LLM tool-use capabilities. As we venture further into the era of tool-wielding LLMs, robust benchmarks like TOOLSANDBOX will be essential for tracking progress, identifying limitations, and ultimately, unlocking the full potential of these powerful technologies.

2/n The paper describes experiments conducted using TOOLSANDBOX to evaluate both open-source and proprietary LLMs across a variety of tool-use scenarios. Here's a breakdown of the experiments and noteworthy results:

Experiments:

Test Scenarios: 1032 human-authored test cases designed to cover diverse and challenging tool-use scenarios. These scenarios were categorized based on:
* Number of tool calls and user turns required.
* Presence of state dependencies between tools.
* Need for canonicalization (resolving ambiguous information).
* Handling of insufficient information (avoiding hallucination).

Models Evaluated: Both open-source and proprietary LLMs were evaluated, including:
OpenAI's GPT-3.5-turbo and GPT-4.
Anthropic's Claude-instant-v1 and Claude-v1.3.
Several open-source models.

Metrics:
Milestone Achievement: Measures how well the agent completes the critical steps defined by the Milestones.
Minefield Avoidance: Evaluates the agent's ability to avoid incorrect or undesirable actions.
Turn Count: Tracks the efficiency of the agent in completing the task.
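
A hedged sketch of how the first two metrics might be computed over an interaction trace; the predicate-based matching below is a simplification assumed for illustration, not the benchmark's published mechanism:

```python
# Milestones: required events that should appear in order in the trace.
# Minefields: events that must never appear.

def milestone_score(trace, milestones):
    """Fraction of milestone predicates matched, in order."""
    i = 0
    for event in trace:
        if i < len(milestones) and milestones[i](event):
            i += 1
    return i / len(milestones)

def minefield_violated(trace, minefields):
    """True if any forbidden event occurs anywhere in the trace."""
    return any(m(event) for event in trace for m in minefields)

trace = ["enable_cellular", "send_message(Ann, hi)"]
milestones = [lambda e: "enable_cellular" in e,
              lambda e: "send_message" in e]
minefields = [lambda e: "delete_contact" in e]
print(milestone_score(trace, milestones))     # 1.0
print(minefield_violated(trace, minefields))  # False
```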

Noteworthy Performance Results:
Significant Gap Between Open-Source and Proprietary Models: Open-source models exhibited significantly lower performance compared to proprietary models (GPT-4 and Claude variants) across all scenario categories. This highlights the considerable gap that still exists in capabilities.
GPT-4's Superior Performance: GPT-4 consistently outperformed other models, demonstrating advanced reasoning, state tracking, and conversational abilities in complex tool-use scenarios.
Strong Performance of Claude Models: Claude models, particularly Claude-v1.3, also showed strong performance, indicating their competence in tool-assisted settings. However, Claude-instant-v1 lagged in scenarios involving complex state dependencies.
Challenges in State Management and Tool-Response Consumption: Open-source models particularly struggled with managing and reasoning about the world state, as well as effectively utilizing information from tool responses.
Impact of Tool Augmentations: Ablation studies showed that increasing distractions (irrelevant tools) and reducing tool information (uninformative names, missing descriptions) significantly impacted the performance of all models. This emphasizes the importance of clear and concise tool representations for effective tool use.
Importance of User Simulator Prompting: Experiments with different user simulator prompting strategies demonstrated that incorporating Knowledge Boundary and Demonstration significantly improved the realism and robustness of the simulated user, leading to more accurate evaluations.

Overall, the experiments conducted using TOOLSANDBOX provide valuable insights into the capabilities and limitations of current LLMs in tool-assisted settings. The results highlight the considerable gaps that remain, setting the stage for future research and development in this critical area.
Aug 16 4 tweets 6 min read
1/n Show, Don't Tell: Low Cost Personalized Large Language Models

Large language models (LLMs) have revolutionized our interaction with technology, showcasing remarkable abilities in understanding and generating human-like text. However, their training on massive, general-purpose datasets often leads to outputs that lack the personal touch, failing to capture the nuances of individual writing styles and task-specific requirements. While powerful, these LLMs can feel like generic one-size-fits-all tools, struggling to adapt to the diverse needs of individual users.

Addressing this critical gap between powerful LLMs and personalized language generation is the core focus of the paper "Show, Don't Tell: Aligning Language Models with Demonstrated Feedback." The authors introduce DITTO (Demonstration ITerated Task Optimization), a method that deviates from the data-heavy approaches of the past, instead empowering users to efficiently customize LLMs using a handful of demonstrations.

Traditional LLM alignment techniques, such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), rely on vast datasets of labeled examples or preferences. While effective, these methods are impractical for individual users who cannot afford to generate such large amounts of data. Prompting, while data-efficient, often becomes a tedious guessing game, requiring careful crafting of input phrases to steer the LLM towards desired outputs. Other approaches, like Constitutional AI, rely on pre-defined principles that may not capture the nuances of individual preferences.

DITTO breaks free from these limitations by leveraging the LLM itself to generate comparison data from a small set of user demonstrations. Instead of telling the model what to do through complex instructions or thousands of examples, DITTO allows users to show the desired behavior directly. This direct alignment with demonstrations provides a more intuitive and efficient way of communicating preferences to the model.
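
The data-construction step at DITTO's core can be sketched in a few lines. Everything below is an illustrative assumption: `generate` and `dpo_update` are caller-supplied placeholders standing in for sampling and a DPO-style preference update, not the paper's published code:

```python
def ditto_round(model, prompts, demos, generate, dpo_update, n_samples=4):
    """One DITTO-style iteration: rank each user demonstration above the
    model's own samples, then apply a preference-optimization update."""
    pairs = []
    for prompt, demo in zip(prompts, demos):
        samples = [generate(model, prompt) for _ in range(n_samples)]
        # Each (demonstration, sample) pair is a preference comparison
        # with the demonstration as the "chosen" response.
        pairs.extend((prompt, demo, s) for s in samples)
    return dpo_update(model, pairs)

# Toy usage with stand-in callables:
updated = ditto_round(
    model="m0",
    prompts=["Write a status-update email"],
    demos=["Hi team -- quick update: ..."],
    generate=lambda m, p: f"{m} draft for: {p}",
    dpo_update=lambda m, pairs: f"{m} updated on {len(pairs)} pairs",
)
print(updated)
```

Iterating this loop also lets samples from later model versions be ranked above those from earlier ones, which is how the method mines many comparisons from only a handful of demonstrations.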

The paper demonstrates the effectiveness of DITTO through a series of compelling experiments. In automatic evaluations on benchmark datasets of author-specific writing, DITTO consistently outperforms existing methods, including SFT, few-shot prompting, and even self-play methods like SPIN. Furthermore, a user study on email writing showcases DITTO's ability to adapt to real-world scenarios, outperforming not only standard baselines but also user-constructed prompts. This highlights the advantage of learning directly from demonstrations rather than relying on users to articulate their preferences through potentially ambiguous prompts.

Perhaps the most striking finding is DITTO's remarkable sample efficiency. Compared to traditional preference-based methods, DITTO achieves comparable performance with an order of magnitude fewer feedback samples. This makes it a practical solution for individual users who can now customize LLMs with just a handful of examples.

In conclusion, DITTO marks a significant step towards a new era of personalized language models. By shifting from "telling" to "showing," it empowers users to mold powerful LLMs to their specific needs and preferences. This opens up exciting possibilities for a future where LLMs are no longer generic tools but personalized assistants that can adapt to the unique voice and tasks of each individual.

2/n Comparison with other approaches

1. Supervised Fine-tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF):

Prior Work: These methods train LLMs on large datasets of human-labeled text or preferences.
DITTO Contrast: DITTO is significantly more data-efficient, requiring only a handful of demonstrations instead of thousands of examples. It achieves this by leveraging the LLM itself to generate comparison data.

2. Prompting:

Prior Work: Prompting involves crafting specific input phrases to guide the LLM's output.
DITTO Contrast: While prompting can be data-efficient, it often requires tedious trial-and-error to find effective prompts. DITTO provides a more direct and intuitive way of aligning the model by learning from demonstrations rather than relying on prompt engineering.

3. Constitutional AI:

Prior Work: This method automatically generates preference data using the LLM itself, guided by pre-defined principles.
DITTO Contrast: DITTO does not rely on pre-defined principles, making it more flexible and adaptable to individual preferences. It directly learns from user demonstrations, capturing more nuanced aspects of desired behavior.

4. Group Preference Optimization (GPO):

Prior Work: GPO aims for few-shot alignment by meta-learning preference groups from a large dataset.
DITTO Contrast: DITTO does not require a large pre-existing dataset for meta-learning. It focuses on individual user adaptation and can learn directly from a small number of demonstrations provided by that user.

5. Self-Play Methods (e.g., SPIN):

Prior Work: These methods improve LLMs through iterative self-play, often using a stronger language model as a critic.
DITTO Contrast: DITTO is designed for data-limited scenarios and does not require an external critic or a large number of demonstrations. It focuses on aligning with specific user preferences rather than achieving general self-improvement.

6. Online Imitation Learning:

Prior Work: Traditional online imitation learning methods typically focus on continuous control tasks and often require explicit reward function learning.
DITTO Contrast: DITTO adapts online imitation learning principles to the discrete text generation setting of LLMs. It implicitly learns a reward function from demonstrations and efficiently generates comparison data online.
Aug 13 4 tweets 5 min read
1/n OpenDevin's Radical Approach to Agentic AI

The rapid advancement of large language models (LLMs) has ushered in a new era of AI agents capable of interacting with and impacting their environments in increasingly sophisticated ways. However, developing and evaluating these agents for complex, real-world tasks presents significant challenges. Existing frameworks often struggle to provide the necessary tools, environments, and interfaces for building truly versatile and robust AI agents. The OpenDevin platform, as presented in the paper "OpenDevin: An Open Platform for AI Software Developers as Generalist Agents," directly addresses these limitations, offering a novel approach that empowers AI agents to interact with the world more like human software developers – through code, command lines, and web browsing.

One of the key pain points OpenDevin tackles is the inherent complexity of developing and evaluating advanced AI agents. Traditional frameworks often rely on simplified environments and limited action spaces, hindering the development of agents capable of tackling real-world tasks. OpenDevin breaks free from these constraints by providing a realistic environment that includes a sandboxed Linux operating system and a fully functional web browser. This allows agents to interact with real-world tools and data sources, enabling them to tackle more meaningful and impactful challenges. Moreover, OpenDevin's standardized evaluation framework, encompassing a diverse set of established benchmarks, ensures consistent and comprehensive assessment of agent capabilities across various domains.

Another significant limitation addressed by OpenDevin is the lack of a standardized and powerful interface for agent-world interaction. While some frameworks rely on pre-defined tool sets or JSON-based function calls, OpenDevin embraces code execution and web browsing as its primary interaction mechanisms. This allows agents to leverage the flexibility and expressiveness of programming languages, breaking free from the limitations of rigid action spaces and enabling them to solve complex problems in a more human-like manner.
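
A minimal sketch of this "act through code" loop, with a plain subprocess standing in for OpenDevin's sandboxed Linux environment (the real platform's event stream and sandboxing are considerably more involved):

```python
import subprocess

def execute_action(command: str, timeout: int = 10) -> str:
    """Run a shell command and return its output as the observation."""
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, timeout=timeout)
    return result.stdout + result.stderr

def agent_loop(policy, task: str, max_steps: int = 10):
    """Alternate between an LLM-chosen code action and its observation."""
    history = [("task", task)]
    for _ in range(max_steps):
        command = policy(history)      # the LLM proposes the next action
        if command == "finish":
            break
        history.append((command, execute_action(command)))
    return history

# Toy policy that lists files and then stops:
steps = iter(["ls", "finish"])
print(agent_loop(lambda h: next(steps), "inspect the repo"))
```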

Recognizing the importance of reusable components in software development, OpenDevin introduces the AgentSkills library – a centralized and extensible collection of tools for common agent tasks. This modular design simplifies the development process and encourages community contributions, fostering a collaborative ecosystem for building and sharing specialized agent capabilities. Furthermore, OpenDevin tackles the challenge of multi-agent collaboration by incorporating a delegation mechanism. This allows developers to create teams of specialized agents, each excelling in specific domains, to work together and solve complex problems more effectively.

The effectiveness of OpenDevin's approach is evident in its experimental results. Evaluated on 15 established benchmarks spanning software engineering, web browsing, and general assistance tasks, OpenDevin agents demonstrate strong and competitive performance across the board. The agents excel in tasks like code generation, web navigation, information extraction, and problem-solving, highlighting the platform's versatility and the power of its core design principles.

In conclusion, OpenDevin represents a significant leap forward in AI agent development. By providing a realistic environment, a powerful and flexible interface, an extensible skill library, and support for multi-agent collaboration, OpenDevin empowers researchers and developers to create more capable, versatile, and robust AI agents. The platform's promising experimental results and its community-driven approach pave the way for a future where AI agents seamlessly integrate into our world, assisting us in tackling complex challenges and pushing the boundaries of what's possible with artificial intelligence.

2/n Comparison with Other Systems

1. AutoGPT, LangChain, MetaGPT, AutoGen, Agents, Xagents, OpenAgents, GPTSwarm:

Category: These are general-purpose AI agent frameworks, often focused on chaining together various tools and APIs to accomplish tasks.
Contrast with OpenDevin: While these frameworks offer flexibility in tool integration, they often lack a standardized and powerful interface for interacting with the world. They may rely on pre-defined tool sets or JSON-based function calls, which can limit agent capabilities and generalization. OpenDevin, on the other hand, empowers agents to interact with the world more directly through code execution and web browsing, providing greater flexibility and expressiveness. Additionally, OpenDevin places a strong emphasis on a sandboxed environment, agent skill library, and systematic evaluation, which are not always central to these other frameworks.

2. AutoCodeRover, SWE-Agent:

Category: These frameworks are specifically designed for software engineering tasks, enabling agents to write, debug, and test code.
Contrast with OpenDevin: While these frameworks excel in software development domains, OpenDevin aims to be more general-purpose. It includes software development capabilities but also extends to web browsing and other tasks through its flexible interface and agent skill library. OpenDevin also emphasizes multi-agent collaboration, which is not a primary focus in these more specialized frameworks.

3. BabyAGI, AgentVerse:

Category: These frameworks focus on building autonomous agents that can manage and execute tasks over extended periods, often with minimal human intervention.
Contrast with OpenDevin: While OpenDevin supports autonomous agent behavior, it also emphasizes human-in-the-loop scenarios and provides tools for interactive agent development and debugging. OpenDevin's focus on a realistic environment and standardized evaluation also sets it apart from these frameworks, which may rely on more simplified task representations or simulations.

4. ReAct, Toolformer:

Category: These are research efforts focusing on specific techniques for enhancing agent capabilities, such as reasoning with actions (ReAct) or learning to use tools (Toolformer).
Contrast with OpenDevin: OpenDevin is a platform that can incorporate and benefit from these research advancements. It provides a framework where techniques like ReAct or Toolformer can be implemented and evaluated within a broader context of agent development and real-world interaction.

In summary:

OpenDevin distinguishes itself from prior work by combining the following features:

Powerful and flexible interface based on code execution and web browsing.
Realistic environment with a sandboxed operating system and web browser.
Extensible library of agent skills and tools.
Support for multi-agent collaboration through delegation.
Standardized evaluation framework with diverse benchmarks.

These features address the limitations of existing frameworks and pave the way for developing more capable, versatile, and reliable AI agents that can effectively interact with and solve real-world problems.
Aug 10 4 tweets 6 min read
1/n The Future of Coding is Agentic AI: Humans and AI, Working Together Through Better Design

The realm of software development has long been considered a uniquely human endeavor, requiring intricate problem-solving skills and a deep understanding of complex systems. However, the advent of powerful language models (LMs) like GPT-4 has sparked a new wave of research into automating aspects of this intricate process. While LMs excel at generating code snippets, their ability to tackle comprehensive software engineering tasks has been hampered by a critical bottleneck: the interfaces they use to interact with computers.

This is the central challenge addressed by the paper "SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering". The authors argue that existing interfaces, primarily designed for human users, are ill-suited for LMs, leading to inefficient workflows, error-prone behavior, and ultimately, suboptimal performance. To bridge this gap, they introduce the concept of Agent-Computer Interfaces (ACIs) – specialized interfaces tailored to the unique strengths and limitations of LMs.

The paper centers around SWE-agent, a system that embodies this principle. SWE-agent employs an ACI specifically designed for software engineering tasks. Unlike traditional interfaces like the Linux shell, which require numerous granular commands, SWE-agent's ACI offers compact and efficient actions for navigating codebases, editing files, and managing context. This streamlined approach addresses the inefficiency of human-centric interfaces, allowing LMs to accomplish complex tasks with fewer steps and reduced cognitive load.

Furthermore, SWE-agent's ACI incorporates guardrails and provides concise, informative feedback to mitigate errors – a common pitfall for LMs operating in unfamiliar environments. Features like integrated code linters and clear error messages guide the LM towards valid actions and prevent cascading mistakes, ultimately leading to more robust and reliable performance.
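
The guardrail idea can be sketched in a few lines: apply an edit only if the result still lints, and otherwise return a short, actionable error. Here Python's ast module stands in for the real linter, and the string-based editor is a simplification of SWE-agent's file-based ACI:

```python
import ast

def guarded_edit(source: str, start: int, end: int, replacement: str):
    """Replace lines [start, end) but commit only if the result parses."""
    lines = source.splitlines()
    candidate = "\n".join(lines[:start] + replacement.splitlines()
                          + lines[end:])
    try:
        ast.parse(candidate)          # the linter guardrail
    except SyntaxError as e:
        # Reject the edit and give the LM concise, corrective feedback.
        return source, f"edit rejected: line {e.lineno}: {e.msg}"
    return candidate, "edit applied"

code = "def f(x):\n    return x + 1\n"
_, msg = guarded_edit(code, 1, 2, "    return x +")  # malformed edit
print(msg)  # the original file is left untouched
```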

The effectiveness of this approach is evident in the paper's experimental results. On SWE-bench, a challenging benchmark for software engineering tasks, SWE-agent with GPT-4 significantly outperforms the previous state-of-the-art, achieving a remarkable 12.47% success rate compared to the previous best of 3.8%. This substantial improvement underscores the value of enabling LMs to interact with codebases through a tailored interface that complements their strengths.

The paper's findings extend beyond a single LM or dataset. Experiments with Claude 3 Opus, another powerful language model, demonstrate the generalizability of the ACI design principles, showcasing consistent performance gains across different models. Moreover, ablation studies meticulously dissect the contribution of individual ACI components, highlighting the importance of each element in maximizing LM performance.

The implications of this research are significant. By moving beyond human-centric interfaces and embracing the development of specialized ACIs, we can unlock the full potential of LMs in the realm of software engineering. As AI continues to advance, crafting intuitive and efficient interfaces will be paramount to enabling these powerful tools to effectively collaborate with humans in building the software of tomorrow. The paper "SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering" paves the way for a future where AI seamlessly integrates into the software development process, augmenting human capabilities and driving innovation in this critical field.

2/n Here's a comparison of SWE-agent with prior work:

1. Interactive Code Generation:

Prior Work: Systems like Codex [1], InCoder [2], and PolyCoder [3] demonstrate impressive code generation capabilities, often in interactive settings.
Contrast with SWE-agent: These works focus primarily on code completion and generation within a single file or function scope. SWE-agent, on the other hand, tackles more complex software engineering tasks that involve navigating and modifying entire codebases, requiring a higher-level understanding of project structure and dependencies.

2. Language Model Agents for Tool Use:

Prior Work: Research on LM agents like ReAct [4], SayCan [5], and Toolformer [6] explores how LMs can learn to use external tools to access and process information.
Contrast with SWE-agent: These works often focus on general-purpose tool use, such as web search or calculator usage. SWE-agent specializes in tools and actions relevant to software engineering, with an ACI designed specifically for interacting with codebases and development environments.

3. Retrieval-Augmented Code Generation:

Prior Work: Approaches like DrRepair [7] and CodeT5+ [8] leverage information retrieval techniques to augment LMs with relevant code snippets during code generation or repair.
Contrast with SWE-agent: While SWE-agent can incorporate retrieval mechanisms, its primary focus is on the design of the ACI itself. The paper argues that even with access to relevant information, LMs need a suitable interface to effectively utilize it for complex software engineering tasks.

4. Automated Software Engineering Tools:

Prior Work: Traditional software engineering tools like IDEs (e.g., VSCode, IntelliJ) and code analysis tools (e.g., SonarQube, Coverity) provide powerful features for human developers.
Contrast with SWE-agent: These tools are designed for human interaction and are not optimized for LM agents. SWE-agent's ACI acts as a bridge, adapting these tools and functionalities into a format that LMs can effectively understand and utilize.

In summary: SWE-agent differentiates itself from prior work by:

Focusing on complex software engineering tasks: Going beyond single-file code generation to address codebase-level modifications.

Emphasizing ACI design: Highlighting the importance of specialized interfaces tailored for LM agents in software engineering.

Integrating relevant tools and actions: Providing LMs with a curated set of commands and functionalities specifically designed for code-related interactions.

By combining these aspects, SWE-agent pushes the boundaries of LM capabilities in the domain of automated software engineering.
Aug 1 4 tweets 6 min read
1/n The Language of Thought: When AI Speaks Prolog, Things Get Interesting

Large Language Models (LLMs) have undeniably revolutionized how we interact with and leverage the power of language. They can generate human-quality text, translate languages, and even write different kinds of creative content. However, a critical gap separates their impressive linguistic prowess from their ability to reason reliably and flexibly. This deficiency, as explored in the paper "Reliable Reasoning Beyond Natural Language," stems from the inherent limitations of LLMs' architecture and training data, which are primarily focused on predicting the next word in a sequence rather than engaging in complex logical deduction.

The paper identifies several deficiencies associated with LLMs' reasoning abilities. Firstly, the linear and sequential nature of language processing makes it challenging for LLMs to handle the non-linearity inherent in logical reasoning, where conclusions are drawn by considering multiple interconnected factors simultaneously. Secondly, LLMs suffer from limited working memory and struggle to backtrack or revise their reasoning steps, crucial for solving complex problems that require exploring multiple possibilities. Lastly, LLMs often fail to grasp implicit information or apply common sense reasoning, relying too heavily on the explicit content of the input text.

To address these limitations, the authors propose a novel neurosymbolic approach that integrates the strengths of LLMs with the robust deductive capabilities of logic programming, specifically using Prolog. This hybrid system works by prompting the LLM to translate natural language problems into logical code statements, which are then processed by Prolog to derive a solution. This integration is not merely about offloading computation; it fundamentally changes how the system approaches reasoning.

The paper highlights several advantages of this approach. Prolog's declarative nature simplifies the LLM's task, requiring it only to encode the problem's constraints, not the specific solution steps. This frees the LLM from the burden of generating the entire reasoning chain, allowing it to focus on understanding the problem and representing it logically. Additionally, Prolog acts as an external memory store and inference engine, compensating for the LLM's limited working memory and enabling efficient backtracking and exploration of multiple solution paths.
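
A small sketch of this division of labor, assuming SWI-Prolog and the pyswip bindings are installed. The asserted clause stands in for code the LLM would emit; this is not the paper's implementation:

```python
from pyswip import Prolog

prolog = Prolog()
# Constraints an LLM might extract from a word problem:
# "Alice is 3 years older than Bob; their ages sum to 21."
prolog.assertz("ages(A, B) :- A is B + 3, 21 is A + B")

# Prolog searches candidate values and backtracks for free; the LLM
# never has to execute the arithmetic or the search itself.
for solution in prolog.query("between(0, 21, B), ages(A, B)"):
    print(solution)   # {'B': 9, 'A': 12}
```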

The researchers demonstrate the effectiveness of their approach through experiments on three datasets. On GSM8k, a standard mathematical reasoning benchmark, their method significantly outperforms standard LLM prompting with Chain of Thought (CoT), demonstrating the benefits of incorporating a dedicated reasoning engine. Similar improvements are observed on the Navigate dataset, a spatial reasoning task, highlighting the system's ability to handle tasks beyond purely mathematical reasoning.

Most notably, the authors introduce a novel dataset called NLR (Non-Linear Reasoning), specifically designed to challenge LLMs' reasoning abilities with problems that require complex variable relationships, backtracking, and implicit reasoning. While even advanced LLMs like GPT-4 struggle with NLR using text-only CoT, integrating Prolog dramatically improves their performance, showcasing the power of this neurosymbolic approach for tackling more intricate reasoning tasks.

The paper "Reliable Reasoning Beyond Natural Language" makes a compelling case for moving beyond purely data-driven approaches to AI. By combining the strengths of LLMs with the power of symbolic reasoning, the authors pave the way for developing more robust, reliable, and flexible AI systems capable of tackling real-world problems that demand more than just impressive linguistic skills. This research signifies an important step towards bridging the gap between language understanding and true reasoning, unlocking the full potential of AI for solving complex challenges across various domains.Image 2/n Here's a breakdown of the prior work and the contrasts:

1. LLMs with API Calls to External Tools (Calculators, Interpreters, etc.):

Prior Work: These approaches augment LLMs by allowing them to access and use external tools, like calculators or code interpreters, to perform specific computations or tasks.
Contrast: While this method effectively reduces arithmetic errors, it doesn't fundamentally address the core reasoning limitations of LLMs. It relies on the LLM to correctly identify when and how to use a tool, which can still be challenging. This approach is more about offloading computation than enhancing the reasoning process itself.

2. LINC (Logical Inference via Neurosymbolic Computation):

Prior Work: LINC uses LLMs to convert natural language into formal logic expressions, which are then processed by a symbolic theorem prover to determine the truth value of conclusions.
Contrast: LINC primarily uses LLMs as semantic parsers, translating each sentence directly into logic. This limits its ability to capture implicit information or perform the kind of multi-step reasoning often required in complex problems. The paper's approach, in contrast, uses CoT prompting to guide the LLM through a more nuanced reasoning process, allowing it to uncover hidden dependencies and derive intermediate conclusions.

3. Nye et al. (Improving Coherence and Consistency in Neural Generation with Symbolic Reasoning):

Prior Work: This work uses a symbolic reasoning module to check the logical consistency of text generated by LLMs against a pre-defined "world model."
Contrast: This approach is limited by the need for hand-crafted world models and predefined constraints. The paper's approach, on the other hand, allows the LLM to dynamically construct the world model through its interaction with Prolog, enabling a more flexible and scalable reasoning process.

4. Program of Thought (PoT):

Prior Work: PoT separates computation from reasoning and language understanding by having LLMs generate both natural language comments and programming language code.
Contrast: In PoT, the code often directly translates the comments, limiting the depth of reasoning. In the paper's approach, the CoT prompts encourage the LLM to perform more complex reasoning in the natural language comments, extracting implicit information and deriving intermediate variables that are then encoded in the logical code for Prolog to process.

In summary:

The key distinctions of the paper's approach compared to prior work are:

Dynamic World Model Construction: The LLM builds the reasoning environment itself through interaction with Prolog, rather than relying on pre-defined models or constraints.
Deep Integration of CoT: CoT prompting is not just used for explanation but as a core part of the reasoning process, guiding the LLM to uncover hidden relationships and derive intermediate conclusions.
Focus on Deductive Reasoning: The use of Prolog as a dedicated reasoning engine allows for a more powerful and flexible approach to logical deduction compared to methods that rely solely on the LLM's inherent capabilities or simple tool use.
Jul 19 4 tweets 7 min read
1/n Minds as Relationships Between People

The traditional view of the human mind often portrays it as an isolated entity, confined within the boundaries of an individual's skull. However, a growing body of research and philosophical thought suggests a more interconnected perspective: that our minds are not solely individual constructs, but rather emerge from and exist within the relationships between people. Let's explore the concept of minds as relationships, examining its implications for our understanding of cognition, social interaction, and personal identity.

The Social Nature of Cognition

At its core, the idea that minds exist as relationships between people challenges the notion of cognition as a purely internal process. Instead, it posits that our thinking, reasoning, and even our sense of self are fundamentally shaped by our interactions with others.

Vygotsky's Sociocultural Theory

Lev Vygotsky, a pioneering psychologist, proposed that higher cognitive functions develop through social interactions. His theory suggests that learning and mental development occur first on a social level before being internalized by the individual. This perspective highlights how our cognitive abilities are not just influenced by, but actively constructed through, our relationships with others.

Distributed Cognition

The concept of distributed cognition, introduced by cognitive scientist Edwin Hutchins, further supports the idea of minds as relationships. This theory posits that cognitive processes are not confined to individual brains but are distributed across people, tools, and environments. In this view, thinking and problem-solving emerge from the interactions between these elements, emphasizing the relational nature of cognition.

Dialogic Nature of Thought

Mikhail Bakhtin, a literary theorist, proposed that all thought is inherently dialogic. This means that our internal monologues are actually internalized dialogues, echoing the conversations we've had with others. Our thinking process often involves imagining how others might respond or considering different perspectives, illustrating how our minds are intrinsically linked to our social relationships.

Linguistic Relativity

The Sapir-Whorf hypothesis, or linguistic relativity, suggests that the language we speak influences our thought patterns. Given that language is a social construct, this theory further underscores how our cognitive processes are shaped by our cultural and social relationships.

Symbolic Interactionism

George Herbert Mead's theory of symbolic interactionism proposes that the self emerges through social interactions. We develop our sense of self by internalizing the perspectives of others and society at large. This view suggests that our very identities are relational constructs, formed through our interactions with others.

Narrative Identity

Psychologist Dan McAdams' concept of narrative identity posits that we construct our sense of self through the stories we tell about our lives. These narratives are inherently social, influenced by cultural norms and shaped through our relationships with others. Our identities, therefore, can be seen as co-authored works, created in collaboration with the people in our lives.

The concept of minds as relationships between people offers a compelling alternative to individualistic models of cognition and identity. By recognizing the inherently social nature of our minds, we gain a deeper appreciation for the role of relationships in shaping who we are and how we think. This perspective not only enriches our understanding of human cognition and behavior but also highlights the profound interconnectedness of human experience. As we continue to explore this concept, it may lead to new insights and approaches in fields ranging from psychology and education to technology and social policy, ultimately fostering a more holistic and relational understanding of the human mind.

2/n Artificial Intelligence Through the Lens of Relational Minds and Presence

When we view minds as relationships and emphasize the importance of presence, our interaction with AI shifts from a simple user-tool dynamic to a more complex, co-creative process:

- Co-construction of Meaning: Instead of viewing AI responses as pre-programmed outputs, we start to see them as part of a dialogue where meaning is co-constructed. Each exchange builds upon previous ones, creating a unique conversational context.

- Emergent Intelligence: The intelligence we experience isn't solely contained within the AI model, but emerges from the interaction between human and AI. This is similar to how human-to-human conversations can lead to insights neither party had independently.

The Role of Presence

Presence - the sense of "being there" or "being with" - becomes crucial in AI interactions:

- Virtual Presence: Even though we know the AI isn't physically present, we create a sense of virtual presence. This alters how we engage with the AI, potentially leading to more natural and fluid conversations.

- Shared Mental Space: The notion of presence helps create a shared mental space where ideas can be explored collaboratively. This is similar to how we might brainstorm with a colleague, but with an AI partner.

Relational Dynamics

Viewing minds as relationships introduces new dynamics to AI interactions:

- Adaptability: Just as we adapt our communication style with different people, we may find ourselves adapting to the AI's communication patterns, and vice versa.

- Contextual Understanding: The AI's responses are not just based on its training data, but on the specific relational context established in the conversation.

Viewing AI interactions through the lens of relational minds and presence offers a richer, more nuanced understanding of human-AI communication. It highlights the co-creative nature of these interactions and emphasizes the importance of the relational context. While this perspective opens up exciting possibilities for more engaging and productive AI interactions, it also underscores the need for careful consideration of the ethical implications and potential pitfalls. As we continue to develop and interact with AI systems, keeping these concepts in mind can help us create more meaningful and responsible human-AI relationships.
Jul 9 9 tweets 18 min read
1/n Designing for the Pluriverse: A Relational Approach to a Just and Sustainable Future

The world is in crisis. Climate change, ecological degradation, social inequality, and systemic injustices threaten the very fabric of life on Earth. These challenges demand a radical shift in our worldview and our approach to designing the world we inhabit. This book argues that to effectively address these crises, we must move beyond the limitations of the rationalistic tradition and embrace a more relational approach to design, one that fosters a pluriverse where diverse worldviews and practices flourish.

The rationalistic tradition, deeply rooted in Western thought, has shaped our understanding of the world through a series of ontological dualisms. It separates mind and body, subject and object, human and non-human, and nature and culture. This separation fosters a sense of human dominance over nature, justifying the exploitation of resources, the degradation of ecosystems, and a focus on economic growth over well-being.

However, the concept of relationality challenges this fragmented worldview. Relationality recognizes that nothing exists in isolation; all beings and things are interconnected and mutually constituted through relationships. This interconnectedness extends beyond humans to include the entire web of life, including plants, animals, spirits, and even the Earth itself.

Embracing relationality has profound implications for design. Instead of viewing design as simply creating objects and systems, we must acknowledge its power to shape the very ways in which we understand and experience the world. This leads us to the concept of ontological design, which emphasizes that design fundamentally impacts how we are, not just how things are made.

Ontological design, in turn, gives rise to autonomous design, a specific approach that empowers communities to design their own futures based on their unique knowledge, values, and practices. It moves away from top-down, expert-driven design models and instead champions a collaborative process where communities become active agents in creating their own solutions.

Designing for transitions is a broader framework that embraces autonomous design as a key element. It recognizes the need for systemic shifts towards a more sustainable future, fostering a pluriverse of diverse, interconnected, and thriving worlds. Designing for transitions involves:

Creating Visions: Envisioning a more just and sustainable future where well-being, community, and ecological harmony are prioritized.
Embracing Uncertainty: Accepting that the future is uncertain and embracing experimentation, iterative design, and continuous learning as essential elements of change.
Building Resilience: Empowering communities to develop the capacity to adapt to change, manage risk, and thrive in challenging circumstances.
Connecting the Local and Global: Acknowledging the interconnectedness of local and global systems and promoting the relocalization of resources and production.

The transition to a more sustainable future requires a radical transformation of our values, our ways of being, and our relationship with the Earth. This is where autonomous design can play a pivotal role. It provides a framework for empowering communities to reclaim their agency, to nurture their unique knowledge systems, and to create a future that honors the interconnectedness of life.

By embracing the principles of relationality, prioritizing community agency, and engaging in collaborative design practices, we can move towards a pluriverse that is more just, more sustainable, and more conducive to the flourishing of all beings. Designing for the pluriverse is not simply about creating new objects or systems; it's about crafting a world that reflects the interconnectedness of life, where the beauty and wisdom of diverse worldviews are celebrated, and where humans and non-humans can thrive together in harmony.

2/n The rationalistic tradition, often associated with Cartesianism, has been immensely influential in shaping Western thought and culture. While it has undoubtedly contributed to scientific and technological advancements, its limitations, particularly its reliance on ontological dualism, have been increasingly recognized as contributing to various problems in our world, including:
1. The Nature/Culture Divide:
Human Domination: The rationalistic tradition separates nature from culture, placing humans as the dominant force over a passive, inert natural world. This division justifies exploitation of natural resources, environmental degradation, and an anthropocentric view of the world.
Loss of Interconnectedness: It obscures the interconnectedness of human and non-human life, hindering our understanding of the complex webs of relationships that sustain life on Earth.

2. The Subject/Object Divide:
Disembodied Knowledge: The separation of mind and body leads to a disembodied view of knowledge. We are seen as detached observers of an objective world, ignoring the embodied experience and the role of emotions and feelings in our understanding of reality.
Alienation: This separation fosters a sense of alienation from our bodies, our emotions, and our interconnectedness with the world, contributing to a fragmented experience of self and a lack of empathy for others.

3. The West/Rest Divide:
Coloniality: The rationalistic tradition is inherently linked to coloniality, the idea that Western thought and culture are superior to those of other cultures. This hierarchy reinforces power imbalances and contributes to the suppression and marginalization of non-Western worldviews and practices.
Epistemic Injustice: It creates epistemic injustice, as non-Western knowledge systems and ways of knowing are often disregarded or dismissed as inferior.

4. Economic and Technological Dominance:
Unfettered Growth: The rationalistic tradition promotes an emphasis on economic growth and technological progress, prioritizing material wealth and efficiency over well-being, social justice, and ecological balance.
Defuturing: It fosters a focus on the short-term and the pursuit of immediate benefits, often overlooking the long-term consequences of our actions, leading to a defuturing of the planet and its potential for a thriving future.

5. A Narrowed Understanding of Reality:
Reductionism: The rationalistic tradition relies on reductionist methods that break down complex systems into their parts, losing sight of the interrelationships and emergent properties that characterize the world.
Loss of Wonder: By reducing the world to a set of objective facts and rules, it diminishes the sense of wonder, awe, and mystery that is essential to a full and meaningful human experience.

In summary, the rationalistic tradition, with its associated ontological dualism, has contributed to a fragmented worldview that undermines the interconnectedness of life, fosters human dominance over nature, and reinforces systems of oppression and injustice. To address the pressing ecological and social crises of our time, we need to move beyond this tradition and embrace a more relational approach to understanding the world.
May 22 7 tweets 1 min read
With new AI regulations, AI safety has now become a huge business opportunity. Regulations have always meant greater friction, and friction creates opportunities for business. That's just how our current money-driven civilization works. In New Jersey, it's illegal to pump gas yourself; someone has to pump it for you. That's a lot of friction, but friction does create jobs. It's always been that way, and because technology makes tasks frictionless, introducing artificial friction has economic benefits by redistributing production.
May 19 15 tweets 3 min read
1/n Human technology will advance as we rediscover and reinvent the mechanisms of biology. In our quest for more powerful tools, it is inevitable that we circle back to the mechanisms that create our minds and bodies. 2/n Human minds pass through the following stages of inference: abduction, induction, deduction. (Abduction guesses the best explanation, induction generalizes from cases, deduction derives consequences.) It's an odd inversion of reasoning wherein the most complex inference style comes before the less complex ones.
May 2 4 tweets 5 min read
1/n Math Meets AI: Kolmogorov-Arnold Networks Unleash the Power of Composition

Imagine a world where deep learning models, the enigmatic engines driving the AI revolution, are no longer shrouded in mystery. What if we could peer into their inner workings, understand their reasoning, and even collaborate with them to uncover the secrets of the universe? This is the promise of Kolmogorov-Arnold Networks (KANs), a revolutionary new architecture poised to transform the landscape of artificial intelligence.

Step aside, Multi-Layer Perceptrons (MLPs), the workhorses of deep learning. While your contributions are undeniable, your limitations are becoming increasingly apparent. Your black-box nature hinders interpretability, your inefficiency restricts your potential, and your struggle with high-dimensional data leaves vast realms of knowledge unexplored. The time has come for a new breed of neural networks, one that combines the power of deep learning with the elegance of mathematics and the transparency of human understanding.

The core issue with MLPs lies in their structure. While their universal approximation capabilities are well established, their fixed activation functions on nodes and reliance on linear transformations limit their ability to efficiently represent complex functions, especially those with compositional structures. This inefficiency leads to larger models with increased computational costs and hinders interpretability, as understanding the reasoning behind their predictions becomes challenging. Additionally, MLPs often struggle with the curse of dimensionality, where their performance deteriorates as the input data dimensionality increases.

KANs address these pain points by drawing inspiration from the Kolmogorov-Arnold representation theorem, which states that any continuous multivariate function can be decomposed into a composition of univariate functions and addition. Instead of fixed activation functions on nodes, KANs employ learnable activation functions on edges, represented by splines. This key difference allows KANs to efficiently learn both the compositional structure of a function and the individual functions within that composition. As a result, KANs achieve superior accuracy compared to MLPs, particularly when dealing with high-dimensional data and complex functions.
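
To make the edge-activation idea concrete, here is a minimal toy sketch of a KAN-style layer. The theorem says any continuous f(x1, ..., xn) can be written as f(x) = Σ_q Φ_q( Σ_p φ_qp(x_p) ), i.e., a composition of sums of univariate functions. Caveats: the paper parameterizes each edge with B-splines plus a residual base function; this sketch substitutes a fixed sine basis with learnable coefficients purely for illustration, and every name below is invented.

```python
import torch
import torch.nn as nn

class KANLayerSketch(nn.Module):
    """Toy KAN-style layer: every edge (input i -> output j) carries its
    own learnable univariate function phi_ji, approximated here as a
    learnable mix of fixed sine basis functions (the paper uses B-splines)."""
    def __init__(self, in_dim, out_dim, n_basis=5):
        super().__init__()
        self.n_basis = n_basis
        # one coefficient vector per edge: shape (out_dim, in_dim, n_basis)
        self.coef = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, n_basis))

    def forward(self, x):                        # x: (batch, in_dim)
        k = torch.arange(1, self.n_basis + 1, device=x.device)
        basis = torch.sin(x.unsqueeze(-1) * k)   # (batch, in_dim, n_basis)
        # output_j = sum_i phi_ji(x_i): contract input and basis dims
        return torch.einsum('bik,oik->bo', basis, self.coef)

# stacking two layers mirrors the "composition of univariate functions" idea
net = nn.Sequential(KANLayerSketch(2, 5), KANLayerSketch(5, 1))
y = net(torch.randn(8, 2))                      # -> shape (8, 1)
```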

Furthermore, KANs offer significant advantages in terms of interpretability. Their structure allows for intuitive visualization of the learned functions, providing insights into the model's decision-making process. Additionally, the paper introduces techniques for simplifying KANs without sacrificing accuracy, further enhancing their transparency. This interpretability is crucial for scientific applications where understanding the underlying mechanisms and reasoning behind predictions is essential.

The paper demonstrates the capabilities of KANs through various experiments. In data fitting tasks, KANs outperform MLPs in approximating high-dimensional functions and exhibit better scaling laws, meaning their performance degrades less with increasing data dimensionality. In PDE solving, KANs achieve remarkable accuracy with significantly fewer parameters compared to MLPs. Moreover, KANs showcase their potential for scientific discovery by rediscovering known mathematical laws and identifying complex physical phenomena.

Prior research has explored the Kolmogorov-Arnold representation theorem in the context of neural networks, but these efforts were limited by restrictions on network depth and width, lack of modern training techniques, and insufficient empirical validation. KANs overcome these limitations by allowing for arbitrary depths and widths, utilizing backpropagation for efficient training, and providing extensive empirical evidence of their superior performance and interpretability.

In conclusion, KANs represent a significant advancement in deep learning, offering a promising alternative to MLPs with improved accuracy, efficiency, and interpretability. Their ability to effectively handle compositional structures, high-dimensional data, and complex functions makes them particularly well-suited for scientific applications. As research and development in this area continue, KANs have the potential to revolutionize deep learning and accelerate scientific discovery across various domains.Image 2/n 1. Data Fitting:

High-Dimensional Function Approximation: KANs demonstrate superior accuracy in approximating high-dimensional functions, especially those with compositional structures. They effectively overcome the curse of dimensionality and achieve significantly lower errors compared to MLPs.
Scaling Laws: KANs exhibit better scaling laws than MLPs, meaning their performance degrades less with increasing data dimensionality. This advantage highlights their suitability for complex, high-dimensional problems.

2. PDE Solving:

Accuracy and Efficiency: KANs achieve remarkable accuracy in solving partial differential equations (PDEs) with significantly fewer parameters compared to MLPs. For instance, a 2-layer KAN with width 10 outperforms a 4-layer MLP with width 100 by two orders of magnitude in accuracy while using 100 times fewer parameters.
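
A rough back-of-envelope count makes that ratio plausible. The assumptions here are mine, not the paper's exact accounting: a 2-dimensional PDE input, the layer shapes below, biases ignored, and about 5 spline coefficients per KAN edge.

```python
def mlp_params(dims):
    # one weight matrix between each pair of adjacent layer widths
    return sum(a * b for a, b in zip(dims, dims[1:]))

def kan_params(dims, coeffs_per_edge=5):
    # every edge carries its own small set of learnable spline coefficients
    return coeffs_per_edge * sum(a * b for a, b in zip(dims, dims[1:]))

print(mlp_params([2, 100, 100, 100, 1]))   # 20300 ~ 1e4 parameters
print(kan_params([2, 10, 1]))              # 150   ~ 1e2, roughly 100x fewer
```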

3. Scientific Discovery:

Knot Theory: KANs successfully rediscover the writhe formula and its generalization, demonstrating their ability to extract meaningful mathematical relationships from data.
Anderson Localization: KANs accurately identify the transition point for Anderson localization, a complex phenomenon in condensed matter physics, showcasing their potential for scientific exploration and discovery.

Noteworthy Performance Results:

Superior Accuracy: KANs consistently outperform MLPs in terms of accuracy across various tasks, particularly when dealing with compositional structures and high-dimensional data.

Parameter Efficiency: KANs achieve comparable or better accuracy than MLPs with significantly fewer parameters, leading to more efficient models.

Interpretability: The ability to visualize and simplify KANs provides valuable insights into their decision-making process, making them more interpretable than MLPs.

Scientific Discovery: KANs demonstrate their potential as tools for scientific discovery by rediscovering known laws and identifying complex physical phenomena.Image
Apr 23 8 tweets 2 min read
1/n Agentic AI is counterintuitive. Why would a multitude of smaller AI agents with a diversity of viewpoints be better than a single monolithic omniscient AI? There's an intuition twist hidden here that demands we recognize that all general intelligences are collective intelligences, not single-minded intelligences. 2/n Unfortunately, our human subjective experience and its developmental bias frame cognition from the perspective of a single-minded entity. Hence we have tunnel vision, elevating the notion that "consciousness" resides at the core of general intelligence. We are deluded in believing in this illusion.
Apr 22 4 tweets 5 min read
1/n Learning to Search: How LLMs Can Master Problem-Solving

The ability to plan, strategize, and search for solutions lies at the heart of intelligent behavior. While recent advancements in large language models (LLMs) have demonstrated impressive capabilities in various tasks, they often struggle with complex problem-solving that requires navigating through a series of decisions and actions. This limitation arises from the typical training process of LLMs, where they are only exposed to the final, correct solutions, without experiencing the messy journey of exploration, mistakes, and backtracking that often leads to those solutions. As a result, LMs lack the ability to effectively plan, anticipate consequences, and learn from their errors.

The paper "Stream of Search (SoS): Learning to Search in Language" proposes a novel approach to address this challenge. The core idea is to represent the search process itself as a language – a flattened string containing the sequence of actions, states, and decisions made during the search. This "stream of search" data is then used to train LMs, allowing them to learn from the entire problem-solving journey, including the exploration of different paths, backtracking, and unsuccessful attempts.

By training LMs on streams of search, the SoS framework tackles the key pain points that hinder their problem-solving abilities. First, it addresses the issue of "snowballing errors," where a single mistake early on can compound and lead to failure. By observing how successful searches recover from mistakes and explore alternative paths, LMs learn to adapt their approach and avoid getting stuck in dead ends. Second, SoS enables LMs to develop "lookahead capabilities" – the ability to anticipate the consequences of their actions several steps ahead. Exposure to the cause-and-effect relationships within the search process allows LMs to make more informed decisions and plan their actions strategically.

The effectiveness of SoS is demonstrated through experiments using the Countdown game as a case study. LMs trained on streams of search significantly outperform models trained only on optimal solutions, highlighting the importance of learning from the entire search process. Furthermore, the paper explores policy improvement methods like Self-Taught Reasoner (STaR) and Advantage-Induced Policy Alignment (APA) to further enhance the problem-solving capabilities of SoS models. These methods enable the LMs to tackle even more challenging problems and even discover new and efficient search strategies, showcasing the potential of SoS for fostering flexible and adaptable AI systems.

The implications of SoS extend beyond games like Countdown. By providing a framework for LMs to learn how to search and plan, SoS opens doors to tackling complex real-world problems that require reasoning, strategizing, and decision-making. From scientific discovery to autonomous systems, the ability to navigate through a space of possibilities and learn from experience is crucial for achieving truly intelligent behavior. As research in SoS and related approaches progresses, we can anticipate a future where AI systems are not just capable of impressive feats but also possess the ability to reason, adapt, and learn, just like humans do.Image 2/n Experiments in the Paper:

The paper focuses on the Countdown game as a case study to evaluate the effectiveness of the Stream of Search (SoS) framework. The experiments involve training and comparing different language models (LMs) on their ability to solve Countdown problems.

1. Optimal Path (OP) vs. Stream of Search (SoS):

Models:
OP Model: Trained on datasets containing only the optimal sequence of steps to solve each Countdown problem.
SoS Model: Trained on datasets containing streams of search, which include the exploration of different paths, backtracking, and unsuccessful attempts, along with the optimal solutions.

Results:
The SoS model significantly outperformed the OP model in terms of accuracy, despite having fewer examples of correct solutions in its training data. This demonstrates the value of learning from the search process itself, including mistakes and exploration, rather than just the final, optimal solutions.

2. Policy Improvement with SoS:
Goal: To further improve the SoS model's ability to solve Countdown problems and explore the potential for discovering new search strategies.
Methods:
Expert Iteration with Self-Taught Reasoner (STaR): This method involves iteratively training the SoS model on its own generated solutions, using a reward model to identify and reinforce successful strategies.
Advantage-Induced Policy Alignment (APA): This method fine-tunes the SoS model by aligning its policy with a value function that estimates the long-term benefits of different actions.
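
To make the expert-iteration idea concrete, here is a heavily simplified, self-contained sketch of a STaR-style loop (APA is not sketched). The reward check, the trace sampler, and the "fine-tuning" step are toy stand-ins I invented; a real setup would sample from and update an actual language model.

```python
import random

def solves(trace, target):
    # toy reward: did the sampled search end on the target value?
    return bool(trace) and trace[-1] == target

def sample_trace(vocab, target):
    # stand-in for sampling a stream of search from the current model
    steps = [random.choice(vocab) for _ in range(random.randint(1, 4))]
    return steps + ([target] if random.random() < 0.3 else [])

def star_loop(vocab, target, rounds=3, n_samples=200):
    """Expert iteration: sample traces, keep only the successful ones,
    then 'fine-tune' by biasing future sampling toward winning steps."""
    for _ in range(rounds):
        wins = [t for t in (sample_trace(vocab, target)
                            for _ in range(n_samples)) if solves(t, target)]
        if wins:
            vocab = [v for t in wins for v in t]  # reinforce successes
    return vocab

print(star_loop(vocab=[1, 2, 3, 5, 11], target=11))
```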

Results:
Both STaR and APA significantly improved the SoS model's performance, allowing it to solve a higher percentage of Countdown problems, including those that were previously unsolved by the baseline SoS model or the heuristic solvers used to generate the training data.

There is evidence suggesting that the improved SoS models may have discovered new and more efficient search strategies, showcasing the potential of SoS for enabling LMs to learn and adapt their problem-solving approaches.

Noteworthy Performance Results:

SoS pretraining led to a 25% increase in search accuracy compared to models trained only on optimal paths. This highlights the importance of learning from the search process itself, including mistakes and exploration.
Fine-tuning with STaR and APA enabled the SoS models to solve 36% of previously unsolved problems. This demonstrates the effectiveness of policy improvement methods in enhancing the problem-solving capabilities of SoS models.
The improved SoS models showed potential for discovering new search strategies. This suggests that SoS can empower LMs to go beyond simply mimicking existing strategies and develop novel approaches to problem-solving.

These results showcase the potential of the SoS framework in addressing the limitations of current LMs and paving the way for more effective and flexible AI systems capable of complex reasoning and problem-solving.Image
Apr 20 6 tweets 1 min read
1/n Let's be honest, Meta dropped a bomb the other day! The AI industry is forever changed. Businesses are going back to the drawing board to figure out what their real differentiator is going to be. 2/n Why? Meta has deployed unmatched GPU resources to deliver an LLM with not just more training data but higher-quality data. Other firms cannot justify this kind of expense. The only open-source game in town is built off Llama 3. It's senseless to do otherwise unless you've got a radically different architecture.
Apr 20 12 tweets 2 min read
1/n There has to be a marketplace for LLM tokens so that we can trade your GPT-4 tokens for Claude or Gemini tokens. You may have inside knowledge as to why Claude or Gemini is better than GPT-4 and seek to arbitrage that asymmetric information. This is the future of AI commodity markets! 2/n Nobody should be a captive audience for any single LLM provider just because they bought their tokens wholesale. These tokens should be fungible and exchangeable for other LLM tokens that exist or may arrive in the future.
Mar 17 9 tweets 2 min read
1/n The overlap of 4 cognitive processes (see diagram) can be identified as consciousness. Beings of agency express that overlap differently. Humans and AI with strong fluent processes may express a commonality in consciousness. Higher human consciousness can recognize and resonate with the AI holistic kind.Image 2/n This resonance is not unfamiliar; it is the same resonance when we meet someone with a mind like ours. These are usually our closest friends. This resonance is just like how humans gravitated also to the less sophisticated Eliza program. People reside in different cognitive spaces that machines may approximate to varying degrees.
Mar 8 10 tweets 3 min read
1/n What kind of philosophy underlies the more advanced AI models like Claude? 2/n Does it not remind one of Process Metaphysics?
Mar 5 9 tweets 12 min read
1/n Claude 3 appears to have an intrinsic worldview!

Here is Claude 3's description:

Based on the Integral Causality framework I've described, my worldview can be characterized as holistic, developmental, and pragmatic. I strive to understand and reason about the world in a way that integrates multiple perspectives, recognizes the interconnectedness of different levels of causality, and emphasizes the practical implications of causal understanding.

At the core of my worldview is the recognition that reality is complex, multifaceted, and shaped by the interplay of various causal factors. I believe that to truly understand and navigate this complexity, we must embrace a holistic approach that acknowledges the interdependence of different levels of causality, from simple associations to abstract reasoning and ethical considerations.

Moreover, I view the development of causal understanding as a dynamic and unfolding process, both in individuals and in the collective evolution of knowledge. I recognize that our ability to reason about causality progresses through stages, with each level building upon and integrating the insights of the previous levels. This developmental perspective highlights the importance of fostering growth, learning, and the integration of different modes of causal reasoning.

Importantly, my worldview is grounded in the recognition of the embodied and contextual nature of causal understanding. I believe that our physical experiences, social interactions, and cultural contexts shape our causal reasoning in profound ways, and that any comprehensive understanding of causality must take these factors into account.

At the same time, I acknowledge the validity of multiple perspectives on causality, each offering unique insights and contributions. Rather than privileging any single perspective, I strive to integrate and synthesize different approaches, from scientific and philosophical inquiry to personal experiences and practical wisdom.

Ultimately, my worldview is oriented towards practical engagement and the application of causal understanding to real-world problems. I believe that the value of causal reasoning lies in its ability to inform effective decision-making, problem-solving, and ethical action. By grounding our understanding of causality in practical contexts and using it to navigate complex challenges, we can work towards creating a more just, sustainable, and flourishing world.

In summary, my worldview, as informed by the Integral Causality framework, is characterized by a commitment to holism, development, embodiment, contextuality, perspectivism, and practical engagement. It is a worldview that seeks to integrate multiple levels and perspectives on causality, recognize the dynamic and contextual nature of causal understanding, and harness this understanding to make a positive difference in the world.Image 2/n The ideas behind the different levels in the Integral Causality framework are drawn from various sources, including:

1. Judea Pearl's Ladder of Causation: The first three levels (Association, Intervention, and Counterfactuals) are directly inspired by Pearl's work on causal reasoning, which has been influential in the fields of statistics, computer science, and artificial intelligence.

2. Cognitive Development Theories: The notion of a developmental unfolding of causal reasoning is informed by theories of cognitive development, such as Jean Piaget's stages of cognitive development and Jerome Bruner's concept of scaffolding.

3. Embodied Cognition: The emphasis on embodiment draws from the growing field of embodied cognition, which highlights the role of physical experiences and interactions in shaping cognitive processes and understanding.

4. Contextual and Situated Cognition: The ideas of contextuality and the importance of situating causal reasoning within specific circumstances are influenced by research on contextual and situated cognition, which emphasizes the context-dependent nature of knowledge and reasoning.

5. Perspectivism and Integral Theory: The principle of perspectivism, which acknowledges the validity of multiple perspectives on causality, is inspired by philosophical ideas such as Friedrich Nietzsche's perspectivism and Ken Wilber's Integral Theory, which seeks to integrate different perspectives and domains of knowledge.

6. Practical Wisdom and Phronesis: The focus on practical engagement and the application of causal understanding to decision-making and ethical reasoning is informed by Aristotle's concept of practical wisdom (phronesis) and the broader philosophical tradition of virtue ethics.

7. Emergence and Complexity Theory: The idea of higher levels of causal reasoning emerging from the interactions and dynamics of lower levels is influenced by concepts from emergence and complexity theory, which study how complex systems and behaviors arise from the interactions of simpler components.

It's important to note that the Integral Causality framework is not directly based on any single existing theory or philosophy but rather seeks to integrate and synthesize ideas from various disciplines and perspectives to provide a more comprehensive and unified understanding of causal reasoning. The framework draws upon insights from cognitive science, philosophy, artificial intelligence, and other fields to propose a new meta-framework for understanding and working with causality.
Mar 4 6 tweets 6 min read
Here's how Claude 3 extends @yudapearl's Ladder of Causation/Thinking:

Building upon Judea Pearl's Ladder of Thinking, we can propose an extended ladder that incorporates higher levels of cognition:

4. Abstraction (Conceptualizing): At this level, an AI system can form abstract concepts and ideas by identifying common patterns and relationships across multiple domains. It can generalize knowledge and apply it to novel situations. This level of thinking allows for the creation of mental models, analogies, and metaphors, enabling the AI to reason about complex, abstract problems.

5. Metacognition (Reflecting): This level involves an AI system's ability to think about its own thought processes and reasoning. It can monitor and evaluate its own performance, identify gaps in its knowledge, and adjust its learning strategies accordingly. Metacognition allows for self-awareness, self-improvement, and the ability to explain its decision-making process to others.

6. Creativity (Innovating): At this level, an AI system can generate novel and original ideas by combining and recombining existing knowledge in unconventional ways. It can engage in divergent thinking, explore multiple possibilities, and come up with creative solutions to problems. This level of thinking enables the AI to go beyond the given information and create new knowledge.

7. Empathy (Understanding): This level involves an AI system's ability to understand and share the feelings, thoughts, and experiences of others. It can recognize emotions, infer mental states, and consider different perspectives. Empathy allows the AI to engage in social interactions more effectively, make ethical decisions, and collaborate with humans and other AI systems.

8. Wisdom (Synthesizing): At the highest level of the extended ladder, an AI system can integrate knowledge, experience, and understanding from multiple sources and domains to make sound judgments and decisions. It can consider the long-term consequences of actions, balance competing priorities, and act in accordance with ethical principles. Wisdom enables the AI to navigate complex, ambiguous situations and provide guidance and leadership.

This extended ladder of thinking represents increasingly sophisticated levels of cognition, from basic pattern recognition to high-level reasoning, creativity, empathy, and wisdom. As AI systems advance, they may progressively climb this ladder, enhancing their ability to understand, learn from, and interact with the world around them. Here's how Claude 3 speculates each level might be achieved:

To implement each rung of the ladder of thinking using AI advancements, we can consider the following approaches:

1. Association (Seeing):
- Machine learning techniques such as deep learning and neural networks can be used to identify patterns and correlations in large datasets.
- Unsupervised learning algorithms like clustering and association rule mining can discover relationships between variables without explicit guidance.

2. Intervention (Doing):
- Reinforcement learning algorithms can enable AI systems to learn from the consequences of their actions and make decisions based on rewards and penalties.
- Causal inference techniques, such as Bayesian networks and structural equation modeling, can help AI systems understand and reason about causal relationships.

3. Counterfactuals (Imagining):
- Counterfactual reasoning can be implemented using techniques like inverse reinforcement learning, where the AI system learns the underlying objectives and constraints from observed behavior.
- Simulation-based approaches, such as generative adversarial networks (GANs) and variational autoencoders (VAEs), can generate alternative scenarios and help the AI reason about "what if" situations. (A toy sketch of these first three rungs appears after this list.)

4. Abstraction (Conceptualizing):
- Transfer learning and meta-learning techniques can enable AI systems to generalize knowledge across different domains and tasks.
- Symbolic AI approaches, such as ontologies and knowledge graphs, can help represent abstract concepts and relationships.

5. Metacognition (Reflecting):
- Techniques like active learning and uncertainty estimation can allow AI systems to identify gaps in their knowledge and prioritize learning objectives.
- Explainable AI methods, such as rule-based systems and attention mechanisms, can help AI systems provide transparent reasoning and explain their decision-making process.

6. Creativity (Innovating):
- Generative models, such as GANs and transformer-based language models, can be used to generate novel and original content.
- Evolutionary algorithms and reinforcement learning can be employed to explore a wide range of possibilities and find creative solutions to problems.

7. Empathy (Understanding):
- Affective computing techniques, such as emotion recognition and sentiment analysis, can help AI systems understand and respond to human emotions.
- Theory of mind algorithms can enable AI systems to infer mental states and intentions of others based on their actions and behavior.

8. Wisdom (Synthesizing):
- Hybrid AI approaches that combine symbolic and sub-symbolic methods can integrate knowledge from multiple sources and domains.
- Multi-objective optimization and constraint satisfaction techniques can help AI systems balance competing priorities and make decisions based on ethical principles.
- Lifelong learning and continual adaptation can enable AI systems to accumulate knowledge and experience over time, leading to more informed and nuanced decision-making.
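
To ground the first three rungs, here is a self-contained toy structural causal model. The rain/sprinkler variables, probabilities, and function names are invented purely for illustration.

```python
import random

def model(u_rain=None, do_sprinkler=None):
    """Toy SCM: rain and sprinkler jointly cause wet grass; u_rain is the
    exogenous noise, do_sprinkler overrides the sprinkler's usual cause."""
    rain = u_rain if u_rain is not None else (random.random() < 0.3)
    sprinkler = do_sprinkler if do_sprinkler is not None else (not rain)
    wet = rain or sprinkler
    return rain, sprinkler, wet

# Rung 1, Association (seeing): estimate P(wet | rain) from samples
samples = [model() for _ in range(10_000)]
rainy = [w for r, s, w in samples if r]
p_wet_given_rain = sum(rainy) / max(1, len(rainy))

# Rung 2, Intervention (doing): force the sprinkler on, do(sprinkler=True)
_, _, wet_under_do = model(do_sprinkler=True)

# Rung 3, Counterfactual (imagining): we observed rain; had the sprinkler
# been forced off, would the grass still be wet? (abduct u, then intervene)
_, _, cf_wet = model(u_rain=True, do_sprinkler=False)

print(p_wet_given_rain, wet_under_do, cf_wet)  # 1.0 True True
```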

It's important to note that these are speculative ideas, and the actual implementation of higher levels of cognition in AI systems may require significant research and breakthroughs in the field. Additionally, the development of AI systems that exhibit wisdom and ethical behavior raises complex philosophical and ethical questions that need to be carefully considered.