Carlos E. Perez
Feb 5 · 3 tweets · 3 min read
1/n An ontology of Large Language Model (LLM) powered Multi-Agents

- Single LLM-based agents have shown promising capabilities such as planning, tool use, memory, and decision making. This has motivated research into multi-agent systems.
- LLM-based multi-agent (LLM-MA) systems aim to leverage multiple specialized agents working in collaboration, offering more advanced problem solving than single agents.

Existing Issues
- Most existing work focuses on single LLM-based agents. There is a lack of systematic analysis of emergent capabilities and issues in LLM-MA systems.
- Early LLM-MA systems have been developed independently. There is an absence of a unified blueprint and taxonomy to connect different aspects like agent profiling, communication protocols etc.
- There is a gap in benchmarks and evaluation methods tailored for assessing collaborative intelligence of LLM-MA systems. Metrics focused on individual agents may overlook emergent group behaviors.
- Open challenges remain in scaling LLM-MA systems, managing collective capabilities, mitigating issues like hallucination, and expanding applications to complex real-world problems.

In summary, while single LLM-agents have made strides, there are open questions regarding formulating, analyzing, evaluating, and advancing collaborative multi-agent systems for sophisticated tasks. Establishing a unified blueprint can accelerate progress.
2/n The LLM-based multi-agent (LLM-MA) ontology consists of four key components:

1. Agents-Environment Interface:
This refers to how agents perceive and interact with the environment (sandbox, physical, or none). Environments could be software applications, embodied robot systems, gaming simulations etc.

2. Agent Profiling:
It deals with characterizing distinct agents based on aspects like roles, capabilities, constraints etc. Common methods are pre-defined profiles, model-generated profiles, or data-derived profiles.

3. Agent Communication:
This encompasses communication paradigms (cooperative, debate, competitive), structures organizing agent interactions (layered, decentralized, centralized), and actual content exchanged (typically textual).

4. Agent Capability Acquisition:
It focuses on how agents obtain feedback to enhance their skills over time via memory, self-evolution by modifying goals/strategies, or dynamic agent generation.

In a nutshell, this ontology systematically connects the agents themselves, the environments they operate in, how they interact to solve problems collectively, and how they acquire knowledge. According to the paper, positioning LLM-MA systems in this framework can enable more structured analysis. The ontology provides a blueprint for continued research and applications.
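As a rough illustration (not from the paper), the four components of the ontology can be captured as a small set of types. All class and field names below are hypothetical — the survey defines the concepts, not this API:

```python
from dataclasses import dataclass, field
from enum import Enum

class Paradigm(Enum):            # Agent Communication: paradigms
    COOPERATIVE = "cooperative"
    DEBATE = "debate"
    COMPETITIVE = "competitive"

class Structure(Enum):           # Agent Communication: structures
    LAYERED = "layered"
    DECENTRALIZED = "decentralized"
    CENTRALIZED = "centralized"

@dataclass
class AgentProfile:              # Agent Profiling: roles, capabilities, constraints
    role: str
    capabilities: list[str]
    constraints: list[str] = field(default_factory=list)

@dataclass
class LLMMASystem:               # ties the four components together
    environment: str             # Agents-Environment Interface: "sandbox", "physical", or "none"
    agents: list[AgentProfile]
    paradigm: Paradigm
    structure: Structure
    feedback_log: list[str] = field(default_factory=list)  # Capability Acquisition: memory hook

    def acquire_capability(self, feedback: str) -> None:
        # stand-in for memory / self-evolution / dynamic agent generation
        self.feedback_log.append(feedback)

system = LLMMASystem(
    environment="sandbox",
    agents=[AgentProfile("planner", ["planning"]), AgentProfile("coder", ["tool use"])],
    paradigm=Paradigm.COOPERATIVE,
    structure=Structure.CENTRALIZED,
)
```

Framing a concrete system this way makes the paper's point tangible: each design choice (paradigm, structure, profiling method) becomes an explicit, comparable dimension.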
3/n Here's the survey paper:
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
DOI: 10.13140/RG.2.2.36311.85928
researchgate.net/publication/37…

More from @IntuitMachine

Feb 3
1/n Discovered this book (h/t @Extended_Brain). Let's look into some nuggets of wisdom!
2/n In his chapter on "Personal Knowledge," Michael Polanyi argues that all knowledge involves personal participation and commitment on the part of the knower. He introduces the concept of "tacit knowing" to describe the process by which personal knowledge is accumulated. Tacit knowing stands in contrast to the ideals of detached objectivity and value neutrality often associated with scientific knowledge.

At the heart of tacit knowing is subsidiary awareness—attending to one thing by focusing on another related or connected thing. For example, we may identify a person by his clothes, or we attend to the weight of a hammer in our palm as we focus on driving the nail. What we are focally aware of and what we are subsidiarily aware of mutually depend on each other in tacit knowing. Our subsidiary awareness of clues, instruments, and context allows us to comprehend the focal target, while the target itself determines what counts as clues or instruments relevant to discerning its nature.

Tacit knowing pervades multiple forms of skillful achievement, including practical skills like cycling and swimming but also more abstract capabilities like reading comprehension or facial recognition. It has a from-to structure—we go from perception of subsidiaries to comprehension of a coherent whole. This always involves our active shaping and organizing of subsidiaries to integrate them for meaning.

Polanyi identifies three key aspects to tacit knowing: functional, phenomenal, and semantic. The functional aspect is the from-to relation itself and how we dwell in the particulars to attend to the whole. The phenomenal aspect is that through integrative acts like binocular vision or reading, we achieve a new phenomenal experience beyond what direct inspection of the parts would indicate. Finally, the semantic aspect is the meaning-giving relationship where subsidiaries acquire their sense by bearing on the focus.

An important implication is that all knowledge depends on personal judgment to turn clues into comprehension. There are no explicit rules determining what coheres or what is meaningful. As Polanyi puts it, "into every act of knowing there enters a tacit and passionate contribution of the person knowing what is being known." While aiming at an external reality, our understanding relies fundamentally on internal processes of integration that connect knower and known. Tacit knowing is an inescapable and universal feature of human knowledge.
3/n The Reconstruction chapter explores how Polanyi's theory of personal knowledge and tacit integration can help reconstruct our understanding of science and knowledge after the damage done by positivism and radical skepticism. He wants to show how personal participation and intuition are essential to science.

A major target is the mistaken ideal of detached objectivity. Polanyi argues that all knowledge depends on commitments, beliefs, and personal judgments that shape how we integrate clues into coherence and meaning. Even facts of science rely on scientists skillfully reading instruments in ways that involve unspecifiable personal elements. There is no totally explicit, impersonal kind of knowing.

Imagination and intuition play crucial roles in this view of science:

1. Intuition guides recognition of a problem and assessment of whether it is promising to pursue based on subtle clues. This "strategic intuition" shapes the basic vision scientists have for where truth may lie hidden.

2. Imagination then drives persistent efforts to search for clues and piece together patterns toward a possible solution, in a quest guided broadly by intuition about what seems plausible and meaningful.

3. Finally, intuition spontaneously offers an integrative vision that may solve the problem, often after unconscious incubation of ideas mobilized by the imagination. This "concluding intuition" provides the fulfillment of meaning.

So intuition sets the direction, imagination does the hard work, and intuition synthesizes the fruits of inquiry. This cycle of imagination and intuition can lead to moments of discovery and insight that scientifically reveal reality.

For Polanyi, imagination creatively anticipates realities that may manifest themselves in the future. Intuition senses possibilities for systematic meaning by dwelling in the implications of existing knowledge. Together, they push science to expand into the unknown guided by a slope of deepening meaning, rather than just accumulating facts.

This theory of discovery opposes mechanical views of scientific reasoning. It requires accepting non-explicit personal judgments of coherence at the heart of science. Appreciating the role of imagination and intuition allows restoring a richer understanding of scientific inquiry as an open-ended human process aiming to articulate hidden realities.

The key is recognizing that all knowledge involves skillful integration of particulars guided by ideals of coherence and purpose. Science is inescapably shaped by the personal participation of dedicated thinkers seeking meaningful truths about the world.
Feb 2
1/n A Taxonomy for Multi-Modal Large Language Models

Architecture
The architecture consists of 5 key components:

1. Modality Encoder: Encodes inputs from modalities like image, video, audio into feature representations. Common options include NFNet-F6, ViT, CLIP ViT, C-Former, etc.

2. Input Projector: Aligns non-text modality features to the text feature space of the LLM. This uses cross-attention, Q-Former, P-Former, or simple MLPs/linear layers.

3. LLM Backbone: Core large language model that processes aligned multi-modal representations and generates textual outputs + signal tokens for conditional generation. Popular choices are Flan-T5, Vicuna, OPT, LLaMA, etc.

4. Output Projector: Maps signal token representations into features that can be understood by the Modality Generator. Uses a Tiny Transformer or MLP.

5. Modality Generator: Generates outputs in modalities like image, video, audio conditioned on the mapped features. Typically uses off-the-shelf latent diffusion models like Stable Diffusion, AudioLDM, etc.

Training Pipeline:
The training pipeline has 2 key stages -

1. Multi-Modal Pre-Training: trains the Input and Output Projectors using image-text, video-text, audio-text datasets to align modalities. May fine-tune small trainable parameters in LLM backbone using methods like prefix tuning.

2. Multi-Modal Instruction Tuning: further trains the model on instruction-formatted datasets using reinforcement learning from human feedback. This enhances the model's alignment with human preferences and its interaction capabilities.
2/n The input process flow

1. Modality Encoder:
- Encodes inputs from modalities like image, video, audio into feature representations.
Example:
- Input: An image of a cat
- CLIP ViT encoder encodes it into a 768-d feature vector representing the visual concepts in the image

2. Input Projector
- Projects non-text modality features into the textual feature space of LLM
Example:
- The 768-d cat image feature from CLIP ViT
- A linear layer projects it into a 1024-d vector aligned with text vector space
- Other options like cross-attention, Q-Former can also achieve this alignment

3. LLM Backbone
- Core large language model that processes the aligned multi-modal representations
Example:
- The 1024-d projected cat image feature vector
- Textual caption describing the image: "A cute cat playing with a ball of yarn"
- These text and image features are fed into the LLM backbone like OPT or LLaMA
- The LLM encodes them into a joint representation in its latent space and generates relevant outputs

So in summary, the modality encoders create non-text representations, input projectors transform them into an LLM-compatible space, and the LLM backbone fuses information from all aligned modalities to understand concepts across modalities. The flow enables the fusion of multi-modal knowledge into the LLM.
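The input flow above can be sketched numerically. This is a minimal stand-in, not a real model: the random vectors fake a CLIP-ViT encoder's output, and the random matrix fakes a trained linear projector:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Modality Encoder (stand-in): a CLIP-ViT-like encoder would map the
#    cat image to a 768-d feature vector; here we fake that output.
image_feature = rng.standard_normal(768)

# 2. Input Projector: a linear layer aligning the 768-d visual feature
#    with the LLM's 1024-d text embedding space. The weight matrix here
#    is random -- in practice it is learned during multi-modal pre-training.
W = rng.standard_normal((1024, 768)) / np.sqrt(768)
projected = W @ image_feature                       # shape (1024,)

# 3. LLM Backbone (stand-in): the projected image vector is prepended to
#    the text token embeddings of the caption and fed to the LLM jointly.
text_embeddings = rng.standard_normal((12, 1024))   # e.g. a 12-token caption
llm_input = np.vstack([projected[None, :], text_embeddings])

print(llm_input.shape)   # one image token + 12 text tokens, all 1024-d
```

The point of the sketch is the shape bookkeeping: after projection, image and text tokens live in the same 1024-d space, so the backbone can attend over them uniformly.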
3/n The output process flow

Here is an explanation of the flow from LLM backbone to output projector to modality generator with examples:

1. LLM Backbone
- Core large language model that processes aligned multi-modal representations
- Can generate text describing desired outputs in other modalities

Example:
- Encoded features of an image of a dog along with textual caption
- LLM backbone (like PaLM) generates text: "generate a 1920x1080 image morphing the dog into a cat"

2. Output Projector
- Maps the text encoding from LLM backbone into features compatible with target modality

Example:
- The text encoding from LLM backbone representing the "morph dog into cat" instruction
- An MLP output projector transforms it into a latent feature vector

3. Modality Generator
- Generates outputs in target modalities conditioned on the projected features

Example:
- Latent vector representation of "morph dog into cat" instruction
- Stable Diffusion image generator uses that conditioning vector
- Generates a 1920x1080 image morphing the dog image into a cat by neural rendering

So in summary, the LLM backbone generates descriptive texts of desired outputs, output projectors transform those text features into compatible latent spaces, and modality generators use those features to synthesize novel outputs. This enables multi-modal generative capabilities via text-conditioning.
Feb 1
1/n Introducing RAPTOR

Existing RAG methods suffer from a major limitation: they can only retrieve short, contiguous passages of text. This restricts their capacity to represent cross-document discourse structure and leverage thematic information scattered across lengthy corpora. As a result, performance suffers on complex questions requiring multi-step inference or synthesis of knowledge from multiple sections.

Fixed language models also face challenges staying up-to-date, as baking vast world knowledge into model parameters makes it arduous to edit or append facts. Yet relying on outdated embedded knowledge severely impairs real-world reliability and accuracy.

This paper introduces RAPTOR, a novel recursive abstraction paradigm that overcomes both issues through hierarchical multi-document representation. RAPTOR segments text, then recursively clusters, summarizes, and embeds passages. This structures corpora into multi-layer trees encoding information at varying levels of abstraction.

Querying this rich tree representation allows integrating details and high-level themes simultaneously. Controlled experiments exhibit consistent improvements over baseline retrievers across several QA datasets. Moreover, by augmenting powerful readers like GPT-4, RAPTOR reaches new state-of-the-art results on multifaceted reasoning tasks requiring nuanced understanding of lengthy narratives.

Modularizing knowledge into RAPTOR’s index also facilitates updating world facts. As corpus contents evolve, the reader persists unaltered, flexibly adapting to current information needs. This crucial agility makes RAPTOR invaluable for dynamic real-world deployments.

In summary, RAPTOR provides a sorely lacking solution for multi-document reasoning and updatable retrieval-based QA. Leveraging recursive summarization and abstraction, it encodes corpora with sufficient semantic depth for complex queries. RAPTOR delivers substantial gains; its strong empirical performance confirms the merits of tree-based hierarchical retrieval augmentation.
2/n The RAPTOR process:

1. Text Segmentation
- Split retrieval corpus into short, contiguous chunks of 100 tokens, similar to traditional methods
- Keep sentences intact even if over 100 tokens to preserve coherence

2. Text Embedding
- Embed text chunks using SBERT to get dense vector representations

3. Clustering
- Employ soft clustering using Gaussian Mixture Models and UMAP dimensionality reduction
- Vary UMAP parameters to identify global and local clusters
- Use Bayesian Information Criterion for model selection to determine optimal number of clusters

4. Summarization
- Summarize the chunks in each cluster using a language model
- Results in a condensed summary capturing key information

5. Node Creation
- Clustered chunks + corresponding summary = new tree node

6. Recursive Processing
- Repeat steps 2-5: Re-embed summaries, cluster nodes, generate higher level summaries
- Forming a multi-layer tree from the bottom up
- Until clustering is infeasible (final root node summarizes the entire corpus)

7. Retrieval
- Two methods: tree traversal (top-down layer by layer) or collapsed tree (flattened view)
- For each, compute cosine similarity between query and nodes to find most relevant

So in summary, RAPTOR leverages recursive clustering and summarization of text chunks to create a hierarchical tree structure for more effective contextual retrieval.
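The build loop in steps 2-7 can be sketched end to end. Everything here is a toy stand-in for the paper's components: `embed` replaces SBERT, `summarize` replaces the LLM summarizer, and the greedy similarity pass replaces GMM + UMAP soft clustering — only the recursive bottom-up shape and collapsed-tree retrieval are faithful:

```python
import math

def embed(text):                      # toy bag-of-words embedding (stands in for SBERT)
    vec = {}
    for w in text.lower().split():
        vec[w] = vec.get(w, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def summarize(texts):                 # stands in for LLM summarization of a cluster
    return " ".join(texts)[:200]

def cluster(nodes, threshold=0.2):    # greedy stand-in for GMM + UMAP soft clustering
    clusters = []
    for node in nodes:
        for c in clusters:
            if cosine(embed(node), embed(c[0])) >= threshold:
                c.append(node)
                break
        else:
            clusters.append([node])
    return clusters

def build_tree(chunks):
    layers = [chunks]                 # layer 0: raw 100-token chunks
    while len(layers[-1]) > 1:
        groups = cluster(layers[-1])
        if len(groups) == len(layers[-1]):        # clustering infeasible ->
            layers.append([summarize(layers[-1])])  # single root summary
            break
        layers.append([summarize(g) for g in groups])  # re-embed & recurse
    return layers                     # bottom-up layers, root last

def collapsed_tree_retrieve(layers, query, k=2):
    all_nodes = [n for layer in layers for n in layer]  # flatten every layer
    q = embed(query)
    return sorted(all_nodes, key=lambda n: cosine(q, embed(n)), reverse=True)[:k]

layers = build_tree(["cats chase mice", "dogs chase cats",
                     "stars emit light", "light travels fast"])
print(collapsed_tree_retrieve(layers, "cats and dogs"))
```

Because retrieval flattens all layers, a query can match either a fine-grained leaf chunk or a higher-level summary — which is exactly how RAPTOR mixes details with themes.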
3/n Summary of key related work

Retrieval Methods
- Use standard chunking to index passages
- RAPTOR creates recursive tree structure with hierarchical summarization

Joint Passage Retrieval
- Tree decoding to handle passage diversity
- RAPTOR clusters semantically related passages

Summarization Models
- Recursive summarization using task decomposition
- RAPTOR allows flexible grouping and keeps intermediate details

Dense Hierarchical Retrieval
- Combines document and passage retrievals
- RAPTOR focuses on passage-level, adds recursive abstraction

Long Context Models
- Expand context lengths models can handle
- RAPTOR provides relevant subsets of text
Jan 30
1/n Exploiting Large Language Models (LLMs), RAG and KGs for Creative Design

A recent paper makes a compelling case for the tremendous yet untapped potential of large language models (LLMs) to transform materials science research. However, the authors thoughtfully acknowledge critical "pain points" in relying solely on the raw capabilities of LLMs in this complex domain. Accuracy, nuance, interpretability, reasoning - on all these fronts, LLMs fall short without a guiding hand.

That's exactly why this paper shines. It outlines strategies to partner with LLMs to elicit their strengths while overcoming weaknesses. Retrieval-augmented generation (RAG) supplies the missing context to ground responses. Knowledge graphs (KGs) organize concepts ontologically to lend structure and meaning. Non-linear prompting channels creativity through critical filters. Diverse model collectives enable cooperative discovery.

What emerges is a vision for a new paradigm - LLMs not as opaque oracles, but as flexible components in an intelligible, distributed materials discovery infrastructure. One where human researchers set the objectives, models rapidly compound knowledge through code and data, and reciprocal feedback loops drive exploration.

This paper thus makes a timely case that to fully actualize the manifest benefits of AI in advancing materials science, we must elevate these powerful models to collaborators in a hybrid intelligence system built on transparency, trust, and shared creativity fueled by human curiosity.
2/n Main strategies covered:

1) Retrieval-augmented generation (RAG) methods to inject additional knowledge into the generative process to improve accuracy. RAG is highlighted as a powerful approach, especially when combined with graph-based methods.

2) Ontological knowledge graphs to provide interpretable structure that captures concepts and relationships. This facilitates mechanistic insights and more detailed responses from the LLM.

3) Nonlinear sampling techniques like tree-of-thought prompting to iteratively refine and improve responses, overcoming limitations of single-shot linear sampling.

4) Multi-agent models where specialized LLMs collaborate and interact autonomously to solve complex multimodal problems. This illustrates promise for advanced applications like automated force-field development.
3/n RAG methods

1) MechGPT is tested on a series of domain-specific questions related to materials failure. It shows reasonable performance and provides accurate answers to these complex questions from its trained knowledge.

2) Retrieval augmented generation (RAG) is then used, where relevant context from a corpus is provided along with the question to MechGPT. This does not significantly improve the answers in this case since MechGPT has already been well-trained on the mechanics failure knowledge required for the questions.

3) An edge case is shown where MechGPT fails to provide accurate info on "molybdenene", a recently published material not in its training data. It incorrectly states molybdenene is theoretical.

4) Using RAG with the molybdenene paper as the knowledge source leads to much improved responses. Multiple detailed Q&A pairs demonstrate MechGPT can now accurately describe key features like the square lattice structure and predicted brittle fracture behavior.

In summary, MechGPT shows reasonable domain-specific question answering performance. RAG improves responses for out-of-training-distribution cases but does not further enhance in-distribution performance. This highlights strengths but also limitations of relying solely on an LLM's parameter-based knowledge.
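The RAG step in the molybdenene experiment boils down to: retrieve the most relevant passage, then prepend it to the question. A minimal sketch, with a hand-rolled word-overlap scorer standing in for a real dense retriever and invented corpus sentences (the molybdenene facts below are for illustration only, not quoted from the paper):

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus; in the paper this is the molybdenene publication.
corpus = [
    "Molybdenene is a recently synthesized 2D material with a square lattice.",
    "Graphene is a 2D material with a hexagonal lattice.",
    "Steel fails by ductile fracture under tension.",
]

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, passage):
    # word-overlap scoring, length-normalized -- a stand-in for
    # embedding-based cosine similarity in a real RAG pipeline
    q, p = Counter(tokens(query)), Counter(tokens(passage))
    overlap = sum((q & p).values())
    return overlap / math.sqrt(sum(p.values()) or 1)

def rag_prompt(question):
    best = max(corpus, key=lambda p: score(question, p))
    return f"Context: {best}\n\nQuestion: {question}\nAnswer:"

print(rag_prompt("What is the lattice structure of molybdenene?"))
```

This is why RAG fixed the out-of-distribution failure: the model no longer answers from stale parameters — the retrieved passage carries the post-training fact into the prompt.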
Jan 23
1/n The most important civic duty that a nation can instill in its citizens is the importance of life-long learning. This goes beyond access to education for our children. It involves a culture that leans toward healthy collaboration and drives toward sustained innovation.
2/n It is no surprise that so many citizens feel left out in today's system. People have never learned the skills to learn independently. But AI radically remedies this deficit! GPT-like systems are tireless teachers who can adapt their conversations to a student's cognitive biases and limitations.
3/n All learning agents frame their understanding by projecting their observations into perspectives that their minds have previously adopted. We are agents with learned cognitive biases. Furthermore, these biases are encoded and reinforced in language.
Jan 19
1/n Let's talk about the Flow Engineering discussed in the AlphaCodium paper:

The paper introduces the concept of "flow engineering" to characterize their proposed approach of AlphaCodium, and contrasts it with typical "prompt engineering" methods. The use of the term "flow engineering" can be justified in the following ways:

1. Multi-stage iterative process: AlphaCodium involves a structured, test-driven flow with progressive stages - problem analysis, test generation, initial coding, and iterative run-fix cycles. This goes beyond crafting an optimal prompt.

2. Incorporating code execution: The flow deeply integrates execution of the generated code against input-output examples into the modeling process, rather than purely focusing on static prompt tuning. This dynamic run-fix iteration on increasing tests sets it apart.

3. Scaffolding code development: The multi-step methodology provides a scaffolding that mirrors the software development process by incrementally going from specifications to code, resembling test-driven cycles.

4. Code-centric techniques: Several techniques tailor-made for code tasks supplement the basic flow - modular code prompting, test anchors to prevent code divergence, and output validation using test suites.

5. Knowledge accumulation: Each stage in the AlphaCodium flow builds up artifacts, learnings and validated components which are accumulated to aid downstream steps - a departure from one-off prompt engineering.

In summary, the use of the term "flow engineering" underscores the process-centric, execution-backed, and code-aware nature of the methodology going beyond static prompt design. It better captures the iterative, test-driven, development-mimetic essence.
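The run-fix cycle at the heart of the flow can be sketched in a few lines. This is a toy illustration, not AlphaCodium's implementation: `propose_fix` stands in for an LLM repair call, and the buggy/fixed candidates are hard-coded:

```python
# Input-output tests act as the anchors the candidate must pass.
tests = [((2, 3), 5), ((0, 0), 0), ((-1, 4), 3)]

buggy = "def add(a, b):\n    return a - b"   # initial (wrong) candidate
fixed = "def add(a, b):\n    return a + b"

def run_tests(code):
    """Execute the candidate and return the failing (args, want, got) triples."""
    ns = {}
    exec(code, ns)
    return [(args, want, ns["add"](*args))
            for args, want in tests if ns["add"](*args) != want]

def propose_fix(code, failures):
    # Stand-in for the LLM: shown the code and its failures, it would
    # generate a repaired candidate. Here it just returns the known fix.
    return fixed

def run_fix_loop(code, max_iters=3):
    for _ in range(max_iters):
        failures = run_tests(code)
        if not failures:
            return code              # all test anchors pass -> done
        code = propose_fix(code, failures)
    raise RuntimeError("could not repair candidate within budget")

print(run_fix_loop(buggy))
```

The structural point survives the toy: execution feedback, not prompt wording, drives each iteration — which is what distinguishes flow engineering from prompt engineering.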
2/n This paper is fascinating in that it introduces a novel way of viewing reasoning processes that influence both long chains of inference and subsequent retraining.

The paper proposes several code-oriented design concepts and best practices:

1. YAML Structured Output:
- Ask the model to generate output in YAML format conforming to a given Pydantic class definition.
- Eliminates need for complex prompt engineering, allows complex structured answers.
- More suitable than JSON for code due to handling of quotes, special chars etc.

2. Semantic Reasoning via Bullet Points:
- When asking the model to reason about a problem, use bullet point format.
- Forces splitting into logical sections, improves understanding.

3. Modular Code Generation:
- Ask the model to divide code into small sub-functions with meaningful names.
- Results in better code quality, easier iterative fixing.

4. Soft Decisions with Double Validation:
- Avoid strict decisions by the model which lead to hallucinations.
- Double validate potentially erroneous outputs.

5. Postponing Decisions and Exploration:
- Gradually move from easier to harder tasks, avoiding irreversible decisions early on.
- Leave room for exploring multiple possible solutions.

6. Test Anchors:
- When iterating on potentially invalid AI-generated tests, the model may fix code incorrectly.
- Use already-passed tests as anchors to detect erroneous fixes.

It incorporates many of the best practices of agile software development in a machine learning optimization process!
3/n I'm pleasantly surprised to discover insightful principles like "soft decisions with double validation":

1. Motivation:
- Language models often struggle when required to make strict, non-trivial decisions regarding complex issues.
- This leads to hallucinations and erroneous answers.

2. Technique:
- Avoid asking direct yes/no questions about complicated problems.
- Instead, adopt a gradual flow from easier to harder tasks.

- For example, when generating additional tests for a problem:
- First generate tests, then validate them.
- Rather than asking "is this test correct?"

- Use double validation:
- Given a generated output, ask the model to regenerate it while correcting errors.
- Encourages critical reasoning rather than yes/no judgement.

3. Example:
- When generating additional input-output tests for a coding problem:
- Firstly generate tests covering aspects missed by public tests.
- Then show the generated tests back to the model.
- Ask it to regenerate the tests while fixing any errors.

4. Benefits:
- Avoids strict decisions prematurely.
- Allows open-ended exploration first, validates later.
- Double validation improves quality by self-correction.

So in summary, the key ideas are to avoid rigid decisions early on, gradually build knowledge, and validate potentially erroneous outputs by regenerating them while allowing corrections, instead of demanding yes/no judgements. This technique of "soft decisions" coupled with "double validation" improves models' reasoning abilities.
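The two-pass pattern can be sketched concretely. Both calls to `model` are stubs for an LLM; the flawed first draft and its correction are hard-coded to show the shape of the interaction, not real model behavior:

```python
def model(prompt):
    # Stub LLM. A real system would call the model both times; here the
    # responses are canned to illustrate the two-pass pattern.
    if "regenerate" in prompt:
        # second pass: shown its own draft, the model corrects the
        # wrong expected output (2 + 2 is 4, not 5)
        return "assert add(2, 2) == 4"
    return "assert add(2, 2) == 5"    # first pass: flawed generated test

def generate_with_double_validation(task):
    # Soft decision: never ask "is this test correct?" (yes/no).
    # Instead, regenerate the artifact while allowing corrections.
    draft = model(f"Generate a test for: {task}")
    return model(f"Here is a draft test:\n{draft}\n"
                 f"Please regenerate it, fixing any errors.")

print(generate_with_double_validation("an add(a, b) function"))
```

The design choice is subtle but important: regeneration forces the model to re-derive the artifact, engaging its reasoning, whereas a yes/no verdict invites a shallow (and often hallucinated) judgement.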
