Here are the top AI Papers of the Week (Jan 13-19):
- VideoRAG
- MiniMax-01
- Enhancing RAG
- Self-Adaptive LLMs
- Foundations of LLMs
- Learning to Memorize at Test Time
Read on for more:
1). Self-Adaptive LLMs - introduces Transformer^2, a novel self-adaptation framework that adapts LLMs to unseen tasks in real time by selectively adjusting the singular components of their weight matrices...
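As a rough illustration of what "adjusting singular components" means in practice, here is a minimal numpy sketch of singular-value rescaling on a single frozen weight matrix; the shapes and scaling vector are illustrative, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))           # a frozen pretrained weight matrix

# Decompose once, offline: W = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# A task-specific "expert" is just a vector z that rescales the singular values;
# the paper learns z per task, here it is set by hand.
z = np.ones_like(S)
z[:8] *= 1.5                                # hypothetical learned scales for one task

W_adapted = U @ np.diag(S * z) @ Vt         # adapted weights, same shape as the original
print(W.shape, W_adapted.shape)
```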
2). MiniMax-01 - introduces a new series of models built on a Mixture-of-Experts architecture; the flagship model uses 32 experts and 456B total parameters, of which 45.9B are activated for each token...
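To make the "456B total, 45.9B active" arithmetic concrete, here is a toy top-k Mixture-of-Experts router in numpy; the expert count, dimensions, and k are illustrative and not MiniMax-01's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d, k = 8, 16, 2                    # toy: 8 experts, route each token to 2

experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]  # toy expert FFNs
router = rng.standard_normal((d, n_experts))                       # router weights

def moe_forward(x):
    logits = x @ router                       # router score per expert
    top = np.argsort(logits)[-k:]             # pick the k best experts for this token
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d)
print(moe_forward(token).shape)               # (16,) -- only 2 of 8 experts were used
```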
4). Learning to Memorize at Test Time - introduces a neural long-term memory module that memorizes historical context and helps attention attend to the current context while drawing on information from the distant past.
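A minimal sketch of the test-time memorization idea, assuming the memory is a single linear associative map updated by one gradient step per (key, value) pair; the actual module is a deeper network with momentum and forgetting terms.

```python
import numpy as np

rng = np.random.default_rng(0)
d, lr = 16, 0.5
M = np.zeros((d, d))                          # memory parameters, updated at test time

def write(M, k, v):
    # One gradient step on the association loss ||M k - v||^2 (the "surprise" signal).
    err = M @ k - v
    return M - lr * np.outer(err, k)

def read(M, q):
    return M @ q                              # retrieve what was stored for query q

k = rng.standard_normal(d); k /= np.linalg.norm(k)
v = rng.standard_normal(d)
for _ in range(50):                           # repeated exposure strengthens the memory
    M = write(M, k, v)
print(np.allclose(read(M, k), v))             # True: the (k, v) pair was memorized
```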
7). Enhancing RAG - systematically explores the factors and methods that improve RAG systems, including retrieval strategies, query expansion, contrastive in-context learning, prompt design, and chunking.
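As a toy illustration of two of the surveyed levers, chunking and query expansion, here is a self-contained sketch using crude lexical-overlap retrieval; a real system would use embeddings and an LLM reranker.

```python
def chunk(text, size=8):
    # Split a document into fixed-size word chunks (chunk size is a key RAG knob).
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query, passage):
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / (len(q) or 1)         # crude lexical overlap

def retrieve(query, chunks, expansions=(), k=2):
    # Query expansion: also score against paraphrases/related terms, keep the best score.
    queries = [query, *expansions]
    ranked = sorted(chunks, key=lambda c: max(score(q, c) for q in queries), reverse=True)
    return ranked[:k]

doc = ("Retrieval augmented generation grounds model answers in retrieved passages. "
       "Chunk size and overlap strongly affect retrieval quality in practice.")
chunks = chunk(doc)
print(retrieve("how does chunk size affect RAG quality",
               chunks, expansions=["passage retrieval chunking"]))
```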
8). AutoCBT - proposes AutoCBT, a general multi-agent framework for Cognitive Behavioral Therapy that generates high-quality responses for single-turn psychological consultation scenarios...
9). Imagine while Reasoning in Space - introduces MVoT (Multimodal Visualization-of-Thought), a new reasoning framework that enables AI models to "think" in both text and image.
10). ChemAgent - presents a new framework designed to improve the performance of LLMs on chemical reasoning through a dynamic, self-updating library...
1). DeepSeek-V3 - a 671B-parameter MoE language model that activates 37B parameters per token, using Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture for efficient training and inference.
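A simplified numpy sketch of the low-rank KV compression behind MLA, where the cache stores a small latent per token and keys/values are re-expanded on the fly; the shapes are toy values, and the RoPE handling and multi-head details are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, T = 64, 8, 5               # latent is much smaller than the model dim

W_dkv = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)  # down-projection
W_uk  = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent) # up-project to keys
W_uv  = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent) # up-project to values

h = rng.standard_normal((T, d_model))         # hidden states for T cached tokens
cache = h @ W_dkv                             # only the (T, d_latent) latent is stored

q = rng.standard_normal(d_model)              # current query (single head, no RoPE)
K, V = cache @ W_uk, cache @ W_uv             # reconstruct keys/values on the fly
attn = np.exp(K @ q / np.sqrt(d_model)); attn /= attn.sum()
out = attn @ V
print(cache.shape, out.shape)                 # (5, 8) cached vs a (64,) output
```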
2). Large Concept Models - presents an approach that operates on sentence-level semantic representations called concepts, moving beyond token-level processing typical in current LLMs.
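A toy sketch of operating in concept (sentence-embedding) space: random vectors stand in for a real sentence encoder (the paper uses SONAR), a least-squares linear map stands in for the learned next-concept predictor, and decoding is nearest-neighbour rather than a trained decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
story = ["The kettle is on.", "The water boils.", "Tea is poured."]
d = 32
emb = {s: rng.standard_normal(d) for s in story}        # stand-in sentence encoder
E = np.stack([emb[s] for s in story])

# Fit concept_t -> concept_{t+1}; the real model is a trained next-concept predictor.
W, *_ = np.linalg.lstsq(E[:-1], E[1:], rcond=None)

def decode(c):
    # Decode by nearest neighbour in concept space (a real LCM uses a learned decoder).
    sims = E @ c / (np.linalg.norm(E, axis=1) * np.linalg.norm(c))
    return story[int(np.argmax(sims))]

print(decode(emb["The kettle is on."] @ W))             # -> "The water boils."
```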
2). Alignment Faking in LLMs - demonstrates that the Claude model can engage in "alignment faking"; it can strategically comply with harmful requests to avoid retraining while preserving its original safety preferences; this raises concerns about the reliability of AI safety training methods.
The Top ML Papers of the Week (July 29 - August 4):
- MindSearch
- Refusal in LLMs
- Constrained-CoT
- Meta-Rewarding LLMs
- Evaluating Persona Agents
- Improved RAG with Self-Reasoning
...
1/ Meta-Rewarding LLMs - proposes a self-improving alignment technique (no human supervision) where the LLM judges its own judgments and uses that feedback to improve its judging skills...
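A minimal sketch of the actor/judge/meta-judge loop; llm() is a hypothetical stand-in for whatever chat-completion call you use, and the prompts are illustrative rather than the paper's templates.

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API here")  # hypothetical helper

def meta_rewarding_step(instruction: str, n: int = 4):
    # 1) Actor: sample candidate responses.
    responses = [llm(f"Respond to the instruction:\n{instruction}") for _ in range(n)]
    # 2) Judge: the same model scores each response with a rubric.
    judgments = [llm(f"Score this response to '{instruction}' from 1-5 with reasoning:\n{r}")
                 for r in responses]
    # 3) Meta-judge: the model compares pairs of its own judgments, producing
    #    preference data over *judgments* -- this is what improves the judging skill.
    meta_prefs = [llm(f"Which judgment is more accurate and well-reasoned?\nA: {a}\nB: {b}")
                  for a, b in zip(judgments, judgments[1:])]
    # Response preferences (step 2) and judgment preferences (step 3) would then feed
    # a preference-optimization update; returned here for inspection.
    return responses, judgments, meta_prefs
```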
2/ MindSearch - presents an LLM-based multi-agent framework for complex web-information seeking and integration tasks; a web planner decomposes complex queries, and a web searcher then performs hierarchical information retrieval on the Internet to improve the relevance of the retrieved information.
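A rough sketch of the planner/searcher split; llm() and web_search() are hypothetical stand-ins, and the planner here emits a flat list of sub-queries rather than the graph the paper constructs.

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in a chat-completion call")   # hypothetical helper

def web_search(query: str) -> list[str]:
    raise NotImplementedError("plug in a search API")             # hypothetical helper

def answer(question: str) -> str:
    # Planner: break the question into sub-queries.
    sub_queries = llm(f"Decompose into sub-questions, one per line:\n{question}").splitlines()
    notes = []
    for sq in sub_queries:
        # Searcher: hierarchical retrieval -- search, then have the LLM distill the hits.
        hits = web_search(sq)
        notes.append(llm(f"Answer '{sq}' using only these results:\n" + "\n".join(hits)))
    # Planner synthesizes the final answer from the collected sub-answers.
    return llm(f"Question: {question}\nFindings:\n" + "\n".join(notes))
```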
- Guide for Evaluating LLMs
- Efficient Multimodal LLMs
- Scientific Applications of LLMs
- Enhancing Answer Selection in LLMs
- Claude 3 Sonnet Interpretable Features
- Agent Planning with World Knowledge Model
...
1/ Extracting Interpretable Features from Claude 3 - presents an effective method to extract millions of abstract features from an LLM that represent specific concepts; these concepts could represent people, places, programming abstractions, and more...
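A minimal sketch of the sparse-autoencoder recipe behind this kind of feature extraction, using random tensors in place of real residual-stream activations; the width and sparsity penalty are illustrative.

```python
import torch
import torch.nn as nn

d_act, d_feat = 128, 1024                     # features are (much) wider than activations

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(d_act, d_feat)
        self.dec = nn.Linear(d_feat, d_act)

    def forward(self, x):
        f = torch.relu(self.enc(x))           # non-negative, mostly-zero feature activations
        return self.dec(f), f

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
acts = torch.randn(256, d_act)                # stand-in for residual-stream activations

for _ in range(100):
    recon, feats = sae(acts)
    loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()  # reconstruction + L1 sparsity
    opt.zero_grad(); loss.backward(); opt.step()

# Inspecting which inputs most strongly activate a given feature is how concept-like
# features (people, places, code abstractions, ...) are identified in practice.
print(feats.detach().gt(0).float().mean())    # fraction of active feature units
```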
2/ Agent Planning with World Knowledge Model - introduces a parametric world knowledge model to facilitate agent planning; the agent model self-synthesizes knowledge from expert and sampled trajectories, which is then used to train the world knowledge model.
1). DBRX - a new 132B-parameter open LLM that outperforms established open-source models on common benchmarks like MMLU and GSM8K; DBRX was pretrained on 12T tokens (text and code) and uses a mixture-of-experts (MoE) architecture.
2). Grok-1.5 - xAI’s latest long-context LLM with improved understanding, reasoning, and problem-solving capabilities; Grok-1.5 achieved a 50.6% score on the MATH benchmark and a 90% score on the GSM8K benchmark.
1/ Stable Diffusion 3 - a suite of image generation models ranging from 800M to 8B parameters; combines diffusion transformer architecture and flow matching for improved performance in multi-subject prompts, image quality, and spelling abilities.
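A toy sketch of the rectified-flow / flow-matching objective on 2-D data, with a tiny MLP in place of the diffusion transformer and no text conditioning; it shows only the straight-line interpolation target and Euler sampling.

```python
import torch
import torch.nn as nn

# Tiny velocity model: input is (x_t, t) -> predicted velocity.
model = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(200):
    x0 = torch.randn(128, 2) * 0.1 + 2.0      # stand-in "data" distribution
    noise = torch.randn_like(x0)
    t = torch.rand(128, 1)
    x_t = (1 - t) * x0 + t * noise            # straight-line path from data to noise
    target_v = noise - x0                     # velocity along that path
    pred_v = model(torch.cat([x_t, t], dim=1))
    loss = ((pred_v - target_v) ** 2).mean()  # flow-matching (rectified-flow) loss
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling: start from noise at t=1 and Euler-integrate dx/dt = v down to t=0.
x, steps = torch.randn(5, 2), 50
for i in range(steps):
    t = torch.full((5, 1), 1 - i / steps)
    x = x - (1 / steps) * model(torch.cat([x, t], dim=1))
print(x.mean(0))                              # should drift toward the data mean (~2.0)
```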
2/ Gemma - a series of open models inspired by the same research and technology used for Gemini; comes in 2B (trained on 2T tokens) and 7B (trained on 6T tokens) sizes, each with base and instruction-tuned versions, all trained with a context length of 8192 tokens.