Takeaways/Observations/Advice from my #NeurIPS2018 experience (thread):
❄️(1): deep learning seems stagnant in terms of impactful new ideas
❄️(2): on the flip side, deep learning is providing tremendous opportunities for building powerful applications (as seen in the creativity and value of the work presented in workshops such as ML for Health and ML for Creativity)
❄️(3): the rise of deep learning applications is all thanks to the continued integration of software tools (open source) and hardware (GPUs and TPUs)
❄️(4): Conversational AI is important because it encompasses most subfields in NLP... also, embedding social capabilities into these types of AI systems is a challenging but very important task going forward
❄️(5): it's important to start thinking about how to transition from supervised learning to problems involving semi-supervised learning and beyond. Reinforcement learning seems to be the next frontier. BTW, Bayesian deep learning is a thing!?
❄️(6): we should not avoid questions about drawing inspiration for our AI algorithms from biological systems just because some people say it's a bad idea... there is still a whole lot to learn from neuroscience
❄️(7): when we use the word "algorithms" to refer to AI systems, the media tends to use it in negative ways... what if we used the term "models" instead? (rephrased from Hanna Wallach)
❄️(8): we can embrace the gains of deep learning and revise our traditional learning systems based on what we have learned from modern deep learning techniques (this was my favorite piece of advice)
❄️(9): the ease of applying machine learning to different problems has sparked leaderboard chasing... let's all be careful of those short-term rewards
❄️(10): there is a ton of noise in the field of AI... when you read about AI papers, systems and technologies just be aware of that
❄️(11): causal reasoning needs close attention... especially as we begin to rely heavily on AI systems to make important decisions in our lives
❄️(12): efforts in diversification seem to have amplified healthy interactions between younger and more prominent members of the AI community
❄️(13): we can expect to see more multimodal systems and environments being used and leveraged to help with learning in various settings (e.g., conversation, simulations, etc.)
❄️(14): let's get serious about reproducibility... this goes for all sub-disciplines in the field of AI
❄️(15): more effort needs to be invested in finding ways to properly evaluate different types of machine learning systems... this was a resonant theme at the conference, from the NLP people to the statisticians to the reinforcement learning people... it's a serious problem
I will formalize and expound on all of these observations, takeaways, and advice learned from my NeurIPS experience in a future post (will be posted directly at @dair_ai)... at the moment, I am still trying to put together the resources (links, slides, papers, etc.)
Here is what the layered architecture of the agent internet ecosystem looks like as it stands. It shows different layers, like the Agent Internet, the Protocol Layer, and the Application Layer.
Timeline
The report provides an overview of LLMs, agent frameworks, agent protocols, and popular applications from 2019 to the present. It's not exhaustive, but it gives a rough picture of the progress. It's still early days for agents, and stronger LLMs and protocols are key.
Microsoft just released Phi-4-Mini-Reasoning to explore small reasoning language models for math.
Let's find out how this all works:
Phi-4-Mini-Reasoning
The paper introduces Phi-4-Mini-Reasoning, a 3.8B parameter small language model (SLM) that achieves state-of-the-art mathematical reasoning performance, rivaling or outperforming models nearly TWICE its size.
Unlocking Reasoning
They use a systematic, multi-stage training pipeline to unlock strong reasoning capabilities in compact models, addressing the challenges posed by their limited capacity.
It uses large-scale distillation, preference learning, and RL with verifiable rewards.
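To make the "verifiable rewards" idea concrete, here is a minimal sketch (my own illustration, not the paper's code) of a reward function that checks a generated math answer against a known reference answer:

```python
import re

def verifiable_math_reward(completion: str, ground_truth: str) -> float:
    """Toy verifiable reward: 1.0 if the model's final \\boxed{...} answer
    matches the reference answer exactly, else 0.0.
    (Hypothetical illustration; the paper's actual verifier may differ.)"""
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    if not matches:
        return 0.0
    predicted = matches[-1].strip()
    return 1.0 if predicted == ground_truth.strip() else 0.0

# Example:
# verifiable_math_reward("... so the answer is \\boxed{42}", "42")  # -> 1.0
```

Because the reward is computed by a checker rather than a learned reward model, it gives the RL stage a clean, hard-to-game training signal.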
Then you see papers like this, and they give you a better understanding of the opportunities and challenges ahead.
Lots of great ideas in this paper. I've summarized a few below:
What is it?
UniversalRAG is a framework that overcomes the limitations of existing RAG systems confined to single modalities or corpora. It supports retrieval across modalities (text, image, video) and at multiple granularities (e.g., paragraph vs. document, clip vs. video).
Modality-aware routing
To counter modality bias in unified embedding spaces (where queries often retrieve same-modality results regardless of relevance), UniversalRAG introduces a router that dynamically selects the appropriate modality (e.g., image vs. text) for each query.
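Here is a minimal sketch of what modality-aware routing could look like. The corpus names and the keyword heuristic are my own placeholders; UniversalRAG itself uses LLM-based or trained routers rather than hand-written rules:

```python
from typing import Callable, Dict, List

# Hypothetical corpora keyed by modality/granularity (not the paper's API).
CORPORA: Dict[str, Callable[[str, int], List[str]]] = {
    "text_paragraph": lambda q, k: [f"paragraph hit for: {q}"][:k],
    "text_document":  lambda q, k: [f"document hit for: {q}"][:k],
    "image":          lambda q, k: [f"image hit for: {q}"][:k],
    "video_clip":     lambda q, k: [f"video clip hit for: {q}"][:k],
}

def route_query(query: str) -> str:
    """Toy router: choose a corpus from crude keyword cues in the query."""
    q = query.lower()
    if any(w in q for w in ("photo", "diagram", "image")):
        return "image"
    if any(w in q for w in ("video", "clip", "scene")):
        return "video_clip"
    if any(w in q for w in ("whole paper", "entire report")):
        return "text_document"
    return "text_paragraph"

def retrieve(query: str, k: int = 3) -> List[str]:
    """Dispatch retrieval to the corpus selected by the router."""
    return CORPORA[route_query(query)](query, k)
```

The point of the design is that each query only searches the corpus it actually needs, which avoids same-modality bias and keeps retrieval at the right granularity.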
Building Production-Ready AI Agents with Scalable Long-Term Memory
Memory is one of the most challenging bits of building production-ready agentic systems.
Lots of goodies in this paper.
Here is my breakdown:
What does it solve?
It proposes a memory-centric architecture for LLM agents to maintain coherence across long conversations and sessions, solving the fixed-context window limitation.
The solution:
Introduces two systems: Mem0, a dense, language-based memory system, and Mem0g, an enhanced version with graph-based memory to model complex relationships.
Both aim to extract, consolidate, and retrieve salient facts over time efficiently.
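To illustrate the extract → consolidate → retrieve loop, here is a toy sketch. The class, methods, and keyword-overlap retrieval are my own placeholders, not Mem0's actual API; the real systems use LLM-driven extraction and, for Mem0g, a graph store:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MemoryStore:
    """Toy long-term memory: extract facts, consolidate them, retrieve later."""
    facts: List[str] = field(default_factory=list)

    def extract(self, turn: str) -> List[str]:
        # Placeholder "extraction": treat each sentence in the turn as a fact.
        return [s.strip() for s in turn.split(".") if s.strip()]

    def consolidate(self, new_facts: List[str]) -> None:
        # Skip exact duplicates; a real system merges and updates conflicting facts.
        for fact in new_facts:
            if fact not in self.facts:
                self.facts.append(fact)

    def retrieve(self, query: str, k: int = 3) -> List[str]:
        # Rank stored facts by naive word overlap with the query.
        q_words = set(query.lower().split())
        scored = sorted(self.facts,
                        key=lambda f: len(q_words & set(f.lower().split())),
                        reverse=True)
        return scored[:k]

# memory = MemoryStore()
# memory.consolidate(memory.extract("My flight is on May 3. I prefer aisle seats."))
# memory.retrieve("What seat does the user prefer?")
```

Keeping only salient, consolidated facts (instead of replaying whole transcripts) is what lets the agent stay coherent across sessions without blowing up the context window.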
This one provides a comprehensive taxonomy of recent system-level innovations for efficient LLM inference serving.
Great overview for devs working on inference.
Here is what's included:
Instance-Level Methods
Techniques like model parallelism (pipeline, tensor, context, and expert parallelism), offloading (e.g., ZeRO-Offload, FlexGen, TwinPilots), and request scheduling (inter- and intra-request) are reviewed...
Novel schedulers like FastServe, Prophet, and INFERMAX optimize decoding with predicted request lengths. KV cache optimization covers paging, reuse (lossless and semantic-aware), and compression (e.g., 4-bit quantization, compact encodings).
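To show why predicted request lengths help, here is a toy shortest-predicted-job-first scheduler. It is only a sketch of the general idea; FastServe, Prophet, and INFERMAX use considerably more sophisticated policies:

```python
import heapq
from dataclasses import dataclass, field
from typing import List

@dataclass(order=True)
class Request:
    # Sort key: predicted number of decode tokens (shortest first).
    predicted_tokens: int
    prompt: str = field(compare=False)

def schedule_by_predicted_length(requests: List[Request]) -> List[Request]:
    """Serve requests expected to finish quickly before long ones,
    which reduces average latency under load (illustrative only)."""
    heap = list(requests)
    heapq.heapify(heap)
    order = []
    while heap:
        order.append(heapq.heappop(heap))
    return order

# reqs = [Request(120, "summarize this paper"), Request(8, "2+2=?")]
# [r.prompt for r in schedule_by_predicted_length(reqs)]  # short request first
```

The survey's scheduling section is essentially about how well such length predictions can be made and how to recover when they are wrong.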
265 pages of everything you need to know about building AI agents.
5 things that stood out to me about this report:
1. Human Brain and LLM Agents
A great section for understanding what differentiates LLM agents from human/brain cognition, and what inspiration we can draw from the way humans learn and operate.
2. Definitions
There is a nice, detailed, and formal definition for what makes up an AI agent. Most of the definitions out there are too abstract.