1. @Google introduced Gemini 2.5 Flash and Pro as stable and production-ready, and launched Gemini 2.5 Flash-Lite in preview – the fastest and most cost-efficient.
Flash-Lite outperforms 2.0 Flash-Lite in coding, math, science, reasoning, and multimodal benchmarks. It features lower latency, supports 1 million-token context, multimodal input, and connects to tools like Google Search and code execution
It's a 72.7B-parameter open-source coding LLM fine-tuned from Qwen2.5-72B. Sets a new SOTA on SWE-bench Verified with 60.4% accuracy. Optimizes with large-scale RL to fix real GitHub Docker issues, rewarded only when full test suites pass.
Available on Hugging Face and GitHub
4. @AnthropicAI, @scale_AI, and @redwood_ai developed SHADE-Arena, a suite of 17 complex evaluations testing if LLMs can secretly complete sabotage tasks alongside benign ones.
Models needed to complete tasks and avoid AI detection. None had over 30% success; evasion topped at ~60%. Claude Sonnet 3.7 better concealed thoughts. Gemini 2.5 Pro surpassed humans but had many false positives.
▪️ Institutional Books 1.0 - a 242B token dataset
▪️ o3-pro from @OpenAI
▪️ FGN from @GoogleDeepMind
▪️ Magistral by @MistralAI
▪️ Resa: Transparent Reasoning Models via SAEs
▪️ Multiverse (Carnegie+NVIDIA)
▪️ Ming-Omni
▪️ Seedance 1.0 by ByteDance
▪️ Sentinel
🧵
1. Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability
Sourced from 1,075,899 scanned books across 250+ languages via the Google Books project, the dataset includes both raw and post-processed text and detailed metadata.
A high-reliability LLM for math, science, and coding. It beats o1-pro and o3 in expert tests for clarity, instruction-following, and accuracy. Includes includes tool access (web search, code execution, vision) but responds slower.
Replaces o1-pro for Pro/Team users (they also drop the price of o3 by 80%).
▪️ @HuggingFace helps to find the best model based on size
▪️ NVIDIA’s Jensen Huang and @ylecun disagree with Anthropic’s Dario Amodei predictions
▪️ @AIatMeta’s Superintelligence Gambit
▪️ @Google adds a voice to Search
▪️ Mattel and @OpenAI: brains to Barbie
▪️ Projects in ChatGPT
2. @Nvidia’s Jensen Huang: “I disagree with almost everything he says”
At VivaTech in Paris, he took aim at Anthropic’s Dario Amodei, scoffing at his dire predictions about AI replacing half of entry-level jobs.
Huang argues for open, responsible development – not “dark room” AI monopolies. @ylecun agrees 👇
.@JeffDean interview at @Sequoia’s AI Ascent is a must-watch. He provides a real look at where AI is headed, what’s actually happening in the field, sharing insights on:
• Specialized hardware
• Evolution of models
• Future of computing infrastructure
• AI's role in science and more
Here are the key takeaways:
1. Where is AI going these days?
Models are improving fast and solving more problems each year. Hardware, training algorithms, and RL techniques have brought us here — and multimodal is a big focus for what’s next.
2. What about agents?
Jeff Dean sees huge potential in both virtual and robotic agents. With more training and experience, we’ll soon see them doing ~20 useful real-world tasks — unlocking a cycle of usefulness, cost reduction, and further improvements