How to get a URL link in the X (Twitter) app
1. QeRL builds two RL algorithms for LLMs:
1. TRM is built on the idea of the Hierarchical Reasoning Model (HRM).
RoT works by:
1. Intern-S1: A scientific multimodal foundation model by Shanghai AI Lab (open-source)
1. Sotopia-RL: Reward Design for Social Intelligence

1. Workflow of SingLoRA:


1. OctoThinker
1. Aider @aider_chat
1. In CoE:
1. @Google introduced Gemini 2.5 Flash and Pro as stable, production-ready models and launched Gemini 2.5 Flash-Lite in preview, the fastest and most cost-efficient model in the 2.5 family.


1. Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability


1. Hugging Face insists, “Bigger isn’t better” https://twitter.com/186420551/status/1934672721066991908


1. Self-Challenging Language Model Agents by @AIatMeta, @UCBerkeley
1. Input:
1. Where is AI going these days?
1. HRPO uses reinforcement learning (RL) to train LLMs to reason internally without needing CoT training data.
Architecture:
1. Past milestones and directions
1. Agents as first-class business & M365 entities:

1. Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
1. Multi-head Latent Attention (MLA)