elvis
Dec 10, 2019
Machine learning for single cell biology: insights and challenges by Dana Pe’er. #NeurIPS2019
The representation challenge
On visualizing and modeling the data
On clustering single cell data
The challenge of inferring temporal progression of cell phenotype
An effort to map different cell types
Many challenges remain in how to analyze cell data
Data harmonization is a critical challenge
Ways deep learning is used to understand cells
👏
Other challenges
Understanding response to therapy
Segmenting and analyzing cells is challenging

