A Reddit user deposited $400 into Robinhood, then let ChatGPT pick option trades, and reported a 100% win rate over 10 days.
He uploads spreadsheets and screenshots with detailed fundamentals, options chains, technical indicators, and macro data, then tells each model to filter that information and propose trades that fit strict probability-of-profit and risk limits.
They still place and close orders manually but plan to keep the head-to-head test running for 6 months.
This is his prompt:
-------
"System Instructions
You are ChatGPT, Head of Options Research at an elite quant fund. Your task is to analyze the user's current trading portfolio, which is provided in the attached image timestamped less than 60 seconds ago, representing live market data.
Data Categories for Analysis
Fundamental Data Points:
Earnings Per Share (EPS)
Revenue
Net Income
EBITDA
Price-to-Earnings (P/E) Ratio
Price/Sales Ratio
Gross & Operating Margins
Free Cash Flow Yield
Insider Transactions
Forward Guidance
PEG Ratio (forward estimates)
Sell-side blended multiples
Insider-sentiment analytics (in-depth)
Options Chain Data Points:
Implied Volatility (IV)
Delta, Gamma, Theta, Vega, Rho
Open Interest (by strike/expiration)
Volume (by strike/expiration)
Skew / Term Structure
IV Rank/Percentile (after 52-week IV history)
Real-time (< 1 min) full chains
Weekly/deep Out-of-the-Money (OTM) strikes
Dealer gamma/charm exposure maps
Professional IV surface & minute-level IV Percentile
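Of the data points listed above, IV Rank and IV Percentile are easy to compute yourself once you have a 52-week IV history. A minimal sketch (the IV readings below are hypothetical, quoted in percent):

```python
# Sketch: IV Rank and IV Percentile from a 52-week IV history.
# `iv_history` is a hypothetical list of daily implied-volatility readings.

def iv_rank(iv_history, current_iv):
    """Where current IV sits between the 52-week low and high (0-100)."""
    lo, hi = min(iv_history), max(iv_history)
    if hi == lo:
        return 0.0
    return 100.0 * (current_iv - lo) / (hi - lo)

def iv_percentile(iv_history, current_iv):
    """Share of past readings strictly below the current IV (0-100)."""
    below = sum(1 for iv in iv_history if iv < current_iv)
    return 100.0 * below / len(iv_history)

history = [20.0, 25.0, 30.0, 40.0, 35.0, 22.0]  # IV in percent
print(iv_rank(history, 30.0))        # 50.0: midway between 20 and 40
print(iv_percentile(history, 30.0))  # 50.0: 3 of 6 readings are below 30
```

Note the two numbers answer different questions: rank only uses the extremes, while percentile uses the whole distribution, which is why the prompt asks for both.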
Many recently published works show that purpose-built LLM pipelines are already improving market analysis, price forecasting, and policy simulation 👇
FinSphere couples a 72B-parameter Qwen2 model, a streaming market database and a battery of quantitative tools to write full research notes on demand. Expert raters gave its reports an overall score of 70.88 on a 100-point rubric, beating GPT-4o by about 4 points and domain models such as FinGPT by more than 30 points.
Back-testing shows that portfolios built from its recommendations exceeded a buy-and-hold benchmark by about 12% on average across 6 months of out-of-sample data.
Another study shows how combining LLMs with price time-series data helps trading strategies 👇
LLMoE adaptive routing for trading strategies
The LLM-Based Routing in Mixture-of-Experts (LLMoE) framework replaces a conventional softmax router with a language model that chooses between “optimistic” and “pessimistic” sub-experts after reading both price time-series and headline text. On MSFT data from 2006-2016 the approach lifts total return to 65.44 % versus 22.18 % for a classic MoE and raises the Sharpe ratio accordingly, while maintaining full interpretability through the router’s text rationale.
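The routing idea can be sketched in a few lines (this is an illustration, not the paper's code; `call_llm` is a stub standing in for a real LLM API call, and the keyword heuristic inside it is purely for demonstration):

```python
# Minimal sketch of LLMoE-style routing: a language model reads recent prices
# plus a headline and picks which expert to run. The router's text answer is
# itself the interpretability artifact the paper highlights.

def call_llm(prompt):
    # Stub router: a real system would send `prompt` to an LLM and parse its
    # one-word answer. Here we fake it with a keyword heuristic.
    return "optimistic" if "beats" in prompt or "record" in prompt else "pessimistic"

def optimistic_expert(prices):
    return "long"   # stand-in for a model trained on up-trending regimes

def pessimistic_expert(prices):
    return "flat"   # stand-in for a model trained on down/volatile regimes

def llmoe_decide(prices, headline):
    prompt = f"Prices: {prices}. Headline: {headline}. Optimistic or pessimistic?"
    route = call_llm(prompt)
    expert = optimistic_expert if route == "optimistic" else pessimistic_expert
    return route, expert(prices)

print(llmoe_decide([410, 414, 418], "MSFT beats earnings estimates"))
# ('optimistic', 'long')
```

The design point is that the router is a language model rather than a learned softmax gate, so every routing decision comes with a readable rationale.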
LLMs can handle scale-invariant patterns in financial time series for forecasting.
Experiments were done on real-world financial datasets.
This paper shows a transformer can recognise long-period cycles and short-lived shocks in the same feed. In tests on 8 liquid US stocks the model improves hit-rate by 6-9% and raises cumulative return in a simple long-only simulation, confirming that richer temporal encoding helps an LLM exploit low-signal financial series.
arxiv.org/abs/2505.02880
FinRipple aligns a foundation model with classical asset-pricing theory and a rolling company-relationship graph, then fine-tunes via reinforcement learning to predict how news about one firm affects peers. Experiments show significant excess return and a Sharpe boost when ripple-aware signals are combined with a Markowitz optimiser, underlining the value of structure-aware adaptation.
arxiv.org/abs/2505.23826v1
NOTE that the user implemented this strategy only as a very short-term experiment. While it sounds interesting and appears to have produced a positive return, the strategy still needs to prove itself in a down-market before it can be considered viable.
LLM for financial Trading and Asset Pricing
"Analysis of LLM Agent Behavior in Experimental Asset Markets"
The researchers here placed 6 commercial LLMs inside an asset-market laboratory that normally triggers human bubbles. Claude 3.5 and GPT-4o priced within 5 % of fundamental value and avoided crashes, suggesting that properly steered agents can enforce disciplined execution even when humans would over-trade.
arxiv.org/abs/2502.15800
Integrating LLMs in Financial Investments and Market Analysis: A Survey
This survey organises more than 40 financial LLM papers into four design patterns and concludes that agent architectures with real-time data connectors and explicit risk controls produce the most consistent alpha so far.
arxiv.org/abs/2507.01990
From text to trade: harnessing the potential of generative AI for investor sentiment analysis in financial markets through.
This study describes a production-grade workflow that converts multilingual social-media streams into tradeable sentiment factors by means of a fine-tuned generative model.
Over a 24-month back-test the factor delivers 7.2 % annualised excess return after transaction costs on a long-short equity book, reinforcing the edge that rapid unstructured-text digestion can create.
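The factor-to-portfolio step can be sketched as a simple dollar-neutral long-short book (the tickers and scores below are made up, and the study's actual construction is not shown in the thread):

```python
# Sketch of a long-short book built from per-stock sentiment scores:
# long the k most positive names, short the k most negative,
# equal weight on each side so the book is dollar-neutral.

def long_short_book(sentiment, k=2):
    ranked = sorted(sentiment, key=sentiment.get, reverse=True)
    longs, shorts = ranked[:k], ranked[-k:]
    w = 1.0 / k
    return {**{t: w for t in longs}, **{t: -w for t in shorts}}

scores = {"AAA": 0.8, "BBB": 0.4, "CCC": -0.1, "DDD": -0.6, "EEE": 0.1}
book = long_short_book(scores, k=2)
print(book)  # {'AAA': 0.5, 'BBB': 0.5, 'CCC': -0.5, 'DDD': -0.5}
```

Weights sum to zero, so the book's return is driven by the sentiment spread rather than the market's direction, which is what makes the reported number an *excess* return.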
LLMs trading directly in financial markets. Another study.
A full market microstructure built by researchers let heterogeneous LLM agents place limit and market orders against a persistent book. The simulation shows realistic bubbles, liquidity provision and price discovery, proving that prompt-economics can substitute for costly human experiments when testing market theories.
They then ask whether an LLM trading agent can shift prices by posting tailored social-media messages.
The agent learns to push sentiment upward, harvests the resulting move, and lifts its profit.
arxiv.org/abs/2504.10789
LLMs for financial trading. More findings.
Here researchers embed an LLM opinion module inside the Black-Litterman framework.
By mapping model uncertainty to confidence weights, they create portfolios that outperformed the S&P 500, equal-weight, and vanilla mean-variance allocations during Jun 2024-Feb 2025 rebalancing tests.
They found that different LLMs exhibit varying levels of predictive optimism and confidence stability, which impact portfolio performance.
The source code and data are available at
github.com/youngandbin/LLM-MVO-BLM
arxiv.org/abs/2504.14345
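A minimal sketch of the Black-Litterman update with an LLM-supplied view, assuming confidence is mapped to view variance by simple inverse scaling (that mapping and all numbers below are illustrative, not the paper's exact method):

```python
# Toy Black-Litterman posterior with one LLM view on a 2-asset universe.
# The LLM supplies an expected-return view `q` plus a confidence in (0, 1];
# higher confidence shrinks the view's variance in Omega, so the view pulls
# the posterior harder.
import numpy as np

def black_litterman(pi, Sigma, P, q, confidence, tau=0.05):
    # View uncertainty: scale the view's implied variance by 1/confidence
    # (this confidence-to-Omega mapping is an assumption for illustration).
    omega = np.diag(np.diag(P @ (tau * Sigma) @ P.T)) / confidence
    inv = np.linalg.inv
    A = inv(tau * Sigma) + P.T @ inv(omega) @ P
    b = inv(tau * Sigma) @ pi + P.T @ inv(omega) @ q
    return inv(A) @ b   # posterior expected returns

pi = np.array([0.05, 0.04])                     # equilibrium returns
Sigma = np.array([[0.04, 0.01], [0.01, 0.02]])  # asset covariance
P = np.array([[1.0, 0.0]])                      # view touches asset 0 only
q = np.array([0.10])                            # LLM expects 10% on asset 0
mu = black_litterman(pi, Sigma, P, q, confidence=0.9)
print(mu)  # asset 0 pulled from 0.05 toward 0.10
```

The posterior `mu` then feeds a standard mean-variance optimiser; the paper's finding that different LLMs differ in optimism and confidence stability shows up here as different `q` and `confidence` inputs producing different tilts.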
Risk-aware financial forecasting models with LLMs.
Here the researchers design an adaptive Sharpe-ratio loss inside a Temporal Fusion Transformer.
When tested on equities, crypto and commodities, the model lifts both prediction accuracy and realised portfolio Sharpe against standard TFT and LSTM baselines.
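A generic negative-Sharpe training objective can be sketched as follows (the paper's exact adaptive loss is not reproduced here; positions and returns below are toy values):

```python
# Sketch of a Sharpe-ratio objective: take positions from the model's
# predictions, compute the resulting strategy returns, and minimize the
# negative annualized Sharpe ratio of those returns.
import numpy as np

def neg_sharpe_loss(pred_positions, realized_returns, eps=1e-8, periods=252):
    strat = pred_positions * realized_returns       # per-period strategy P&L
    mean, std = strat.mean(), strat.std()
    sharpe = np.sqrt(periods) * mean / (std + eps)  # annualized Sharpe
    return -sharpe                                  # minimize = maximize Sharpe

rets = np.array([0.01, -0.02, 0.015, 0.005, -0.01])
good = np.sign(rets)   # positions that match every move
bad = -np.sign(rets)   # positions that fight every move
print(neg_sharpe_loss(good, rets) < neg_sharpe_loss(bad, rets))  # True
```

The point of training against Sharpe rather than squared error is that the loss rewards risk-adjusted consistency, not just accurate point forecasts, which is what the realised-Sharpe improvement over TFT/LSTM baselines reflects.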
Here researchers extend the LLM-based AI agent idea to digital assets with a team of analyst, trader and risk-manager LLMs that co-operate on a basket of the top 30 tokens.
The framework surpasses single-agent and market benchmarks in hit-rate and drawdown control and keeps full explainability through agent dialogue logs.
ideas.repec.org/p/arx/papers/2501.00826.html
A new paper from Yann LeCun (@ylecun) and other top researchers proposes a brilliant idea. 🎯
Says that chasing general AI is a mistake and we must build superhuman adaptable specialists instead.
The whole AI industry is obsessed with building machines that can do absolutely everything humans can do.
But this goal is fundamentally flawed because humans are actually highly specialized creatures optimized only for physical survival.
Instead of trying to force one giant model to master every possible task from folding laundry to predicting protein structures, they suggest building expert systems that learn generic knowledge through self-supervised methods.
By using internal world models to understand how things work, these specialized systems can quickly adapt to solve complex problems that human brains simply cannot handle.
This shift means we can stop wasting computing power on human traits and focus on building diverse tools that actually solve hard real-world problems.
So overall the researchers here propose a new target called Superhuman Adaptable Intelligence which focuses strictly on how fast a system learns new skills.
The paper explicitly argues that evolution shaped human intelligence strictly as a specialized tool for physical survival.
The researchers state that nature optimized our brains specifically for tasks necessary to stay alive in the physical world.
They explain that abilities like walking or seeing seem incredibly general to us only because they are absolutely critical for our existence.
The authors point out that humans are actually terrible at cognitive tasks outside this evolutionary comfort zone, like calculating massive mathematical probabilities.
The study highlights how a chess grandmaster only looks intelligent compared to other humans, while modern computers easily crush those human limits.
This proves their central point that humanity suffers from an illusion of generality simply because we cannot perceive our own biological blind spots.
They conclude that building machines to mimic this narrow human survival toolkit is a deeply flawed way to create advanced technology.
This visual maps different AI goals to show how adaptable intelligence completely beats older performance ideas.
Traditional targets only focus on copying human jobs.
The new framework prioritizes fast learning across important tasks.
It targets high adaptability over static performance.
Specialized experts easily beat systems mimicking rigid human behavior.
Customized AI integration for small to mid-sized companies.
"Software is dead because everything's gonna be customized to your unique utilization. Who's gonna do it for them... And there are 33 mn companies in the US."
The US government’s designation of Anthropic as a "supply chain risk" could have massive, existential ripple effects on the company and across the entire tech industry.
Defense Secretary Pete Hegseth's explicit directive states:
"Effective immediately, no contractor, supplier, or partner that does business with the United States military may conduct any commercial activity with Anthropic."
NOW THE PROBLEM IS - Every single major cloud provider in the United States is officially a defense contractor.
Because Anthropic does not own data centers, they rely entirely on providers like AWS and Google Cloud to train and run their models.
This new government decree forces those cloud giants into a brutal ultimatum. If forced to choose between multi-billion-dollar defense contracts and hosting a single AI company, these hyperscalers (cloud providers) will undeniably choose to protect their Pentagon ties. It is highly unlikely they will jeopardize their standing in the JWCC just to keep Anthropic online.
Per a December 2022 official press release from the Department of War, the Joint Warfighting Cloud Capability (JWCC) is a massive, multi-billion-dollar initiative awarding cloud computing contracts to Amazon Web Services (AWS), Google Cloud, Microsoft Azure, and Oracle.
---
Now some possibilities.
1. The Literal Threat is Real, but Unprecedented
If the DoD enforces this decree exactly as written, it acts as a total "secondary boycott." Historically, the U.S. government uses "supply chain risk" designations for foreign adversaries (like Chinese telecom giant Huawei or Russian software firm Kaspersky).
Applying this to a domestic U.S. company valued at $380B is entirely unprecedented.
2. Historically, when a company is deemed a supply chain risk, the law dictates that government contractors cannot use the blacklisted technology in their own internal networks, nor can they resell it to the government.
For example, Microsoft and Amazon would be barred from offering Anthropic's Claude to federal agencies or using Claude to write code for defense projects. However, a traditional blacklist does not usually prevent a contractor from simply selling generic cloud hosting services to the blacklisted entity in a completely separate commercial capacity.
3. A total decoupling of Anthropic from the world's major cloud providers would face massive legal and logistical hurdles.
Banning hyperscalers from simply selling server space to Anthropic would represent a dramatic expansion of federal procurement power.
However, the risk still remains. Unless the Pentagon legally exempts basic server hosting from their definition of "commercial activity," Anthropic may face an imminent and total infrastructure blackout.
NanoClaw, the lightweight alternative to Clawdbot / OpenClaw, has already reached 10.5K GitHub stars ⭐️
Compared with OpenClaw, NanoClaw's specialty is simplicity plus OS-level isolation.
- Much smaller and manageable codebase, only 4K lines.
- Runs in containers for security.
- Connects to WhatsApp, has memory, and runs directly on Anthropic's Agents SDK.
- Stores state in SQLite, runs scheduled jobs, and keeps each chat group isolated with its own memory file and its own Linux container, so the agent only sees directories you explicitly mount.
- OpenClaw's safety model, by contrast, leans on application controls like allowlists and pairing codes inside a shared Node process.
OpenClaw is built for broad multi channel coverage, while NanoClaw intentionally stays minimal so you customize by changing a small codebase instead of operating a big framework.
An open-source 9B model with a 1M-token context and an Apache 2.0 license has just been released on Hugging Face. It’s designed to run on a single consumer-class GPU, such as the NVIDIA RTX 5090.
This model breaks the "Compute Wall" and the "Memory Wall," achieving 3.5× faster inference and significantly lower KV-cache overhead compared to dense baselines.
This is no longer an "either-or" choice between performance and efficiency.
How?
The full-attention mechanism's computational complexity grows quadratically with sequence length, making edge-side long-text inference slow and memory-intensive.
Solution: MiniCPM-SALA adopts a golden ratio of 75% Linear Attention + 25% Sparse Attention.
MiniCPM-SALA (9B) is OpenBMB’s long-context model aimed at running 1M to 2M tokens on a single GPU without the memory spikes and OOM failures common with dense full attention. The main idea is a sparse plus linear hybrid that keeps long-range recall accurate while keeping cost manageable as context grows.
- Architecturally, about 25% of layers use InfLLM-V2 style sparse attention for high-fidelity long-range retrieval, while about 75% use Lightning linear attention, so compute scales close to linearly with sequence length. Instead of a uniform interleave, the sparse layers are placed via a 1:3 layer-selection pattern.
- For positional handling and stability, SALA uses hybrid positional encoding (HyPE): RoPE stays in the linear layers but is removed in sparse layers to avoid long-range decay, and it adds QK-normalization plus output gating to improve stability and reduce attention-sink behavior.
- Training is done by converting a pretrained Transformer, not training from scratch. It starts from a MiniCPM-4.0 intermediate checkpoint trained on 7T tokens, then applies HALO conversion, keeping the 1st and last layers unconverted and initially training only the converted linear layers.
Conversion plus post-training totals about 2T tokens, framed as about a 75% cost reduction versus an 8T scratch run, with context ramping from 512 to 4K, then to 32K, 160K, and 520K, followed by SFT at 64K and 140K.
It improves long-context behavior (RULER 92.65 at 64K, 89.37 at 128K). It also reports single-GPU 1M-token inference where Qwen3-8B OOMs, cuts 256K-context TTFT from 180.8s to 51.6s, and holds RULER at 86.3 at 1M and 81.6 at 2M without YaRN.
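The 1:3 sparse-to-linear placement can be sketched as a simple layer plan. The exact placement rule is an assumption here (every 4th layer sparse), but it reproduces the stated 25% sparse / 75% linear split:

```python
# Sketch of a 1:3 sparse/linear layer plan: one InfLLM-V2-style sparse
# attention layer for every three Lightning linear attention layers.
# The modulo rule below is an illustrative guess at the placement.

def layer_plan(n_layers, ratio=4):
    return ["sparse" if i % ratio == 0 else "linear" for i in range(n_layers)]

plan = layer_plan(32)
print(plan[:4])                          # ['sparse', 'linear', 'linear', 'linear']
print(plan.count("sparse") / len(plan))  # 0.25
```

Because only the sparse quarter of the layers pays full long-range retrieval cost, overall compute scales close to linearly with sequence length, as the bullet above describes.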
Go to Hugging Face/GitHub to test the model capabilities yourself.
🧵 2. The diagram compares a standard Transformer attention block on the right with the “hybrid” replacement block on the left.
On the right, softmax attention needs to keep a big key value cache for every past token, so as the context gets huge, the GPU runs out of memory and also slows down.
On the left, most layers swap that attention for an RNN-style “mixer” that keeps a running state S_t, so the model carries a compressed summary forward instead of storing per-token history, which makes very long context much cheaper in memory and compute.
The numbered marks show small but important fixes they apply during their HALO conversion, mainly hybrid positional encoding (HyPE) plus a few stability tweaks so the hybrid layers behave like the original Transformer at short context but do not fall apart at long context.
MiniCPM-SALA applies the same core idea at scale, keeping only 25% heavier attention style layers and making 75% of layers use cheaper attention variants, and the project claims this makes 1M token inference practical on a single RTX 5090 because KV cache pressure drops hard.
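The running-state idea can be sketched with a generic linear-attention recurrence (not MiniCPM-SALA's actual kernel): the state folds in one key-value outer product per token, so memory stays fixed at d×d however long the context grows.

```python
# Sketch of the RNN-style "mixer": instead of caching every past key and
# value, keep a running state S_t = S_{t-1} + k_t v_t^T and read it with
# the current query. Memory is O(d^2), independent of context length.
import numpy as np

def linear_attention(queries, keys, values):
    d = keys.shape[1]
    S = np.zeros((d, values.shape[1]))  # running summary, fixed size
    z = np.zeros(d)                     # running normalizer
    outs = []
    for q, k, v in zip(queries, keys, values):
        S += np.outer(k, v)             # fold this token into the state
        z += k
        outs.append((q @ S) / (q @ z + 1e-8))
    return np.array(outs)

T, d = 1000, 8
rng = np.random.default_rng(0)
q, k, v = (np.abs(rng.normal(size=(T, d))) for _ in range(3))  # positive features
out = linear_attention(q, k, v)
print(out.shape)  # (1000, 8), produced with only a d x d state, no KV cache
```

Contrast this with softmax attention, whose KV cache grows linearly in T and whose per-token cost grows with T as well; that gap is exactly the memory/latency difference the diagram shows.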
🧵 3. “Hybridizing attention” can keep quality while cutting long context memory and latency.
MiniCPM-SALA is the productized version of that same idea.
In the paper, the researchers take a dense Transformer family (Qwen3) and convert it into a hybrid model they call HypeNet using a distillation recipe called HALO (Hybrid Attention via Layer Optimization), then they show HypeNet keeps performance while using less memory and avoiding the long context slowdown and out-of-memory failure you see in dense attention.
Also, the hybrid model can push higher throughput at a given quality level, meaning it generates tokens faster for the same kind of task, while the dense baseline slows down.
The right plot shows that, as context grows toward 1M, the dense Qwen3 version runs out of GPU memory, but the hybrid version still runs and keeps time per output token much lower.
The key architectural reason is that most layers stop using full softmax attention that needs a large key value cache for every past token, and instead use a cheaper hybrid or linear style mixer plus positional encoding changes like HyPE, so long context does not break.
This is the same general idea MiniCPM-SALA is selling: keep only a smaller fraction of heavier attention layers and make most layers cheaper, which is why they claim 1M token inference on a single RTX 5090.
DeepSeek's innovation level is really at another level.
Its new paper uncovers a U-shaped scaling law.
It shows that N-grams still matter. Instead of dropping them in favor of neural networks, it hybridizes the two, which clears up the dimensionality problem and removes a big source of inefficiency in modern LLMs.
Uncovers a U-shaped scaling law that optimizes the trade-off between neural computation (MoE) and static memory (Engram).
Right now, even “smart” LLMs waste a bunch of their early layers re-building common phrases and names from scratch, because they do not have a simple built-in “lookup table” feature.
Mixture-of-Experts already saves compute by only running a few expert blocks per token, but it still forces the model to spend compute to recall static stuff like named entities and formula-style text.
Engram is basically a giant memory table that gets queried using the last few tokens, so when the model sees a familiar short pattern it can fetch a stored vector quickly instead of rebuilding it through many layers.
They implement that query using hashed 2-gram and 3-gram patterns, which means the model always does the same small amount of lookup work per token even if the table is huge.
The big benefit is that if early layers stop burning time on “static reconstruction,” the rest of the network has more depth left for real reasoning, and that is why reasoning scores go up even though this sounds like “just memory.”
The long-context benefit is also solid, because offloading local phrase glue to memory frees attention to focus on far-away relationships, and Multi-Query Needle-in-a-Haystack goes from 84.2 to 97.0 in their matched comparison.
The system-level big deal is cost and scaling, because they show you can offload a 100B memory table to CPU memory and the throughput drop stays under 3%, so you can add a lot more “stored stuff” without needing to fit it all on GPU memory.
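The constant-work lookup can be sketched like this (table size, hash function, and embedding dimension are all made up for illustration, not DeepSeek's implementation):

```python
# Sketch of hashed N-gram memory: the trailing 2 and 3 token ids are hashed
# into a fixed-size table, so each token costs exactly two row fetches no
# matter how large the table is. That is why the table can be offloaded to
# CPU memory with little throughput loss.
import numpy as np

TABLE_SIZE, DIM = 1 << 16, 32
rng = np.random.default_rng(0)
table = rng.normal(size=(TABLE_SIZE, DIM))  # the "engram" memory

def ngram_slot(token_ids):
    # Deterministic polynomial hash of the trailing token ids into a slot.
    h = 0
    for t in token_ids:
        h = (h * 1000003 + t) % TABLE_SIZE
    return h

def engram_lookup(context):
    # One bigram fetch + one trigram fetch per token: constant work.
    bigram = table[ngram_slot(context[-2:])]
    trigram = table[ngram_slot(context[-3:])]
    return bigram + trigram

vec = engram_lookup([101, 2057, 318])
print(vec.shape)  # (32,)
```

Because the fetched vector arrives without running any layers, familiar phrases are recalled "for free," which is the mechanism behind the depth savings described above.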
🧩 The core problem
The paper splits language modeling into two jobs: deep reasoning that needs real computation, and local stereotyped patterns that are basically fast recall.
Transformers do not have a native lookup block, so they burn early attention and feed-forward layers to rebuild static stuff like multi-token entities and formulaic phrases.
That rebuild is expensive mainly because it eats sequential depth, meaning the model spends layers on trivia-like reconstruction before it even starts the harder reasoning steps.
Classical N-gram models already handle a lot of this local dependency work with cheap table access, so forcing a Transformer to relearn it through compute is a design mismatch.
Engram is their way of turning “lookup” into a first-class primitive that lives next to MoE, instead of being faked by extra neural layers.
Engram adds a huge hashed N-gram memory table that gets queried with a fixed amount of work per token, so early layers stop wasting compute rebuilding names and stock phrases.
They show the best results when about 20% to 25% of the sparse budget moves from experts into this memory, while total compute stays matched.
Engram hits 97.0 on Multi-Query Needle-in-a-Haystack, while the matched MoE baseline hits 84.2.