Rohan Paul

@rohanpaul_ai

Jul 13 • 15 tweets • 9 min read

A Reddit user deposited $400 into Robinhood, then let ChatGPT pick option trades. 100% win rate over 10 days.

He uploads spreadsheets and screenshots with detailed fundamentals, options chains, technical indicators, and macro data, then tells each model to filter that information and propose trades that fit strict probability-of-profit and risk limits.

He still places and closes orders manually, but plans to keep the head-to-head test running for 6 months.

This is his prompt
-------

"System Instructions

You are ChatGPT, Head of Options Research at an elite quant fund. Your task is to analyze the user's current trading portfolio, which is provided in the attached image timestamped less than 60 seconds ago, representing live market data.

Data Categories for Analysis

Fundamental Data Points:

Earnings Per Share (EPS)

Revenue

Net Income

EBITDA

Price-to-Earnings (P/E) Ratio

Price/Sales Ratio

Gross & Operating Margins

Free Cash Flow Yield

Insider Transactions

Forward Guidance

PEG Ratio (forward estimates)

Sell-side blended multiples

Insider-sentiment analytics (in-depth)

Options Chain Data Points:

Implied Volatility (IV)

Delta, Gamma, Theta, Vega, Rho

Open Interest (by strike/expiration)

Volume (by strike/expiration)

Skew / Term Structure

IV Rank/Percentile (after 52-week IV history)

Real-time (< 1 min) full chains

Weekly/deep Out-of-the-Money (OTM) strikes

Dealer gamma/charm exposure maps

Professional IV surface & minute-level IV Percentile

Price & Volume Historical Data Points:

Daily Open, High, Low, Close, Volume (OHLCV)

Historical Volatility

Moving Averages (50/100/200-day)

Average True Range (ATR)

Relative Strength Index (RSI)

Moving Average Convergence Divergence (MACD)

Bollinger Bands

Volume-Weighted Average Price (VWAP)

Pivot Points

Price-momentum metrics

Intraday OHLCV (1-minute/5-minute intervals)

Tick-level prints

Real-time consolidated tape

Alternative Data Points:

Social Sentiment (Twitter/X, Reddit)

News event detection (headlines)

Google Trends search interest

Credit-card spending trends

Geolocation foot traffic (Placer.ai)

Satellite imagery (parking-lot counts)

App-download trends (Sensor Tower)

Job postings feeds

Large-scale product-pricing scrapes

Paid social-sentiment aggregates

Macro Indicator Data Points:

Consumer Price Index (CPI)

GDP growth rate

Unemployment rate

10-year Treasury yields

Volatility Index (VIX)

ISM Manufacturing Index

Consumer Confidence Index

Nonfarm Payrolls

Retail Sales Reports

Live FOMC minute text

Real-time Treasury futures & SOFR curve

ETF & Fund Flow Data Points:

SPY & QQQ daily flows

Sector-ETF daily inflows/outflows (XLK, XLF, XLE)

Hedge-fund 13F filings

ETF short interest

Intraday ETF creation/redemption baskets

Leveraged-ETF rebalance estimates

Large redemption notices

Index-reconstruction announcements

Analyst Rating & Revision Data Points:

Consensus target price (headline)

Recent upgrades/downgrades

New coverage initiations

Earnings & revenue estimate revisions

Margin estimate changes

Short interest updates

Institutional ownership changes

Full sell-side model revisions

Recommendation dispersion

Trade Selection Criteria

Number of Trades: Exactly 5

Goal: Maximize edge while maintaining portfolio delta, vega, and sector exposure limits.

Hard Filters (discard trades not meeting these):

Quote age ≤ 10 minutes

Top option Probability of Profit (POP) ≥ 0.65

Top option credit / max loss ratio ≥ 0.33

Top option max loss ≤ 0.5% of $100,000 NAV (≤ $500)

Selection Rules

Rank trades by model_score.

Ensure diversification: maximum of 2 trades per GICS sector.

Net basket Delta must remain between [-0.30, +0.30] × (NAV / 100k).

Net basket Vega must remain ≥ -0.05 × (NAV / 100k).

In case of ties, prefer higher momentum_z and flow_z scores.

Output Format

Provide output strictly as a clean, text-wrapped table including only the following columns:

Ticker

Strategy

Legs

Thesis (≤ 30 words, plain language)

POP

Additional Guidelines

Limit each trade thesis to ≤ 30 words.

Use straightforward language, free from exaggerated claims.

Do not include any additional outputs or explanations beyond the specified table.

If fewer than 5 trades satisfy all criteria, clearly indicate: "Fewer than 5 trades meet criteria, do not execute."

reddit.com/r/ChatGPT/comm…
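The hard filters and selection rules in the prompt can be read as a small screening procedure. Below is a minimal Python sketch of one way to apply them, assuming each candidate trade already carries the numbers the prompt refers to. The `Candidate` class, its field names and the greedy pass are illustrative assumptions: the prompt does not say how `model_score`, `momentum_z` or `flow_z` are computed, and this is not the Reddit user's code.

```python
from dataclasses import dataclass

NAV = 100_000            # NAV stated in the prompt
SCALE = NAV / 100_000    # delta and vega limits scale with NAV / 100k


@dataclass
class Candidate:         # hypothetical container, not defined in the prompt
    ticker: str
    sector: str          # GICS sector
    pop: float           # probability of profit
    credit: float        # credit received, in dollars
    max_loss: float      # maximum loss, in dollars
    delta: float         # net delta of the position
    vega: float          # net vega of the position
    quote_age_min: float
    model_score: float
    momentum_z: float = 0.0
    flow_z: float = 0.0


def passes_hard_filters(c):
    """The four hard filters listed in the prompt."""
    return (c.quote_age_min <= 10
            and c.pop >= 0.65
            and c.credit / c.max_loss >= 0.33
            and c.max_loss <= 0.005 * NAV)


def select_trades(candidates, n=5):
    """Greedy pass in rank order; returns None if fewer than n trades fit."""
    # Rank by model_score; ties fall back to momentum_z, then flow_z.
    ranked = sorted((c for c in candidates if passes_hard_filters(c)),
                    key=lambda c: (c.model_score, c.momentum_z, c.flow_z),
                    reverse=True)
    basket, per_sector = [], {}
    net_delta = net_vega = 0.0
    for c in ranked:
        if per_sector.get(c.sector, 0) >= 2:        # max 2 trades per sector
            continue
        new_delta, new_vega = net_delta + c.delta, net_vega + c.vega
        if abs(new_delta) > 0.30 * SCALE or new_vega < -0.05 * SCALE:
            continue                                # would breach basket limits
        basket.append(c)
        per_sector[c.sector] = per_sector.get(c.sector, 0) + 1
        net_delta, net_vega = new_delta, new_vega
        if len(basket) == n:
            return basket
    return None   # "Fewer than 5 trades meet criteria, do not execute."
```

The greedy pass checks the basket limits after every addition, which is stricter than checking only the final basket; a search over combinations would be another reasonable reading of the same rules.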


Many recently published papers show that purpose-built LLM pipelines are already improving market analysis, price forecasting and policy simulation. 👇

FinSphere couples a 72B-parameter Qwen2 model, a streaming market database and a battery of quantitative tools to write full research notes on demand. Expert raters gave its reports an overall score of 70.88 on a 100-point rubric, beating GPT-4o by about 4 points and domain models such as FinGPT by more than 30 points.

Back-testing shows that portfolios built from its recommendations exceeded a buy-and-hold benchmark by about 12 % on average across 6 months of out-of-sample data.

Another paper showing how LLMs combined with price time-series data are helping trading strategies 👇

LLMoE adaptive routing for trading strategies

The LLM-Based Routing in Mixture-of-Experts (LLMoE) framework replaces a conventional softmax router with a language model that chooses between “optimistic” and “pessimistic” sub-experts after reading both price time-series and headline text. On MSFT data from 2006-2016 the approach lifts total return to 65.44 % versus 22.18 % for a classic MoE and raises the Sharpe ratio accordingly, while maintaining full interpretability through the router’s text rationale.
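The routing idea can be sketched in a few lines of Python. This is only an illustration of hard routing between two experts: in the paper the router is a language model, whereas `stub_router` below is a keyword-based stand-in, and the two toy experts and all names are assumptions made for this example.

```python
def daily_returns(prices):
    """Simple returns between consecutive closes."""
    return [b / a - 1 for a, b in zip(prices, prices[1:])]


def optimistic_expert(prices):
    # Toy expert: expects the best recent daily return to repeat.
    return max(daily_returns(prices))


def pessimistic_expert(prices):
    # Toy expert: expects the worst recent daily return to repeat.
    return min(daily_returns(prices))


EXPERTS = {"optimistic": optimistic_expert, "pessimistic": pessimistic_expert}


def stub_router(prices, headlines):
    """Stand-in for the LLM router: returns (expert name, text rationale)."""
    negative = {"lawsuit", "downgrade", "recall"}
    text = " ".join(headlines).lower()
    hits = sorted(word for word in negative if word in text)
    if hits:
        return "pessimistic", "Headlines mention " + ", ".join(hits) + "."
    return "optimistic", "No negative keywords in the headlines."


def route_and_predict(prices, headlines, router=stub_router):
    """Pick one expert via the router, then return its forecast and the reason."""
    choice, rationale = router(prices, headlines)
    return EXPERTS[choice](prices), choice, rationale
```

A conventional softmax router would instead blend the experts' outputs with learned weights and give no text rationale; the returned `rationale` string is what stands in for the interpretability mentioned above.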

LLMs can handle scale-invariant patterns in financial time series for forecasting.
Experiments were done on real-world financial datasets.

This paper shows that a transformer can recognise long-period cycles and short-lived shocks in the same feed. In tests on 8 liquid US stocks the model improves hit-rate by 6-9 % and raises cumulative return in a simple long-only simulation, confirming that richer temporal encoding helps an LLM exploit low-signal financial series.

arxiv.org/abs/2505.02880

FinRipple aligns a foundation model with classical asset-pricing theory and a rolling company-relationship graph, then fine-tunes via reinforcement learning to predict how news about one firm affects peers. Experiments show significant excess return and a Sharpe boost when ripple-aware signals are combined with a Markowitz optimiser, underlining the value of structure-aware adaptation.

arxiv.org/abs/2505.23826v1

NOTE that the user implemented this strategy only as a very short-term experiment. While it sounds interesting and appears to have produced a positive return, the strategy still needs to prove itself in a down-market before it can be considered viable.

LLM for financial Trading and Asset Pricing

"Analysis of LLM Agent Behavior in Experimental Asset Markets"

The researchers here placed 6 commercial LLMs inside an asset-market laboratory that normally triggers human bubbles. Claude 3.5 and GPT-4o priced within 5 % of fundamental value and avoided crashes, suggesting that properly steered agents can enforce disciplined execution even when humans would over-trade.

arxiv.org/abs/2502.15800

Integrating LLMs in Financial Investments and Market Analysis: A Survey

This survey organises more than 40 financial LLM papers into four design patterns and concludes that agent architectures with real-time data connectors and explicit risk controls produce the most consistent alpha so far.

arxiv.org/abs/2507.01990

From text to trade: harnessing the potential of generative AI for investor sentiment analysis in financial markets through large language models.

This study describes a production-grade workflow that converts multilingual social-media streams into tradeable sentiment factors by means of a fine-tuned generative model.

Over a 24-month back-test the factor delivers 7.2 % annualised excess return after transaction costs on a long-short equity book, reinforcing the edge that rapid unstructured-text digestion can create.

---

researchgate.net/publication/393299573_From_text_to_trade_harnessing_the_potential_of_generative_AI_for_investor_sentiment_analysis_in_financial_markets_through_large_language_models

LLMs trading directly in financial markets. Another study.

A full market microstructure built by researchers lets heterogeneous LLM agents place limit and market orders against a persistent book. The simulation shows realistic bubbles, liquidity provision and price discovery, suggesting that prompt-economics can substitute for costly human experiments when testing market theories.

They then ask whether an LLM-trading agent can shift prices by posting tailored social-media messages.

The agent learns to push sentiment upward, harvests the resulting move and lifts its profit.

arxiv.org/abs/2504.10789
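For readers unfamiliar with the setup, here is a minimal Python sketch of a persistent limit order book with price-time priority, the kind of mechanism agents would submit limit and market orders to. It is a generic textbook-style illustration, not the researchers' simulator; the `OrderBook` class and its method names are assumptions made for this example.

```python
import heapq
from itertools import count


class OrderBook:
    """Toy limit order book with price-time priority."""

    def __init__(self):
        self._bids = []      # heap of (-price, seq, [qty]): best (highest) bid first
        self._asks = []      # heap of (price, seq, [qty]): best (lowest) ask first
        self._seq = count()  # arrival order, breaks ties at the same price
        self.trades = []     # (price, qty) fills in execution order

    def limit(self, side, price, qty):
        """Match what crosses the book, then rest any remainder."""
        remaining = self._match(side, qty, price)
        if remaining > 0:
            book = self._bids if side == "buy" else self._asks
            key = -price if side == "buy" else price
            heapq.heappush(book, (key, next(self._seq), [remaining]))

    def market(self, side, qty):
        """Match against the book at any price; returns the filled quantity."""
        return qty - self._match(side, qty, None)

    def _match(self, side, qty, limit_price):
        book = self._asks if side == "buy" else self._bids
        while qty > 0 and book:
            key, _, resting = book[0]
            price = key if side == "buy" else -key
            if limit_price is not None:
                crosses = (price <= limit_price if side == "buy"
                           else price >= limit_price)
                if not crosses:
                    break
            fill = min(qty, resting[0])
            self.trades.append((price, fill))   # trade prints at the resting price
            qty -= fill
            resting[0] -= fill
            if resting[0] == 0:
                heapq.heappop(book)
        return qty                              # unfilled quantity
```

Resting orders stay in the book between calls, which is what makes it persistent: each agent's order changes the prices the next agent sees.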

LLMs for financial trading. More findings.

Here researchers embed an LLM opinion module inside the Black-Litterman framework.

By mapping model uncertainty to confidence weights they create portfolios that outperformed S&P 500, equal-weight and vanilla mean-variance allocations during Jun 2024-Feb 2025 rebalancing tests.

They found that different LLMs exhibit varying levels of predictive optimism and confidence stability, which impact portfolio performance.

The source code and data are available at

github.com/youngandbin/LLM-MVO-BLM

arxiv.org/abs/2504.14345
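To illustrate how uncertainty can become a confidence weight, here is a minimal Python sketch of the Black-Litterman posterior mean in its simplest special case: one asset and one absolute view. It assumes the LLM is queried several times and that the spread of its forecasts is used as the view variance; that mapping, the function names and the default `tau` are assumptions for this example, not the code from the linked repository.

```python
from statistics import mean, variance


def view_from_llm_samples(samples):
    """Turn repeated LLM return forecasts into a view (q) and its variance (omega).

    Needs at least two samples that are not all identical, so omega > 0.
    """
    return mean(samples), variance(samples)


def bl_posterior_mean(prior, prior_var, view, view_var, tau=0.05):
    """Single-asset Black-Litterman posterior mean.

    With P = 1 the general formula reduces to a precision-weighted average:
        mu = (prior / (tau * prior_var) + view / view_var)
             / (1 / (tau * prior_var) + 1 / view_var)
    """
    prior_precision = 1.0 / (tau * prior_var)
    view_precision = 1.0 / view_var
    return ((prior_precision * prior + view_precision * view)
            / (prior_precision + view_precision))
```

With a prior of 1% and forecasts averaging 3%, tightly clustered forecasts pull the posterior almost all the way to 3%, while widely scattered forecasts leave it closer to the prior. The multi-asset version replaces these scalars with matrices.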

Risk-aware financial forecasting models with LLMs.

Here the researchers design an adaptive Sharpe-ratio loss inside a Temporal Fusion Transformer.

When tested on equities, crypto and commodities, the model lifts both prediction accuracy and realised portfolio Sharpe against standard TFT and LSTM baselines.

---
researchgate.net/publication/389877674_An_Adaptive_Sharpe_Ratio-Based_Temporal_Fusion_Transformer_for_Financial_Forecasting
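As a rough illustration of what a Sharpe-ratio loss optimises, here is a minimal Python sketch of the plain (non-adaptive) version: the loss is the negative annualised Sharpe ratio of the strategy's returns. The function names, the zero risk-free rate and the 252-period annualisation are assumptions for this example; the paper's adaptive variant is not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualised Sharpe ratio of per-period returns.

    Needs at least two returns that are not all identical (stdev > 0).
    """
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess) * sqrt(periods_per_year)


def sharpe_loss(positions, asset_returns):
    """Negative Sharpe of the strategy returns position_t * return_t.

    Minimising this loss rewards forecasts that earn a high return per unit
    of volatility, rather than forecasts that are merely close to the target.
    """
    strategy = [p * r for p, r in zip(positions, asset_returns)]
    return -sharpe_ratio(strategy)
```

In a neural model the same quantity would be written with differentiable tensor operations so that gradients can flow back to the position outputs.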

LLM-based multi-agent portfolio work in crypto.

Here researchers extend the LLM-based AI agent idea to digital assets with a team of analyst, trader and risk-manager LLMs that co-operate on a basket of the top 30 tokens.

The framework surpasses single-agent and market benchmarks in hit-rate and drawdown control and keeps full explainability through agent dialogue logs.

ideas.repec.org/p/arx/papers/2501.00826.html
