Keone Hon ⨀
Dec 22, 2 tweets, 4 min read
There is an open-source "real-world AI hedge fund" GitHub repo going around Twitter that's getting a lot of interest.

Since in my previous life I was using ML for trading, I wanted to share some notes on the repo and how one would actually create a "real-world AI hedge fund".

First, about the repo:

The core logic of the repo is about 750 lines of Python. The 6 "agents" are each about 100 lines; each one computes a very explicitly defined metric, something like "count the number of insider trades; if there are more buys than sells, return 1, else return -1".
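
As a rough sketch of how little such an "agent" does (this is not the repo's code; the function name and the convention that positive share counts are buys and negative ones are sells are assumptions for illustration):

```python
def insider_trade_signal(transaction_shares):
    """Return +1 if insider buys outnumber sells, otherwise -1.

    Hypothetical convention: each entry is a share count,
    positive for an insider buy and negative for an insider sell.
    """
    shares = list(transaction_shares)
    buys = sum(1 for s in shares if s > 0)
    sells = sum(1 for s in shares if s < 0)
    # Ties (including "no trades at all") fall through to -1,
    # matching the "else return -1" description.
    return 1 if buys > sells else -1
```

Note that the output carries one bit of information no matter how many trades went into it.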

The main use of "AI" in this repo is to spin up gpt-4o from OpenAI and present it with the following prompt (I'm abbreviating to give the gist, but see line 569 of agents.py in the repo):

"""
You are a portfolio manager. Your job is to make a trading decision based on the team's analysis while strictly adhering to risk management constraints.

Quant Analysis signal: {some number from say 1 to 5 which was calculated by one of the aforementioned 'agents'}

Fundamental Analysis signal: {'bullish' or 'bearish'}

Please output whether to 'buy', 'sell', or 'hold'.
"""

Then there is a backtester which converts the 'buy', 'sell', or 'hold' actions into actual trades using daily (end-of-day price) data.
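
To give a feel for what that conversion involves, here is a minimal sketch of a daily-close backtest loop. It is not the repo's backtester: the all-in / all-out position sizing, the `starting_cash` default, and the absence of fees and slippage are all simplifying assumptions.

```python
def backtest(closes, actions, starting_cash=10_000.0):
    """Replay 'buy' / 'sell' / 'hold' decisions against end-of-day prices.

    Assumptions (illustrative only): 'buy' spends all cash at that day's
    close, 'sell' liquidates the whole position at that day's close,
    and there are no transaction costs.
    Returns the final portfolio value marked at the last close.
    """
    if not closes:
        return starting_cash
    cash, shares = starting_cash, 0.0
    for price, action in zip(closes, actions):
        if action == "buy" and cash > 0:
            shares += cash / price
            cash = 0.0
        elif action == "sell" and shares > 0:
            cash += shares * price
            shares = 0.0
        # 'hold' (or anything unrecognized) leaves the position unchanged
    return cash + shares * closes[-1]
```

For example, `backtest([100.0, 110.0, 121.0], ["buy", "hold", "sell"])` buys 100 shares at 100 and sells them at 121.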

What's wrong with this

Quant finance is the practice of generating measurements ("signals") that are relevant to predicting future price action; combining those signals somehow into a more complicated model that actually predicts the future price action; and trading on those predictions.

A quant at a quant trading firm will typically spend a good portion of their time thinking up new trading signals that might have predictive power:

- For example, a quant who is trying to predict the price of AAPL might notice that AAPL and MSFT are pretty correlated, and write a signal to measure how much the price of MSFT has changed recently relative to AAPL to measure some notion of price change that hasn't been priced in.

- Another quant might hypothesize that whenever there are way more shares on the bid side than on the offer side, the price is more likely to go up, and write a signal to encode that particular metric.
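
The two example signals above could be sketched roughly as follows. The function names, the simple-return definition, and the normalized imbalance formula are assumptions chosen for illustration; a real desk would have its own definitions.

```python
def relative_move(peer_prices, target_prices, lookback):
    """Peer's return over `lookback` steps minus the target's return.

    A positive value means the peer (e.g. MSFT) has moved up more than
    the target (e.g. AAPL) recently, which the hypothesis reads as a
    move the target has not yet priced in.
    """
    peer_ret = peer_prices[-1] / peer_prices[-1 - lookback] - 1.0
    target_ret = target_prices[-1] / target_prices[-1 - lookback] - 1.0
    return peer_ret - target_ret


def book_imbalance(bid_size, ask_size):
    """Normalized imbalance in [-1, 1]; positive means more size on the bid."""
    total = bid_size + ask_size
    if total == 0:
        return 0.0  # empty book: no information either way
    return (bid_size - ask_size) / total
```

Unlike a "bullish" / "bearish" label, both of these return a continuous number, so a downstream model can learn how much a given reading matters.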

There could be thousands+ of these signals, especially when you consider that some of these signals have parameters which should be searched over.
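
To see how the count blows up, here is a small sketch that expands one idea (the relative-move signal) over a grid of parameters. The peer list, the lookbacks, and the naming scheme are all made up for the example.

```python
from itertools import product


def make_relative_move(peer, lookback, target="AAPL"):
    """Build one concrete signal for a given (peer, lookback) choice."""
    def signal(prices_by_ticker):
        peer_px = prices_by_ticker[peer]
        target_px = prices_by_ticker[target]
        peer_ret = peer_px[-1] / peer_px[-1 - lookback] - 1.0
        target_ret = target_px[-1] / target_px[-1 - lookback] - 1.0
        return peer_ret - target_ret
    return signal


PEERS = ["MSFT", "GOOG"]   # hypothetical peer set
LOOKBACKS = [1, 5, 20]     # hypothetical lookbacks, in sampling steps

# One idea, two parameters, already six distinct signals.
signal_family = {
    f"{peer}_rel_{n}": make_relative_move(peer, n)
    for peer, n in product(PEERS, LOOKBACKS)
}
```

Add a few more peers, a few more lookbacks, and a few more ideas, and the thousands arrive quickly.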

After devising all of these signals, the quant needs to combine them somehow into a model that predicts the future. This means setting up a prediction problem, which means (at least) 4 things:

1. Choosing a set of signals to pull;

2. Choosing a set of sampling points (for example, every hour, every second, every time there is a trade, etc);

3. Choosing a ground truth (or "target") to try to predict (typically the delta between AAPL's price at the sampling point and at some future point in time, e.g. 1 second into the future); and

4. Choosing a model for combining the signals.
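
The first three choices amount to building a table: one row per sampling point, one column per signal, plus the target. A minimal sketch, assuming a single price series and signals that are plain functions of the price history (all names here are hypothetical):

```python
def build_dataset(prices, signal_fns, sample_every, horizon, warmup):
    """Return (X, y) for a prediction problem.

    signal_fns   -- choice 1: which signals to pull
    sample_every -- choice 2: sampling points (here, every k-th step)
    horizon      -- choice 3: target is price[t + horizon] - price[t]
    warmup       -- steps to skip so every signal has enough history
    """
    X, y = [], []
    for t in range(warmup, len(prices) - horizon, sample_every):
        history = prices[: t + 1]  # only what was known at time t
        X.append([fn(history) for fn in signal_fns])
        y.append(prices[t + horizon] - prices[t])
    return X, y


def last_change(history):
    return history[-1] - history[-2]


def three_step_change(history):
    return history[-1] - history[-4]
```

Choice 4 is then whatever model gets fit on `(X, y)`. The important detail is that each row only sees `history` up to its own sampling point, while the target looks `horizon` steps ahead.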

One of the reasons that AI/ML is very applicable to finance is that advanced models can be good at capturing nonlinear relationships between signals, provided that we can train the models on a lot of data (item 2 above) and have a good ground truth (item 3).
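
Here is a toy illustration of what "nonlinear relationship between signals" means. The data is made up: the target is positive only when two signals agree, so neither signal is informative on its own, but their interaction is perfectly informative.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


# Made-up toy data: target = signal_a * signal_b
signal_a = [-1, -1, 1, 1]
signal_b = [-1, 1, -1, 1]
target = [a * b for a, b in zip(signal_a, signal_b)]

corr_a = pearson(signal_a, target)  # each signal alone: no linear relationship
corr_b = pearson(signal_b, target)
corr_interaction = pearson(
    [a * b for a, b in zip(signal_a, signal_b)], target
)  # the product of the two: perfectly correlated
```

A linear model over `signal_a` and `signal_b` has nothing to work with here, while a model that can represent the interaction recovers the target exactly. Real signals are far messier, but this is the kind of structure nonlinear models are meant to pick up.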

Prediction problems that effectively use an AI/ML model usually involve large datasets with millions of rows and hundreds or thousands of columns.

If it helps, imagine putting yourself in the shoes of the computer. At each row in the dataset, you are presented with all of the values of all of the signals, along with information about what ended up happening to the price. Your job as the computer is to figure out how to utilize those signals to predict the outcome with great accuracy. Thankfully, we have amazing algorithms like feed-forward neural nets, which do a good job of learning the relationships between the signals being presented and of making predictions that closely match ground truth.

The power of Machine Learning is in learning what to do from a large, well-curated dataset with a well-chosen target.

Now back to the repo

The repo constructs a couple of signals with very low granularity (basically just "bullish" or "bearish").

The repo doesn't actually train an AI model to figure out how useful (or useless) the signals are at making a prediction of future price action.

The repo instead asks ChatGPT "what would you do if your quant told you that MACD was bearish but RSI was bullish, would you 'buy', 'sell', or 'hold'?"

There is no learning here. There is no utilization of the actually amazing powers of AI to combine signals into a prediction. There is no feedback loop to even tell the model that it did a good or bad job.
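
Even the most basic feedback loop would score each signal against what the price actually did afterwards. A minimal sketch, assuming signals are encoded as +1 (bullish) / -1 (bearish) and that you have the subsequent return for each call; the function name and encoding are assumptions, not anything from the repo:

```python
def hit_rate(signals, future_returns):
    """Fraction of calls where the signal's sign matched the later return.

    Calls with a zero signal or a zero return are skipped, since neither
    side took a direction. Raises ValueError if nothing is left to score.
    """
    pairs = [
        (s, r) for s, r in zip(signals, future_returns) if s != 0 and r != 0
    ]
    if not pairs:
        raise ValueError("no directional calls to score")
    hits = sum(1 for s, r in pairs if (s > 0) == (r > 0))
    return hits / len(pairs)
```

A hit rate near 0.5 means the signal is doing about as well as a coin flip on direction. Training a model is essentially an automated, much richer version of this check, run across all signals at once.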

(The backtest also has a lot of issues but I won't get into that.)

If you are interested in the subject, I would recommend playing with "Stock Market Analysis + Prediction Using LSTM" on Kaggle.

Although roughly equally simple in terms of the number of signals and granularity of sampling, that Kaggle notebook actually shows the process of constructing a dataset and target, and training an actual ML model. If you try adding more signals, or training over more interesting sampling points, or thinking about what target you should actually be predicting, you will end up gradually getting a better model and learning a lot about quant trading in the process :)
Here's the Kaggle notebook I mentioned:
kaggle.com/code/faressaya…
