StatArb
Mar 8 · 16 tweets · 3 min read
For anyone building an HFT strat, you need a fill simulator to see when you get filled. You can use Kalman queue models, multi-queue models, and simulated matching engines; they are all cool but usually don't properly capture effects like adversity (adverse selection). Then there are stochastic approaches...
1/n
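A minimal sketch of the queue-model idea above (not from the thread; it tracks only the volume ahead of you and ignores cancellations and adverse selection, which is exactly the weakness being pointed out):

```python
# Toy single-queue fill simulator: we join the back of the queue at our price
# level and start filling once enough volume trades through ahead of us.
# Ignores cancellations and adverse selection. All names/numbers illustrative.

def queue_fill_time(queue_ahead, trades):
    """trades: list of (timestamp, executed_volume) at our price level.
    Returns the timestamp at which the queue ahead of us is eaten through
    and we start filling, or None if that never happens in the sample."""
    remaining = queue_ahead
    for ts, vol in trades:
        remaining -= vol
        if remaining < 0:  # this trade spills into our order
            return ts
    return None

# 100 units queued ahead of us, then three prints of 40 units each
print(queue_fill_time(100.0, [(0.1, 40.0), (0.2, 40.0), (0.3, 40.0)]))  # -> 0.3
```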
This would be like simulating fills as a Poisson process. This sucks even worse, as it utterly and entirely ignores adversity. At least sim matchers give it a try, although not a great one, and an easy NN will trounce them. Plus, stochastic approaches don't use historical data...
2/n
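For contrast, the stochastic approach above is roughly this (a sketch with a made-up rate): fills arrive as a Poisson process with constant intensity, so fill times are exponential and completely blind to market state — the adverse-selection blindness being criticized.

```python
import numpy as np

# Poisson-style fill sim: fills at our offset arrive at a constant assumed
# rate, independent of our alpha or the book. Rate is illustrative.
rng = np.random.default_rng(0)
fill_rate_per_sec = 0.5
inter_fill_times = rng.exponential(1.0 / fill_rate_per_sec, size=1000)
print(inter_fill_times.mean())  # close to 2.0s, regardless of market state
```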
Other methods involve procedural OB sims, which basically just take the data and add a bit of stochastic overlay to the historical OB. These assume (most of the time) that you get filled when the midprice crosses your bid/ask, which is just wrong because it neglects...
3/n
bounce (bids and asks getting executed without the midprice crossing them). The best approach is the simple ML approach: use a machine learning model to predict whether you will get filled, based on the data, with no assumptions at all. There are ways to improve...
4/n
this with stochastic methods to regulate heteroskedasticity and kurtosis in your fill SD that usually won't get picked up by an ML algorithm, but we will ignore that for now since it's a lot of work for not much extra performance. Anyway, back on topic...
5/n
We first start by labeling our data. What are we even predicting? One simple approach is to label the time until an order gets filled; from there you can infer the conditional volatility for stochastic sims after the fact. For this, you need some LOB data and some...
6/n
trade data. This can be acquired from Binance at no cost if you get approved for it (you may want to be a VIP or they will ignore your email), or just ask me and I'll load some into a drive for you. Now that we know what we are predicting, what are our features? The first...
7/n
and largest driver of variance will simply be offset. As anyone familiar with basic HFT theory knows, a volatility-dependent offset, plus a bit to do with capitalization and tick sizes, usually determines the bid-ask spread. The volatility-dependent offset model is most appropriate...
8/n
when dealing with cryptocurrencies; forex and equities have different microstructural considerations (forex is OTC, and equities behave VERY differently between large and small caps). Small caps are more multifractal, but large caps are, strangely, jump-stochastic...
9/n
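The labeling step described in tweets 6-7 could be sketched like this (a simplified, assumption-laden version: it counts any trade printing at or through our bid as a fill, so it still ignores queue position; all names and numbers are illustrative):

```python
import numpy as np

# Label each quote snapshot with the time until a resting bid at a given
# price would have been hit by a trade, from merged LOB + trade data.

def label_time_to_fill(quote_times, bid_prices, trade_times, trade_prices):
    """For each snapshot (t0, our_bid), time until a trade prints at or
    below our_bid after t0 (i.e. our resting bid is hit). NaN if never."""
    labels = np.full(len(quote_times), np.nan)
    for i, (t0, our_bid) in enumerate(zip(quote_times, bid_prices)):
        for tt, tp in zip(trade_times, trade_prices):
            if tt > t0 and tp <= our_bid:
                labels[i] = tt - t0
                break
    return labels

labels = label_time_to_fill(
    quote_times=[0.0, 1.0],
    bid_prices=[99.0, 98.5],
    trade_times=[0.5, 1.5, 2.0],
    trade_prices=[99.5, 99.0, 98.4],
)
print(labels)  # -> [1.5 1. ]
```

Note this captures bounce by construction: a print at our bid counts as a fill even if the midprice never crosses it.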
Back on topic. We engineer offset from midprice (obviously), but also from microprice. Microprice is just a simple prediction of the midprice one tick from now; you can do this for two ticks as well. Weighted midprice is good as well, but for reasons I won't discuss it is inadequate...
10/n
to be used as a primary feature. Basically, with microprice you get the convexity you really should when factoring in liquidity; with weighted midprice you get a diagonal (liquidity leaving doesn't always mean fair value changes); and, obviously, a flat response with the normal midprice...
11/n
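A sketch of those offset features (illustrative numbers; the weighted mid here is the cheap imbalance-weighted proxy, whereas a microprice proper is a fitted estimate of the mid one tick ahead, as the thread describes):

```python
# Offset features: distance of our quote from the plain mid and from the
# imbalance-weighted mid. All prices/sizes are illustrative.

def mid(bid, ask):
    return (bid + ask) / 2.0

def weighted_mid(bid, ask, bid_size, ask_size):
    # Leans toward the side with LESS resting size (where price is
    # more likely to move next).
    return (bid * ask_size + ask * bid_size) / (bid_size + ask_size)

bid, ask, bid_size, ask_size = 100.0, 100.2, 10.0, 30.0
our_quote = 99.9
print(round(mid(bid, ask) - our_quote, 4))                              # ~0.2
print(round(weighted_mid(bid, ask, bid_size, ask_size) - our_quote, 4)) # ~0.15
```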
Next is the skewness of the orderbook; this is standard and usual for books. You can also use the time of day. There are about 5-6 cycles of depth in a sine-wave-like form for illiquid cryptos, slightly smoother for liquid ones. This is stationary, so you can use an FFT, since you...
12/n
only need frequency, not time, unlike what a DWT would give you. Now we have the most important factor deciding your model's success live -- adversity. This is just the fact that you will get filled way more when you have no alpha. Nobody wants to make markets for an arbitrageur...
13/n
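The FFT trick from tweet 12, on a synthetic minute-sampled depth series (synthetic data; a real series would just replace the `depth` array):

```python
import numpy as np

# For a roughly stationary intraday depth series sampled once per minute,
# rfft bin k corresponds to k cycles per day, so the dominant bin gives
# the number of depth cycles directly.
n = 1440                                        # one day of minute samples
t = np.arange(n)
depth = 5.0 + np.sin(2 * np.pi * 5 * t / n)     # synthetic: 5 cycles/day
spectrum = np.abs(np.fft.rfft(depth - depth.mean()))
dominant_cycles_per_day = int(np.argmax(spectrum))
print(dominant_cycles_per_day)                  # -> 5
```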
So you need a fast model (unless you have a million-dollar compute budget), and you need to train it on multiple timeframes. Different OB levels will favor predictions at different horizons. Then, finally, include microstructural models like VPIN and price-impact lambdas, which will also pick...
14/n
up on adversity to some degree, although not very well. For beginners: rolling STD is not the only volatility measure; OB-based volatility metrics are great, and so are other volatility metrics that aren't OB-based at all. There are also models that use OHLC or tick...
15/n
data to predict the bid-ask spread or the skew. Predicted skew or BAS are great features. That's all on simulating LOs and how they get filled. This is a pretty basic view of the main part of what I use, but hopefully it is insightful. Share your approaches!
16/16
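Putting the pieces together — offset, book skew, time-of-day — into a fast model. Here a plain least-squares fit on synthetic data stands in for whatever fast model (e.g. gradient-boosted trees) you would actually use; every feature and coefficient below is made up for illustration:

```python
import numpy as np

# End-to-end sketch: engineered features -> fast model predicting
# time-to-fill. Synthetic data with a known linear relationship so we can
# see the fit recover it.
rng = np.random.default_rng(1)
n = 500
offset = rng.uniform(0.0, 5.0, n)            # ticks from microprice
book_skew = rng.normal(0.0, 1.0, n)          # bid/ask depth imbalance
tod_cycle = np.sin(2 * np.pi * rng.uniform(0.0, 1.0, n))  # time-of-day phase

# Synthetic label: deeper offsets take longer to fill; skew and time of
# day shift it a little.
time_to_fill = (2.0 * offset - 0.5 * book_skew + 0.3 * tod_cycle
                + rng.normal(0.0, 0.1, n))

X = np.column_stack([offset, book_skew, tod_cycle, np.ones(n)])
coefs, *_ = np.linalg.lstsq(X, time_to_fill, rcond=None)
print(np.round(coefs, 2))  # recovers roughly [2.0, -0.5, 0.3, 0.0]
```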
