StatArb · Feb 19
Three pillars of my ML modeling philosophy:

Large quantity of unique features
Really good dimensionality reduction
Ensemble everywhere!

A word on each...
When it comes to modeling, everyone goes straight to their favorite NNs, like LSTMs, or to LGBMs, and those are great, but everyone has them, and frankly they aren't that hard to implement! Just look at Kaggle if you want an example of DS students using them everywhere...
For real alpha, you need to focus on the three most ignored areas (there is a fourth, speed, but that's not really modeling, and a fifth which I'm not telling you because I like my alpha unleaked). That sounded super guru-like, but I promise these work and I use them.
Starting with the first: it is usually quantity over quality. Once you have built a massive alpha-signal library, sure, go ahead and focus on specific signals. Until then, use TA-Lib, GitHub, the Stefan Jansen repo, Kaggle, Wikipedia, any mass dump of features (preferably with code, so that you save time). It's about making lots of them, not about having some cool special strategy. Those work as well, but you build them on top of the mass features afterward, not before. You need to get what most others have (usually only in part, but still, most others will have quite a few of these features) before you can decide to become special.
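A minimal sketch of what the mass-feature idea looks like in practice, assuming a bars DataFrame with high/low/close columns and the TA-Lib Python wrapper; the indicators and lookbacks here are just placeholders, not my actual library:

# Sketch: bulk-generate indicator features across several lookbacks.
import pandas as pd
import talib

def make_features(bars: pd.DataFrame, lookbacks=(5, 20, 60, 120)) -> pd.DataFrame:
    h = bars["high"].to_numpy(dtype="float64")
    l = bars["low"].to_numpy(dtype="float64")
    c = bars["close"].to_numpy(dtype="float64")
    feats = {}
    for n in lookbacks:
        feats[f"rsi_{n}"] = talib.RSI(c, timeperiod=n)          # momentum
        feats[f"natr_{n}"] = talib.NATR(h, l, c, timeperiod=n)  # normalized range/vol
        feats[f"ret_{n}"] = bars["close"].pct_change(n)         # raw n-bar return
        feats[f"zret_{n}"] = feats[f"ret_{n}"] / bars["close"].pct_change().rolling(n).std()
    return pd.DataFrame(feats, index=bars.index)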
The next key point is dimensionality reduction. It's simple math: even if you have loads of HFT data, the sample requirement scales so badly with dimension that raw data volume won't save you. If 100 samples cover a 1D space at some density, then because the volume of a 2D space is squared, I now need 100^2 = 10,000 samples for the same effective coverage. Hopefully that's intuitive; it comes down to how much data sits in each unit of volume. High density means lots of data, of course, but a larger volume, as anyone knows, means lower density and effectively less data. In 100D you would need 100^100 = 1e+200 samples to match those 100 samples in 1D... yeah. So that's why we reduce dimensionality, not because it's fun (although it is).
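To put rough numbers on that scaling (samples needed to keep the same per-volume density grow like base^d):

# 100 samples cover one dimension; matching that density in d dimensions needs 100**d.
base = 100
for d in (1, 2, 5, 100):
    print(d, f"{float(base) ** d:.0e}")   # 1e+02, 1e+04, 1e+10, 1e+200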
Now, don't just run PCA and call it done; at the end of the day it's a fucking linear regression! Start with PCA, as I discussed in my last thread. PCA here is not the dimensionality reduction itself, but it does separate the mutual (linear) information from the residuals. When PCA is used as dim reduction you just chuck the residuals and assume they are noise, but for us they are not noise: they are special features that carry the non-linearity. We can then run manifold learning on them, or just feed them into a supervised autoencoder (supervised with either an LSTM or an MLP head), and the SAE will find those non-linearities...
A lot of people trash AEs because they effectively just rediscover PCA, but think of it as learning: like anyone, you start with the basics, so obviously they pick up the linear mutual relationships first. They just usually don't get very far into the meaty non-linear parts. That's why we do PCA first: we have already done that work for it, so it can spend its precious capacity on the non-linear parts.
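Here is a rough sketch of the PCA-residual plus supervised-autoencoder idea, assuming an MLP head and made-up layer sizes (PyTorch + scikit-learn, with toy data standing in for real features and forward returns):

import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50)).astype("float32")   # stand-in feature matrix
y = rng.normal(size=(2000, 1)).astype("float32")    # stand-in forward returns

# 1) PCA captures the linear/shared structure; keep what it cannot explain.
pca = PCA(n_components=10).fit(X)
resid = torch.tensor(X - pca.inverse_transform(pca.transform(X)), dtype=torch.float32)
target = torch.tensor(y)

# 2) Supervised autoencoder on the residuals: reconstruction head + prediction head.
enc = nn.Sequential(nn.Linear(50, 16), nn.ReLU(), nn.Linear(16, 8))
dec = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 50))
head = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))

params = list(enc.parameters()) + list(dec.parameters()) + list(head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
mse = nn.MSELoss()

for _ in range(200):
    z = enc(resid)
    loss = mse(dec(z), resid) + mse(head(z), target)   # joint objective
    opt.zero_grad()
    loss.backward()
    opt.step()

# The code layer z is the non-linear feature set you keep alongside the principal components.
nonlinear_feats = enc(resid).detach().numpy()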
Finally, ensembling. This can be done over multiple timeframes with hierarchical modeling (I've talked about this a lot in other threads), or it can come from stacking algorithms, like using an ARIMA forecast as a feature (this is risky, and you only do it with simple, linear models like an AR because they won't overfit; don't put an NN inside an NN, please, for the love of god, or you just undo the regularization in the original NN. That's why meta-labelling can be stupid sometimes). The last form is just using similar models: why not fit an LSTM and a WaveNet and ensemble them? It reduces overfitting, since it's like having two judges, and pretty much always the whole is greater than the average of the individual predictors; even the single best predictor is usually worse than the whole. I'll say it again for those who didn't hear it last time: a GARCH ensemble thrashes every single GARCH-family model, and that's something considering how many of them exist / can get lucky.
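And a bare-bones version of the GARCH-ensemble point, assuming the arch package; the three specs and the equal weighting are just placeholders:

import numpy as np
from arch import arch_model

rng = np.random.default_rng(0)
returns = rng.standard_t(df=5, size=2000)        # stand-in daily % returns

specs = [
    dict(vol="GARCH", p=1, q=1),                 # vanilla GARCH(1,1)
    dict(vol="EGARCH", p=1, q=1),                # log-variance, asymmetric
    dict(vol="GARCH", p=1, o=1, q=1),            # GJR-GARCH via the o term
]

forecasts = []
for spec in specs:
    res = arch_model(returns, dist="t", **spec).fit(disp="off")
    forecasts.append(res.forecast(horizon=1).variance.values[-1, 0])

ensemble_var = float(np.mean(forecasts))         # equal-weight one-step variance forecast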
End of thread. Enjoy your day, and remember that the glitzy models like NNs that you use at the end aren't really the alpha. In case you are wondering, I use roughly 800-1,000 features (maybe ~5k once you count the different-period versions for different timescales).


More from @TerribleQuant

Feb 18
Found this bad boy in my bookmarks from back when I decided manifolds were the key to the market and bought like 5 textbooks on them.

quantivity.wordpress.com/2011/05/08/man…

For those wondering whether they are, I'll give a few comments:
t-SNE is always something I want to apply but can never quite figure out the right way to use. There are certainly benefits to be had from a basic understanding of what this all means, so you have a better chance at visualizing your features in 2D...
Much like stochastic methods, as much as I would never make them the center of a model, there is always a use as a feature or in an ensemble. Ensembling is truly the free lunch of alpha...
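A minimal way to get that 2D picture, assuming scikit-learn and a stand-in feature matrix (the perplexity and everything else are untuned guesses):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 40))                  # stand-in feature matrix
labels = rng.integers(0, 3, size=1000)           # e.g. regime or forward-return bucket

emb = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(
    StandardScaler().fit_transform(X)
)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=5)
plt.title("t-SNE of the feature matrix")
plt.show()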
Feb 3
I'm not an MM, but it's the only area I know a lot about that I can talk about without leaking alpha I actually use:

I'll discuss a few MM points

Great open source MM project:
github.com/hummingbot/hum…
The OG paper:
math.nyu.edu/~avellane/High…
An improved paper:
aeaweb.org/conference/201…
A key concept for MMs is how you manage inventory. Avellaneda and Stoikov is basically the model everyone uses for this. Then there comes the offset, basically how wide your spreads are. That's your basic model of liquidity provision...
From there we get to have some fun! If you can create multiple forecasts for different timeframes (and at a super-advanced level compute speeds) you can make spreads asymmetric and intentionally hold inventory...
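For reference, the core of Avellaneda-Stoikov in a few lines: the reservation price shifts the mid against your inventory, and the optimal spread comes from the paper's closed form; the parameter values below are placeholders, not calibrated:

import math

def as_quotes(mid, inventory, gamma, sigma, k, time_left):
    # Reservation price: lean the quotes against the inventory you hold.
    r = mid - inventory * gamma * sigma**2 * time_left
    # Optimal total spread around the reservation price.
    spread = gamma * sigma**2 * time_left + (2.0 / gamma) * math.log(1.0 + gamma / k)
    return r - spread / 2.0, r + spread / 2.0    # (bid, ask)

# gamma = risk aversion, sigma = mid-price vol, k = order-arrival decay, time_left = T - t
bid, ask = as_quotes(mid=100.0, inventory=3, gamma=0.1, sigma=2.0, k=1.5, time_left=0.5)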
Feb 1
Entirely unprompted here, but please check out @FadingRallies. Also @choffstein's Liquidity Cascades paper (link below). The flow between MMs, passive funds, ELS, and generally the effects of reflexive dealer hedging are key to understanding this regime!

thinknewfound.com/liquidity-casc…
Even if you aren't a trader (I certainly am not, although I try to keep up with it all) it is still super important to understand the regime and how it all fits in from a risk perspective. You CANNOT just take the models as your risk! Eigenportfolios decay (I would know, I work with them all the time), so even they aren't the perfect metric (although I do love them). Statistical models will capture some risk, but at the end of the day you choose the parameters, and the distribution you feed in is key. Knowing fat tails exist is incredibly important for this.
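For anyone unfamiliar, an eigenportfolio is just a principal component of the return correlation matrix used as a set of portfolio weights; a tiny sketch on stand-in data (re-estimating on rolling windows is what shows the decay):

import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(size=(500, 20))             # stand-in daily returns for 20 names

corr = np.corrcoef(returns, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order
w = eigvecs[:, -1]                               # first eigenportfolio (largest eigenvalue)
w = w / np.abs(w).sum()                          # normalize gross exposure

ep_returns = returns @ w                         # the eigenportfolio's return series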
Jan 29
Tweeting a question I was asked (and my response) regarding MM, with some bonus resources added:
A great example of C++ HFT MM algorithms is the repo linked below. An improvement idea I have suggested to the author, and which interested algotraders can also attempt, is to use a fast model like XGBoost (there is a C++ library) alongside some alphas to make spreads asymmetric before traders can trade against you and leave you with negative edge in those trades. A large part of market making is cheaply executing alphas: trying to get inventory on the side of your predictions, and getting out of the way of adverse conditions by making your spreads asymmetrically wide toward traders with alpha against you. github.com/hello2all/gamm…
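A hypothetical sketch of that idea: a fast regressor predicts short-horizon drift and both quotes shift toward it, so you lean away from the side most likely to be picked off (synthetic data, made-up scaling):

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))                              # stand-in microstructure features
y = X @ rng.normal(size=8) * 1e-4 + rng.normal(scale=1e-4, size=5000)  # fake short-horizon returns

model = xgb.XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.05)
model.fit(X, y)

def skewed_quotes(mid, half_spread, feats, skew_scale=50.0):
    drift = float(model.predict(feats.reshape(1, -1))[0])  # predicted mid-price move
    skew = skew_scale * drift                               # shift both quotes toward it
    return mid - half_spread + skew, mid + half_spread + skew

bid, ask = skewed_quotes(mid=100.0, half_spread=0.02, feats=X[-1])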
Jan 18
I think you can probably classify modeling and feature engineering into a few areas: ML, statistics, time series, microstructure, some fun extras like entropy (which are super weird to work with), TA, data-mined alphas, and signal processing.
1/who knows lol
I'll probably speak on each of these eventually, but today I think it'd be good to get some publicity on signal processing. It's underhyped compared to ML and just as deserving.
2/
A lot of the literature is exclusive to electrical engineering and CS, but I can tell you there is lots of alpha in the area. As the story usually goes, NN models like LSTMs get a bad rap performance-wise because of how badly they are usually applied.
3/
Jan 7
I'm blatantly copying and pasting this from Mephisto but the fact that some people haven't seen this thread, and will otherwise never be able to since the account is gone is sad:
OK, picking apart $SKEW. Why is it 'hot garbage' -h/t @ jsoloff. This is for anyone who read @ SoberLook (who of course are awesome but the problem is elsewhere)

"Have you met my bro $SKEW? He's useless but his dad owns a boat."

1/ too many