Large quantity of unique features
Really good dimensionality reduction
Ensemble everywhere!
A word on each...
When it comes to modeling, everyone always reaches for their favorite NNs, LSTMs and the like, or LGBMs, and those are great, but everyone has them, and frankly, they aren't that hard to implement! Just look at Kaggle if you want an example of DS students using them everywhere...
For real alpha, you need to focus on the three most ignored areas (there is a fourth, speed, but that's not really modeling, and a fifth which I'm not telling you because I like my alpha unleaked). That sounded super guru-like, but I promise these work and I use them.
Starting with the first: it is usually quantity over quality. Once you have built a massive alpha signal library, sure, go ahead and focus on specific signals. Until then, use TA-Lib, GitHub, the Stefan Jansen repo, Kaggle, Wikipedia, any mass dump of features (preferably with code, so that you can save time). It's about making lots of them, not about having some cool special strategy. Those do work as well, but you need to build on the mass features afterward, not before. You need to get what most others have (usually only in part, but still, most others will have quite a few of these features) before you can decide to become special.
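As an illustration of that "mass dump" approach, here is a minimal sketch, assuming the TA-Lib Python wrapper and a pandas OHLCV frame; the indicator list and period grid are just examples, not my actual library.

```python
# A minimal sketch of mass feature generation with TA-Lib.
# Assumptions: the `talib` wrapper is installed and `df` has
# high/low/close columns; indicators and periods are illustrative.
import pandas as pd
import talib

PERIODS = [5, 10, 20, 50, 100]  # different periods for different timescales

def mass_features(df: pd.DataFrame) -> pd.DataFrame:
    h, l, c = df["high"].to_numpy(), df["low"].to_numpy(), df["close"].to_numpy()
    out = {}
    for p in PERIODS:
        out[f"rsi_{p}"] = talib.RSI(c, timeperiod=p)
        out[f"ema_{p}"] = talib.EMA(c, timeperiod=p)
        out[f"mom_{p}"] = talib.MOM(c, timeperiod=p)
        out[f"atr_{p}"] = talib.ATR(h, l, c, timeperiod=p)
        out[f"adx_{p}"] = talib.ADX(h, l, c, timeperiod=p)
        out[f"cci_{p}"] = talib.CCI(h, l, c, timeperiod=p)
    return pd.DataFrame(out, index=df.index)

# 6 indicators x 5 periods = 30 columns from a dozen lines; scale the lists
# up and you get to hundreds of features quickly.
```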
The next key point is dimensionality reduction. It's simple math: even if you have loads of HFT data, the required sample size scales so badly with dimension that the data won't save you. If I have 100 samples in 1D, then because 2D squares the volume, I now need 100^2 = 10,000 samples to cover the space just as densely. Hopefully that's intuitive; it comes down to how much data sits in each unit of space. High density means lots of effective data, and a larger volume, as anyone knows, means lower density and effectively less data. If you have 100D you now need 1e+200 samples to be equal to 100 samples in 1D... yeah. So that's why we reduce dimensionality. Not because it's fun (although it is).
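To make that arithmetic concrete, a tiny check in plain Python (nothing assumed beyond the 100-samples-in-1D baseline above):

```python
# Samples needed to keep the same sample density when going from 1 to d dims.
samples_1d = 100
for d in (1, 2, 3, 10, 100):
    print(d, "dims ->", f"{float(samples_1d) ** d:.0e}", "samples")
# 100 dims -> 1e+200 samples, which is why we reduce dimensionality first.
```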
Now, don't just use PCA; it's a fucking linear regression at the end of the day! Start with PCA, as I discussed in my last thread. PCA on its own is not really dimensionality reduction, but it does separate the linear, mutual information from the residuals. When people use it as dim reduction they just chuck the residuals and assume they are noise. For us, we'll say they are not noise, but special features that carry the non-linearity. We can then run manifold learning on them, or just feed them into a supervised autoencoder (supervised with either an LSTM or an MLP head) and the SAE will find those non-linearities. A lot of people trash AEs because they effectively just rediscover PCA, but think of it as learning: like anyone, you start with the basics, so of course they pick up the linear mutual relationships first and usually don't get far into the meaty non-linear parts. That's why we do PCA first. We have done that work for it, so it can now spend its precious capacity on the non-linear parts.
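A minimal sketch of that pipeline, assuming scikit-learn and PyTorch, a feature matrix X and a target y; layer sizes, the loss weighting, and every other choice below are illustrative, not a prescription.

```python
# PCA first, then hunt the non-linear leftovers with a supervised autoencoder
# (reconstruction loss + prediction head on the bottleneck). Illustrative only.
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def pca_split(X, n_components=20):
    """Return the linear PCA factors and the residuals PCA can't explain."""
    Z = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components)
    factors = pca.fit_transform(Z)                    # linear, "mutual" part
    residuals = Z - pca.inverse_transform(factors)    # what we feed to the SAE
    return factors, residuals

class SupervisedAE(nn.Module):
    """Autoencoder with a supervised MLP head on the bottleneck code."""
    def __init__(self, n_in, n_code=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_in, 64), nn.ReLU(), nn.Linear(64, n_code))
        self.dec = nn.Sequential(nn.Linear(n_code, 64), nn.ReLU(), nn.Linear(64, n_in))
        self.head = nn.Sequential(nn.Linear(n_code, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, x):
        code = self.enc(x)
        return self.dec(code), self.head(code), code

def fit_sae(residuals, y, epochs=200, lr=1e-3, alpha=0.5):
    X_t = torch.tensor(residuals, dtype=torch.float32)
    y_t = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
    model = SupervisedAE(residuals.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        recon, pred, _ = model(X_t)
        # joint objective: rebuild the residuals AND predict the target
        loss = alpha * mse(recon, X_t) + (1 - alpha) * mse(pred, y_t)
        loss.backward()
        opt.step()
    return model

# usage sketch:
# factors, resid = pca_split(X)
# sae = fit_sae(resid, y)
# _, _, codes = sae(torch.tensor(resid, dtype=torch.float32))
# final_features = np.hstack([factors, codes.detach().numpy()])
```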
Finally, ensembling. This can be done across multiple timeframes with hierarchical modeling (talked about this a lot in other threads), or it can come from stacking algorithms, like using an ARIMA forecast as a feature (this is risky, and you only do it with simple, linear models like AR because they won't overfit; don't put an NN inside an NN, please, for the love of god, otherwise you just reduce the regularization of the original NN. That's why metalabelling can be stupid sometimes). The last form is just using similar models: why not run an LSTM and a WaveNet and ensemble them? It reduces overfitting, it's like having two judges, and pretty much always the whole is greater than the average of the individual predictors; even the best single predictor is usually worse than the ensemble. I'll say it again for those who didn't hear it last time: a GARCH ensemble thrashes every single GARCH-family model, and that's saying something considering how many of them exist and can get lucky.
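A minimal sketch of that GARCH-ensemble point, assuming the `arch` package and a pandas Series of percent returns; the three specs and the equal weights are just examples.

```python
# Average one-step variance forecasts from a few GARCH-family specs
# instead of trusting any single one. Illustrative, equal-weighted.
import numpy as np
from arch import arch_model

SPECS = [
    dict(vol="GARCH", p=1, o=0, q=1),   # vanilla GARCH(1,1)
    dict(vol="GARCH", p=1, o=1, q=1),   # GJR-GARCH (leverage term)
    dict(vol="EGARCH", p=1, o=1, q=1),  # EGARCH
]

def ensemble_variance_forecast(returns):
    forecasts = []
    for spec in SPECS:
        res = arch_model(returns, dist="normal", **spec).fit(disp="off")
        f = res.forecast(horizon=1)
        forecasts.append(f.variance.iloc[-1, 0])  # 1-step-ahead variance
    # equal weights; weighting by out-of-sample loss is the obvious next step
    return float(np.mean(forecasts))
```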
End of thread. Enjoy your day, and remember that the glitzy models like NNs you use at the end aren't really the alpha. I use like 800-1000 features (maybe more like 5k counting the different-period versions for different timescales), btw, in case you are wondering.
For those wondering if they are, I'll give a few comments:
t-SNE is always something I want to apply, but I can never quite figure out the right way to do it. There are certainly some benefits to be had from a basic understanding of what all this means, so you have a better chance at visualizing your features in 2D...
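For the 2D visualization part at least, a minimal sketch, assuming scikit-learn and matplotlib and a feature matrix X; the perplexity and PCA init are just reasonable defaults, not a recommendation.

```python
# Project a feature matrix to 2D with t-SNE and scatter-plot it.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

def plot_tsne(X, color=None, perplexity=30):
    Z = StandardScaler().fit_transform(X)
    emb = TSNE(n_components=2, perplexity=perplexity, init="pca",
               random_state=0).fit_transform(Z)
    plt.scatter(emb[:, 0], emb[:, 1], c=color, s=5, cmap="coolwarm")
    plt.title("t-SNE of feature matrix")
    plt.show()
```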
Much like stochastic methods: as much as I would never make them the center of a model, there is always a use for them as a feature or in an ensemble. Ensembling is truly the free lunch of alpha...
A key concept for MMs is how you manage inventory. Avellaneda-Stoikov is basically the model everyone uses for this. Then there is the offset, basically how wide your spreads are. That's your basic model of liquidity provision...
From there we get to have some fun! If you can create multiple forecasts for different timeframes (and, at a super-advanced level, compute them fast enough) you can make spreads asymmetric and intentionally hold inventory...
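A minimal sketch of that base model, using the standard Avellaneda-Stoikov (2008) reservation price and optimal spread; the `alpha_skew` term is my own illustrative way of folding a forecast in, not part of the paper.

```python
# Avellaneda-Stoikov style quoting with an optional forecast skew.
import math

def as_quotes(mid, inventory, gamma, sigma, kappa, time_left, alpha_skew=0.0):
    """Return (bid, ask) for one quoting decision.

    mid        : current mid price
    inventory  : signed inventory q (positive = long)
    gamma      : risk aversion
    sigma      : volatility of the mid price
    kappa      : order-flow intensity parameter
    time_left  : T - t, fraction of the trading horizon remaining
    alpha_skew : optional price forecast; shifts both quotes toward the call
    """
    # inventory-adjusted reservation price
    r = mid - inventory * gamma * sigma ** 2 * time_left
    # optimal total spread
    spread = gamma * sigma ** 2 * time_left + (2.0 / gamma) * math.log(1.0 + gamma / kappa)
    # shift quotes toward the forecast so fills accumulate on the predicted side
    r += alpha_skew
    return r - spread / 2.0, r + spread / 2.0

# usage sketch:
# bid, ask = as_quotes(mid=100.0, inventory=5, gamma=0.1, sigma=0.02,
#                      kappa=1.5, time_left=0.5, alpha_skew=0.001)
```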
Entirely unprompted here, but please check out @FadingRallies. Also @choffstein's Liquidity Cascades paper (link below). The flow between MMs, passive funds, ELS, and generally the effects of reflexive dealer hedging are key to understanding this regime!
Even if you aren't a trader (I certainly am not, although I try to keep up with it all) it is still super important to understand the regime and how it all fits in from a risk perspective. You CANNOT just take the models as your risk! Eigenportfolios decay (I would know, I work with them all the time), so they aren't even a perfect metric (although I do love them). Statistical models will capture some risk, but at the end of the day you choose the parameters, and the distribution you feed in is key. Knowing fat tails exist is incredibly important for this.
Tweeting a question I was asked / my response regarding MM:
(me adding bonus resources):
A great example of C++ HFT MM algorithms. An improvement idea I have suggested to the author, but which can also be attempted by interested algotraders, is to use a fast model like XGBoost (there is a C++ library) alongside some alphas to make spreads asymmetric before traders can trade against you and leave you with negative edge on those trades. A large part of market making is cheaply executing alphas: trying to get inventory on the side of your predictions, and getting out of the way of adverse conditions by making your spreads asymmetrically wide against traders with alpha against you. github.com/hello2all/gamm…
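A minimal sketch of that idea, with XGBoost's Python API standing in for the C++ library; the skew mapping and parameter names are illustrative, not from the linked repo.

```python
# A fast boosted-tree forecast decides which side of the book to widen.
import numpy as np
import xgboost as xgb

def fit_microprice_model(X, y):
    """X: microstructure features per quote update, y: short-horizon mid return."""
    model = xgb.XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.05)
    model.fit(X, y)
    return model

def asymmetric_half_spreads(model, x_now, base_half, sensitivity=10.0):
    """If an up-move is predicted, push the ask away and pull the bid in,
    so you stop selling into the move and pick up inventory on the right side."""
    pred = float(model.predict(np.asarray(x_now).reshape(1, -1))[0])
    skew = float(np.clip(sensitivity * pred, -0.9, 0.9)) * base_half
    bid_half = base_half - skew   # pred > 0: tighter bid, more buys
    ask_half = base_half + skew   # pred > 0: wider ask, fewer adverse sells
    return bid_half, ask_half
```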
I think you can probably classify modeling and feature engineering into a few areas: ML, statistics, time series, microstructural, some fun extras like entropy (which are super weird to work with), TA, data-mined alphas, and signal processing.
1/who knows lol
I'll probably speak on each of these eventually, but today I think it'd be good to get some publicity on signal processing. It's underhyped compared to ML and just as deserving.
2/
A lot of the literature is exclusive to electrical engineering and CS, but I can tell you there is lots of alpha in the area. As the story usually goes, NN models like LSTMs get a bad rap performance-wise because of their terrible application.
3/
I'm blatantly copying and pasting this from Mephisto, but the fact that some people haven't seen this thread, and otherwise never will be able to since the account is gone, is sad:
OK, picking apart $SKEW. Why is it 'hot garbage'? (h/t @jsoloff). This is for anyone who read @SoberLook (who of course are awesome, but the problem is elsewhere).
"Have you met my bro $SKEW? He's useless but his dad owns a boat."