Feature examples were highly requested. I will do in-depth reviews for many, but I can't possibly cover them all. So here are resources for top feature examples -
Stefan Jansen - ML 4 Algorithmic Trading - GitHub - Chapter 23; 200 or so features based on data mining and TA
…
Yam Peleg (YAMQWE) submission notebooks for the G-Research Crypto Forecasting competition on Kaggle. A few hundred features here - rolling volatility and the like.
Another good one is Wikipedia: go to the statistical signal processing page and work through the methods listed there…
They can usually all be applied with good success. Another good area is time-frequency/wavelet analysis, for which textbooks can be found. Financial signal processing is awesome.
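To make the above concrete, here is a minimal sketch of the sort of rolling features those resources catalogue (my own toy example: the `close` Series input and the window lengths are arbitrary assumptions, not taken from any of the books/notebooks):

```python
import numpy as np
import pandas as pd

def basic_rolling_features(close: pd.Series) -> pd.DataFrame:
    """A few generic rolling features; window lengths are placeholders."""
    ret = np.log(close).diff()                       # log returns
    feats = pd.DataFrame(index=close.index)
    feats["vol_20"] = ret.rolling(20).std()          # rolling volatility
    feats["zscore_50"] = (close - close.rolling(50).mean()) / close.rolling(50).std()
    feats["mom_10"] = close.pct_change(10)           # simple momentum
    return feats
```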
Early poll results indicate feature examples are wanted. Let's do sentiment.
>> A word of caution: alternative data is not a holy grail. There is loads of it, it is hard to draw inference from, and it does not explain a massive part of the variance.
1/n
Plot sentiment and it tends to look like noise. There is so much sentiment data that it matters less which model you use than how you use it. More data usually means you need to think outside the box, because the marginal increase in model performance...
2/n
gets smaller and smaller, and there is a LOT of text data. If you are a beginner just looking for alpha, reach for something like TextBlob or spaCy; don't waste time with LSTMs, CNNs, and other NN models. First step: understand your flows and how you should...
3/n
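A minimal sketch of the simple route mentioned above: score headlines with TextBlob polarity and aggregate into a time series. The DataFrame layout, the 'text' column name, and the hourly bucketing are my assumptions, not from the thread:

```python
import pandas as pd
from textblob import TextBlob

def headline_sentiment(headlines: pd.DataFrame) -> pd.Series:
    """Score each headline with TextBlob polarity (-1 to 1) and
    aggregate to an hourly mean sentiment series.
    `headlines` is assumed to have a DatetimeIndex and a 'text' column."""
    polarity = headlines["text"].apply(lambda t: TextBlob(t).sentiment.polarity)
    return polarity.resample("1h").mean().rename("sentiment")
```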
This is when you are on one side of the OB with an aggressive (usually) LO acting as a maker, but the price moves, you are now on the opposite side of the OB, and you get matched against an order as a taker. You pay taker fees when this...
happens, which for a lot of HFT strats means certain death. No kidding: a rebate of 1bp vs 4bps in fees. Not a fun time when your alpha is likely only a few bps and your BAS is maybe a bp. This example is crypto futures. The risk of getting flipped is why you have to be...
fast as an MM: not just to avoid leaving stale quotes on the OB, but also to avoid getting flipped. The risk increases as you become more aggressive, which is why a lot of stat arb strategies (like the ones I use) that don't even need that fast execution have to use it, because...
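Back-of-the-envelope numbers using the fee/rebate figures quoted above (the alpha figure is an assumed placeholder for "a few bps", not a number from the thread):

```python
# Rough per-trade economics of getting flipped, in basis points of notional.
maker_rebate_bps = 1.0    # earned when your LO fills as a maker
taker_fee_bps    = 4.0    # paid when your order is flipped into a taker fill
alpha_bps        = 2.0    # assumed edge per trade ("a few bps")

pnl_as_maker   = alpha_bps + maker_rebate_bps   # +3 bps per trade
pnl_if_flipped = alpha_bps - taker_fee_bps      # -2 bps: the edge is gone
print(pnl_as_maker, pnl_if_flipped)
```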
Whilst I am not a big fan of plain linear regressions, I do use regressions in the models I develop. Polynomial logistic regressions, for example, are effectively a smoothed decision-tree surface. Regularization that limits the depth of an NN or tree is great, but introducing...
1/n
the bias that jagged jumps in the decision surface are a bad idea, via regression-based models, massively denoises your data (not strictly linear regressions; regression NNs and regression decision trees, for example, are awesome). As I have mentioned before...
2/n
a decision tree cannot replicate a linear regression under regularization. It's like trying to fit a sine wave with a Taylor series: you can get close, but the complexity needed for a perfect replication is infinite. (for the Taylor series this would be polynomial...
3/n
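A quick illustrative sketch of that point with scikit-learn (my own toy example, not from the thread): a depth-limited regression tree can only approximate a straight line with a staircase, while a linear regression recovers it with two parameters:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 200).reshape(-1, 1)
y = 3.0 * X.ravel() + rng.normal(scale=0.1, size=200)   # noisy linear signal

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)     # jagged step surface
lin  = LinearRegression().fit(X, y)                     # smooth, 2 parameters

print("tree fit is piecewise-constant:", np.unique(tree.predict(X)).size, "levels")
print("linear slope (true value 3):", lin.coef_[0])
```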
The usual approach is to take the midprice, which is just the average of the best bid and ask. This is decent for most applications, but it can definitely be improved. Decent won't cut it in HFT! ...
2/n
Let's outline a few approaches:
OB Liquidity Based:
Weighted Midprice
Exponentially Weighted Midprice
TA (MA) based variants
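A minimal sketch of these fair-price variants, assuming top-of-book data in a DataFrame with bid/ask prices and sizes (the column names, and using EWM smoothing of the weighted mid for the exponential/TA-style variant, are my assumptions, not the thread's exact definitions):

```python
import pandas as pd

def midprices(book: pd.DataFrame, halflife: int = 10) -> pd.DataFrame:
    """Fair-price estimates from top-of-book data.
    `book` is assumed to have columns: bid, ask, bid_size, ask_size."""
    out = pd.DataFrame(index=book.index)
    # Plain midprice: average of best bid and ask.
    out["mid"] = (book["bid"] + book["ask"]) / 2
    # Liquidity-weighted midprice: leans toward the price on the side
    # with less resting size (weight bid by ask_size and ask by bid_size).
    denom = book["bid_size"] + book["ask_size"]
    out["weighted_mid"] = (book["bid"] * book["ask_size"]
                           + book["ask"] * book["bid_size"]) / denom
    # Exponentially smoothed variant of the weighted mid.
    out["ewm_mid"] = out["weighted_mid"].ewm(halflife=halflife).mean()
    return out
```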