StatArb
Mar 15 12 tweets 2 min read
Early poll results indicate feature examples are wanted. Let's do sentiment.

>> A word of caution: alternative data is not a holy grail. There is loads of it, it is hard to draw inference from, and it explains only a small part of the variance.

1/n
Plot sentiment and it tends to look like noise. There is so much sentiment data that which model you use matters far less than how you use it. More data usually means you need to think outside the box, because the marginal increase in model performance...

2/n
gets smaller and smaller, and there is a LOT of text data. If you are a beginner just looking for alpha, reach for something like TextBlob or spaCy; don't waste time with LSTMs, CNNs, and other NN models. First step: understand your flows and how you should...

3/n
weight the data. Your model will be able to grasp some of the relations between sentiment and its impact, but how will it know the source? Get viewership data and use it when weighting your sentiment. Is it good or bad, how sure do they sound, and what is their viewership? ...

4/n
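A minimal sketch of the weighting idea above, assuming each headline already has a polarity score (for example from TextBlob) plus two hypothetical fields, `confidence` and `viewership`. The log damping of viewership is my assumption, not something the thread specifies.

```python
import math
from dataclasses import dataclass


@dataclass
class Headline:
    polarity: float    # sentiment in [-1, 1], assumed precomputed (e.g. TextBlob)
    confidence: float  # how sure the source sounds, in [0, 1] (hypothetical field)
    viewership: float  # audience size of the source (hypothetical field)


def weighted_sentiment(headlines):
    """Confidence- and viewership-weighted average polarity."""
    num = 0.0
    den = 0.0
    for h in headlines:
        # log1p damps huge outlets so one source does not dominate (assumption)
        w = h.confidence * math.log1p(h.viewership)
        num += w * h.polarity
        den += w
    # With no usable weight, fall back to a neutral score
    return num / den if den > 0 else 0.0


headlines = [
    Headline(polarity=0.8, confidence=0.9, viewership=1_000_000),
    Headline(polarity=-0.5, confidence=0.4, viewership=2_000),
]
score = weighted_sentiment(headlines)
```

The big, confident source pulls the aggregate well into positive territory even though the second headline is negative. Swap the weight function for whatever fits your flows.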
The next part: just use headlines. Full articles give very little extra inference, and your time is better spent building other features. Analyst ratings help too. Model how different classes/caps are affected by different sentiment sources. There...

5/n
are even models with plenty of alpha that use CEOs' facial expressions on CNN, but don't use a CNN model XD; there are far better models for computer vision nowadays. Google searches are great as well. The key here is to remember that sentiment doesn't directly drive the...

6/n
market, and there are heavy reflexive effects from other algorithms using it. So use statistical tests like the portmanteau statistic, Hurst exponent, KPSS, t-values, etc. to build meta-features. If you are doing HFT, you can use microstructural flow metrics like Kyle's lambda...

7/n
and all the different variations of VPIN that exist, which can be ensembled like GARCH models would be. Just don't spend too much time playing around with sentiment, although it can be good if you prepare it well. If you are doing an exclusively alt data approach or being...

8/n
mid-frequency (but still intraday) or daily, just not HFT, then you will need to isolate the idiosyncratic effects of sentiment and alternative data, since otherwise the noise will be overpowering. This is why you should make sure your portfolio is constantly hedged...

9/n
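Going back to the meta-feature idea from a few tweets up: here is a rough sketch of turning a portmanteau statistic into a rolling feature. It uses the Ljung-Box Q statistic, which is one common portmanteau test; the window length, lag count, and function names are illustrative assumptions, and no p-value is computed.

```python
def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        return 0.0
    cov = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cov / var


def ljung_box_q(x, max_lag):
    """Ljung-Box Q: n(n+2) * sum_k rho_k^2 / (n - k), for k = 1..max_lag."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorr(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


def rolling_meta_features(series, window, max_lag=5):
    """One feature dict per full window (window must exceed max_lag)."""
    out = []
    for end in range(window, len(series) + 1):
        w = series[end - window:end]
        out.append({"q_stat": ljung_box_q(w, max_lag), "ac1": autocorr(w, 1)})
    return out


# Toy input: a perfectly alternating "sentiment" series
alternating = [1.0, -1.0] * 10
features = rolling_meta_features(alternating, window=10)
```

A high `q_stat` says the window has serial structure rather than pure noise, which is the kind of information a downstream model cannot easily infer from raw sentiment alone.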
This will hurt your alpha in the short term, but on the mid-frequency to daily timeframes (sentiment means fuck all long term) you will need to stay delta neutral by shorting your top xx sells and longing your top xx buys (maybe exponentially weighted)...

10/n
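A sketch of the long/short construction just described, under some assumptions: `scores` is a hypothetical ticker-to-signal mapping, the weights decay geometrically by rank (one way to read "exponentially weighted"), and the result is dollar neutral. Dollar neutral is not the same as beta neutral, so treat this as a starting point only.

```python
def long_short_weights(scores, k, decay=0.5):
    """Dollar-neutral weights: long the top-k signals, short the bottom-k.

    scores: dict mapping ticker -> signal (higher means stronger buy).
    decay:  geometric decay by rank, so the strongest names get the most weight.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) < 2 * k:
        raise ValueError("need at least 2*k names")
    longs = ranked[:k]            # strongest buys first
    shorts = ranked[::-1][:k]     # strongest sells first
    raw = [decay ** i for i in range(k)]
    total = sum(raw)
    weights = {}
    for i, ticker in enumerate(longs):
        weights[ticker] = 0.5 * raw[i] / total    # long leg sums to +0.5
    for i, ticker in enumerate(shorts):
        weights[ticker] = -0.5 * raw[i] / total   # short leg sums to -0.5
    return weights


scores = {"A": 0.9, "B": 0.4, "C": 0.1, "D": -0.2, "E": -0.7, "F": -0.3}
weights = long_short_weights(scores, k=2)
```

Names in the middle of the ranking get no weight at all, which is the point: only the strongest signals on each side are traded, and the two legs offset each other.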
When you do this your effects become more idiosyncratic, so focus on Sharpe instead of returns; returns will likely be smaller because sentiment is a smaller part of the variance. You can ensemble this to improve capital efficiency, or just lever it up and use...

11/n
a modified Kelly that assumes way fatter tails than you observe in the implied PD. Be careful with leverage, of course, as it eats into profits and is risky (I personally don't touch leverage one bit).

DONE

12/12
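The thread does not say how the Kelly sizing is modified, so the following is only one possible reading: take the continuous-return Kelly fraction (mean over variance), inflate the variance to stand in for fatter tails, bet a fraction of the result, and cap it. The `tail_inflation`, `fraction`, and `cap` parameters are all hypothetical.

```python
import statistics


def conservative_kelly(returns, tail_inflation=2.0, fraction=0.5, cap=1.0):
    """Fractional Kelly with an inflated variance (illustrative assumption).

    Plain Kelly for continuous returns is mean / variance. Here the variance
    is multiplied by tail_inflation as a crude stand-in for fatter tails,
    the result is scaled by fraction, and the size is capped at cap
    (cap=1.0 means no leverage).
    """
    mu = statistics.fmean(returns)
    var = statistics.variance(returns)
    if var == 0 or mu <= 0:
        return 0.0  # no edge, or no usable variance estimate: do not bet
    size = fraction * mu / (var * tail_inflation)
    return min(size, cap)


strategy_returns = [0.01, -0.005, 0.002, 0.004, -0.003]
size = conservative_kelly(strategy_returns)
uncapped = conservative_kelly(strategy_returns, cap=float("inf"))
```

On short samples the uncapped number is usually far too aggressive, which is exactly why the cap and the tail inflation are there.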

More from @TerribleQuant

Mar 17
Feature examples were highly requested. I will do in-depth reviews for many, but I can’t possibly do all. So I’ll give resources for top feature examples:

Stefan Jansen, Machine Learning for Algorithmic Trading (GitHub), Chapter 23: 200 or so features based on data mining and TA

Yam Peleg (YAMQWE) submission notebooks for the G-Research Crypto Forecasting competition on Kaggle. A few hundred features here from rolling volatility etc.

Another good one is Wikipedia, just go to statistical signal processing and search for all the methods…
They can usually all be applied with good success. Another good one is time-frequency/ wavelet analysis for which textbooks can be found. Financial signal processing is awesome.

Read 8 tweets
Mar 15
For the next hour this will just be a finmeme page. [images]
Read 5 tweets
Mar 15
WSB next:
Nothing but memes today
Read 7 tweets
Mar 13
For those unaware of getting flipped:

This is when you are on one side of the order book (OB) with an aggressive (usually) limit order (LO), acting as a maker, but the price moves so that you are now on the opposite side of the OB and get matched as a taker. You pay taker fees when this...
happens, which for a lot of HFT strats means certain death. No kidding. A rebate of 1 bp vs fees of 4 bps. Not a fun time when your alpha is likely only a few bps and your bid-ask spread (BAS) is maybe a bp. This example is crypto futures. The risk of getting flipped is why you have to be...
fast as a MM, not just to prevent leaving stale quotes on the OB, but also to prevent getting flipped. The risk increases as you become more aggressive, which is why a lot of stat arb strategies (like the ones I use) that don't even need that fast execution have to use it because...
Read 6 tweets
Mar 12
Whilst I am not a big fan of using linear regressions I do use regressions in the models I develop. Polynomial logistic regressions are effectively a smoothed decision tree surface for example. Regularization that limits depth of an NN or tree is great, but introducing...

1/n
the bias that jagged jumps in the decision surface are a bad idea, through the use of regression-based models, massively denoises your data (not strictly regressions; e.g. regression NNs and regression decision trees are awesome). As I have mentioned before...

2/n
a decision tree cannot replicate a linear regression with regularization. It's like trying to fit a sine wave with a Taylor series: you can get close, but the level of complexity would be infinite for a perfect replication. (for a Taylor series this would be polynomial...

3/n
Read 4 tweets
Mar 12
Back to the quant topics:

Microstructural fair value!

Let's dive into it:

1/n
The usual approach is to take the midprice which is just the average between the best bid and ask. This is decent for most applications, but can definitely be improved. Decent won't cut it in HFT! ...
2/n
Let's outline a few approaches:

OB Liquidity Based:

Weighted Midprice
Exponentially Weighted Midprice
TA (MA) based variants

Microprice & Variants:

HMM Microprice (Stoikov)
ARIMA Microprice
SAE Microprice
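For reference, a small sketch of the first two items: the plain midprice and the standard size-weighted midprice, where more resting size on the bid pushes the fair value toward the ask. Function and argument names are illustrative; the microprice variants in the list are not covered.

```python
def midprice(bid, ask):
    """Plain average of the best bid and best ask."""
    return 0.5 * (bid + ask)


def weighted_midprice(bid, ask, bid_size, ask_size):
    """Size-weighted mid: heavier bid size moves the estimate toward the ask."""
    total = bid_size + ask_size
    if total <= 0:
        return midprice(bid, ask)  # no size information: fall back to the mid
    imbalance = bid_size / total
    return imbalance * ask + (1.0 - imbalance) * bid


mid = midprice(100.0, 101.0)
wmid = weighted_midprice(100.0, 101.0, bid_size=30, ask_size=10)
```

With three times more size on the bid than the ask, the weighted mid sits three quarters of the way up the spread instead of in the middle.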
Read 17 tweets
