I previously mentioned the use of CDS pricing literature for alternative copula distributions, and for extending those assumptions to higher dimensions without using vines.
Continuing this topic, I'll go through some other methods/tricks:
For optimizing copulas the easiest approach is to use a heuristic and then brute force the lower dimensions. This is usually bivariate only, but with smaller asset universes trivariate is possible. From there you just test all permutations that include...
2/n
your established pair or triplet. The weights can then be solved for with standard methods. Another note: since copulas are effectively conditional probabilities and vine copulas form chains, the Baum-Welch algorithm can be adapted with non-Gaussian...
3/n
distributions that handle more than two dimensions, since Markov chains are still conditional probabilities. There is plenty of literature for this and it isn't very hard to do. Gaussian mixture models can be adapted to provide even more flexibility...
4/n
but they do not give you the same level of bias that the other copula approaches do. Since your vines can be canonical to a single asset, you can cluster by industry or with unsupervised methods, and then use LASSO to induce sparsity between the connections...
5/n
using a little bit of graph theory can help, but sparse PCA does a fine job for most approaches. With your reduced asset series, you can now fit your paths to whichever asset is canonical.
6/n
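To make the clustering + reduction step concrete, here is a minimal sketch assuming a hypothetical T x N returns DataFrame `returns`, with scikit-learn's KMeans standing in for the unsupervised clustering step and SparsePCA providing the LASSO-style sparsity; the cluster count and penalty are illustrative placeholders, not recommendations.

```python
# Minimal sketch: cluster assets, then use sparse PCA within each cluster to
# pick a "canonical" asset per cluster. `returns` (T x N DataFrame of asset
# returns) is a hypothetical input; n_clusters and alpha are illustrative only.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import SparsePCA

def canonical_assets(returns: pd.DataFrame, n_clusters: int = 5, alpha: float = 1.0):
    # Cluster on the correlation structure (each row = one asset's correlations).
    corr = returns.corr().values
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(corr)

    canon = {}
    for k in range(n_clusters):
        cols = returns.columns[labels == k]
        X = returns[cols].values
        # One sparse component per cluster; alpha controls the LASSO-style penalty.
        spca = SparsePCA(n_components=1, alpha=alpha, random_state=0).fit(X)
        loadings = np.abs(spca.components_[0])
        # The asset carrying the largest non-zero loading acts as the canonical node.
        canon[k] = cols[int(loadings.argmax())]
    return labels, canon
```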
The issue is that there is usually not a single canonical node, so when optimizing, do so for each node in your sample, with the goal of inducing sparsity relative to that node as the centre. Do not worry if your connections are multivariate, although this will...
7/n
drastically increase the computational complexity, which is why I originally brought up the use of heuristic methods. Since we are trying to induce sparsity using SPCA, we can effectively replicate this with semi-definite programming using the matrix norm penalty.
8/n
The matrix norm penalty is effectively equivalent to SPCA on the resultant matrix and frankly, either or both can be used. Thus it makes sense to either apply SPCA to the greedy solution (non-convex) or the matrix norm penalty to the SDP (convex) solution.
9/9
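As a sketch of the convex route, the DSPCA-style relaxation of sparse PCA can be written as an SDP in cvxpy: maximise Tr(ΣX) minus an entrywise l1 matrix-norm penalty over the spectrahedron {X ⪰ 0, Tr(X) = 1}. `sigma` (a sample covariance) and `rho` are hypothetical inputs.

```python
# Minimal sketch of the convex (SDP) route: a DSPCA-style relaxation of sparse PCA,
# where an entrywise l1 matrix-norm penalty on a PSD matrix plays the role of the
# sparsity-inducing term. `sigma` is a hypothetical sample covariance matrix.
import numpy as np
import cvxpy as cp

def sparse_pc_sdp(sigma: np.ndarray, rho: float = 0.1) -> np.ndarray:
    n = sigma.shape[0]
    X = cp.Variable((n, n), symmetric=True)
    # Maximise explained variance minus the matrix-norm (entrywise l1) penalty.
    objective = cp.Maximize(cp.trace(sigma @ X) - rho * cp.sum(cp.abs(X)))
    constraints = [X >> 0, cp.trace(X) == 1]
    cp.Problem(objective, constraints).solve()
    # The leading eigenvector of the (near rank-one) solution gives the sparse loading.
    vals, vecs = np.linalg.eigh(X.value)
    return vecs[:, -1]
```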
Whilst I am not a big fan of using linear regressions, I do use regressions in the models I develop. Polynomial logistic regressions, for example, are effectively a smoothed decision-tree surface. Regularization that limits the depth of an NN or tree is great, but introducing...
1/n
the bias that jagged jumps in the decision surface are a bad idea, through the use of regression-based models, massively denoises your data. (Not strictly regressions; for example, regression NNs and regression decision trees are awesome.) As I have mentioned before...
2/n
a decision tree cannot replicate a linear regression with regularization. It's like trying to fit a sine wave with a Taylor series: you can get close, but the level of complexity would be infinite for a perfect replication. (For a Taylor series this would be polynomial...
3/n
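A quick, self-contained illustration of the replication point, using synthetic data and scikit-learn: in-sample the tree staircases its way close to a straight line, but outside the training range it goes flat, while the regularised linear model keeps the trend.

```python
# Illustration: a decision tree cannot replicate a (regularised) linear regression.
# In-sample it staircases close to the line; outside the training range it goes
# flat at the last leaf value, while Ridge extrapolates the linear trend.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(500, 1))
y_train = 2.0 * X_train[:, 0] + 1.0          # a plain linear signal

tree = DecisionTreeRegressor(max_depth=8).fit(X_train, y_train)
ridge = Ridge(alpha=1.0).fit(X_train, y_train)

X_test = np.array([[12.0], [15.0], [20.0]])  # outside the training range
print("true :", 2.0 * X_test[:, 0] + 1.0)
print("tree :", tree.predict(X_test))        # flat, stuck near the edge of training data
print("ridge:", ridge.predict(X_test))       # keeps the linear trend
```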
The usual approach is to take the midprice, which is just the average of the best bid and ask. This is decent for most applications, but can definitely be improved. Decent won't cut it in HFT! ...
2/n
Let's outline a few approaches:
OB Liquidity Based:
Weighted Midprice
Exponentially Weighted Midprice
TA (MA) based variants
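A minimal sketch of the first two variants under one common convention (definitions vary across the literature); the input arrays are hypothetical L1 quotes, and the halflife is a placeholder.

```python
# Sketch of liquidity-weighted midprice variants. One common convention is shown;
# exact definitions differ across the literature. Inputs are hypothetical L1 quotes.
import numpy as np
import pandas as pd

def weighted_midprice(bid, ask, bid_size, ask_size):
    # Imbalance-weighted: the price leans towards the side with LESS resting size,
    # i.e. towards where the next move is more likely to happen.
    imbalance = bid_size / (bid_size + ask_size)
    return imbalance * ask + (1.0 - imbalance) * bid

def ew_midprice(bid, ask, bid_size, ask_size, halflife=20):
    # Exponentially weighted (over time) version of the weighted midprice.
    wm = pd.Series(weighted_midprice(bid, ask, bid_size, ask_size))
    return wm.ewm(halflife=halflife).mean()
```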
I’ve totally made up this word, but there’s no word for the concept it captures, and I’ve been using it for over a year regardless.
If you use Bollinger Bands to trade mean-reverting portfolios, your lag error is the loss of alpha from the deterministic component.
This comes in 3 forms:
Jump risk:
Large jumps in the mean take time for your moving average to catch up to, and cause errors because moving averages are lagged. This is a regime-shifting-ish problem and is helped by unsupervised learning models with conservatism controls.
…
The next is mismatched period:
If there is a sine wave with white noise, we may attempt to use an MA to trade the noise part. This will give us lag error, as we will not be accounting for the broader sine function, hurting our PnL. Mismatched timeframe
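To make the lag error concrete, a small sketch with synthetic data: measure how far a simple moving average trails a pure sine component. On real data the white noise would sit on top of this deterministic gap; the period and window below are illustrative only.

```python
# Sketch: quantify the lag error of a simple moving average against a pure sine
# component. The SMA is a phase-lagged, attenuated copy of the sine, so trading
# the "noise" against it systematically gives back part of the deterministic alpha.
import numpy as np

period = 200                      # length of the sine cycle, in bars (illustrative)
window = 50                       # MA lookback (illustrative)
t = np.arange(5 * period)
signal = np.sin(2 * np.pi * t / period)

# Simple moving average (valid region only, to avoid edge effects).
sma = np.convolve(signal, np.ones(window) / window, mode="valid")
aligned = signal[window - 1:]

lag_error = aligned - sma                       # what the MA misses each bar
print("mean abs lag error:", np.abs(lag_error).mean())
print("theoretical SMA delay (bars):", (window - 1) / 2)
```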
For anyone making an HFT strat, you need a simulator to see when you get filled. You can use Kalman queues, multi-queues, or sim matching engines; they are all cool but usually don't properly capture effects like adverse selection. Then there are stochastic approaches...
1/n
This would be like simulating a Poisson process. This sucks even worse, as it utterly and entirely ignores adverse selection. At least sim matchers give it a try, although not a great one, and an easy NN will trounce them. Plus stochastic approaches don't use historical data...
2/n
Other methods involve procedural OB sims, which basically just take the data and add a bit of stochastic overlay to the historical OB. These assume (most of the time) that you get filled when the midprice crosses your bid/ask, which is just wrong because it neglects...
3/n
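For contrast with the midprice-cross rule, here is a minimal sketch of a queue-position fill model: you join behind the displayed size at your price, and only volume that trades through the queue ahead of you can fill you. The field names and the no-cancellation assumption are hypothetical simplifications, and adverse selection is still not modelled.

```python
# Minimal queue-position fill sketch for a passive bid at a fixed price level.
# You join behind the displayed size; trades at that price eat through the queue
# ahead of you before they can fill you. Cancels ahead of you are ignored (a
# conservative simplification), and adverse selection is still NOT modelled.
from dataclasses import dataclass

@dataclass
class PassiveBid:
    price: float
    size: float
    queue_ahead: float   # displayed size at this price when we joined
    filled: float = 0.0

    def on_trade(self, trade_price: float, trade_size: float) -> float:
        """Feed each market trade; returns how much of OUR order just filled."""
        if trade_price > self.price or self.filled >= self.size:
            return 0.0
        # Traded volume first consumes the queue ahead of us...
        eaten = min(trade_size, self.queue_ahead)
        self.queue_ahead -= eaten
        remaining = trade_size - eaten
        # ...then whatever is left can fill us.
        fill = min(remaining, self.size - self.filled)
        self.filled += fill
        return fill

# Usage: order = PassiveBid(price=100.00, size=5, queue_ahead=40)
# for px, qty in trades: order.on_trade(px, qty)
```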
Async I/O
Multi-threaded CPU tasks
Hyper-threaded CPU tasks
Async I/O will only give you a speed benefit on the data-writing component of the work, and can slow you down if that component is not significant. Remember there is only one NIC, and there is not infinite cache; management is expensive.
1/3
Multi-threading is great, but not for file handling: that won't be faster and will once again split your cache, so for ultra-low-latency applications you usually just have one super-fast core enabled so you can maximise cache.
2/3
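A small sketch of that setup (Linux only, with placeholder core id, file name and buffer size): pin the process to a single core so the cache is not split across threads, and keep log writes in a large user-space buffer so the hot loop rarely touches the kernel.

```python
# Sketch of the single-core, cache-friendly setup being described (Linux only):
# pin the process to one core so the cache is not split across threads, and
# buffer log writes so the hot loop rarely hits the kernel. Core id, file name
# and buffer size are placeholders.
import os

PINNED_CORE = 3                              # hypothetical isolated core id
os.sched_setaffinity(0, {PINNED_CORE})       # keep this process on one core

log = open("fills.log", "ab", buffering=1 << 22)   # large user-space buffer

def hot_path(event: bytes) -> None:
    # ... latency-critical work here ...
    log.write(event)    # lands in the user-space buffer, not on disk
    # flushing happens rarely (buffer full), or explicitly outside the critical window
```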
What is really interesting is the high-level mathematics of neural networks. Ignoring activation functions, I mean shape. Two groundbreaking discoveries:
1) Inferential power is (non-linearly) proportional to NN depth, and so is compute time.
2) An infinitely wide NN is equivalent to a kernel machine such as an SVM.
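Point 2 can be sanity-checked numerically: as a random ReLU layer gets wider, its empirical feature kernel converges to a fixed analytic kernel (the arc-cosine kernel), which is the sense in which the infinitely wide network behaves like a kernel machine. The inputs below are illustrative.

```python
# Sketch: the empirical kernel of a random ReLU layer converges, as width grows,
# to the analytic arc-cosine kernel, i.e. the infinitely wide layer behaves like
# a fixed kernel machine. Inputs and widths are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0])
y = np.array([np.cos(1.0), np.sin(1.0)])     # angle of 1 radian between x and y

def arccos_kernel(x, y):
    # Analytic E_w[relu(w.x) * relu(w.y)] for w ~ N(0, I)  (Cho & Saul, degree 1).
    cos_theta = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    return (np.linalg.norm(x) * np.linalg.norm(y)
            * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi))

for width in (100, 10_000, 1_000_000):
    W = rng.standard_normal((width, 2))
    k_emp = np.mean(np.maximum(W @ x, 0) * np.maximum(W @ y, 0))
    print(width, k_emp, "->", arccos_kernel(x, y))
```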
1000 transistors are needed to multiply a number in a digital system; with analog signals this just needs a resistor. We already know NNs can be inaccurate and still work fine, hence the special float formats for them. Analog systems are the way forward as transistors hit the size...
of an atom. Nowadays we just pack more onto a chip, but densities haven't changed much. The opportunities that analog systems offer are worth investigating. The brain is analog after all, and by Moore's law we would need 40 years to reach 1 million times our compute to get to something similar...
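For reference, the 40-year figure is just doubling arithmetic, assuming compute doubles roughly every two years:

```python
# 1 million times today's compute at one doubling every ~2 years:
import math
doublings = math.log2(1_000_000)      # ~19.9 doublings
print(doublings * 2)                  # ~40 years
```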