This is where you assume you can observe a price and then trade at that price. This is obviously false (apart from anything else it takes some time for you to observe a price and act on it, and the price might move in that time), but people often convince themselves that it's a very benign form of look-ahead that probably won't materially affect the backtest results.
That *might* be true; it depends on the strategy. But most often it isn't.
If you are trading very frequently, the chance that the price moves while your order is being sent to the market is higher (future price moves will most likely be correlated with the direction of your orders, so the price moves away from you more often than not).
If you are trading only infrequently then it likely takes some time to work your order in the market. The price you get will be closer to the VWAP over the period you execute the order than to the price you observed before you sent the order.
The very worst form of this bias is when you observe a closing price and assume you can trade at that price. It's so bad because (a) the closing price is often quite distorted - a lot of flow happens just before the close and it can move the close away from fair value, and
(b) by definition you can't trade after the close, because the market is closed! The next opportunity you get to trade will be the open of the next session, when the price could be *very* different.
This particularly affects mean reversion strategies, which look great if you assume you can observe the close and then trade there - I won't explain why but you should be able to figure it out.
To mitigate, you need to make sure your signal is formed *before* the price you intend to trade at is observed. That might mean forming the signal 30 minutes before the close if you intend to trade in the closing auction.
Or if using binned data, you can form your signal at the end of each bin and then assume execution at the VWAP (or closing price) for the next bin. For high frequency strategies it means correctly modelling the latency of your orders being sent to market.
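To make the binned-data version concrete, here's a minimal sketch. It assumes a DataFrame of bars with 'close' and 'vwap' columns and a one-bin holding period; the names and conventions are purely illustrative, not a prescription.

```python
import pandas as pd

def one_bin_pnl(bars: pd.DataFrame, signal: pd.Series, biased: bool = False) -> pd.Series:
    """P&L per unit of signal, where the signal is known at the end of bin t.

    bars   : DataFrame indexed by bin end time, with 'close' and 'vwap' columns
             (column names are an assumption about how the data is stored).
    signal : desired position, formed only from data up to and including bin t.
    biased : if True, reproduce the look-ahead version for comparison.
    """
    if biased:
        # Look-ahead: pretend we trade at the same close the signal was built from.
        return (signal * (bars["close"].shift(-1) - bars["close"])).dropna()
    # Realistic: enter at the NEXT bin's VWAP and exit at the bin after that
    # (a one-bin holding period, purely for illustration).
    return (signal * (bars["vwap"].shift(-2) - bars["vwap"].shift(-1))).dropna()
```

For a mean reversion signal the gap between the two versions tends to be large, which is exactly the trap described above.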
Thursday morning quant interview question. A junior comes to you with a ML model trained using walk-forward validation, and shows the following backtest, created by stitching the out of sample periods. What are your comments? What might they have done wrong, if anything?
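(For anyone unfamiliar with the setup being described: walk-forward validation with stitched out-of-sample predictions looks roughly like the sketch below. The model and the rolling/test window sizes are placeholders of mine; this is not the junior's code and it is not the answer to the question.)

```python
import pandas as pd
from sklearn.linear_model import Ridge

def walk_forward_predictions(X: pd.DataFrame, y: pd.Series,
                             train_size: int = 500, test_size: int = 100) -> pd.Series:
    """Fit on a rolling window, predict the next block, stitch the out-of-sample predictions."""
    preds = []
    for start in range(0, len(X) - train_size - test_size + 1, test_size):
        train = slice(start, start + train_size)
        test = slice(start + train_size, start + train_size + test_size)
        model = Ridge().fit(X.iloc[train], y.iloc[train])
        preds.append(pd.Series(model.predict(X.iloc[test]), index=X.index[test]))
    # Concatenated predictions cover only periods the model never trained on.
    return pd.concat(preds)
```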
I think 4-5 people got this exactly right, and a few more had answers along the right lines but didn't mention some key detail. This is the first definitely correct answer that I saw -
Correlation between your signal and future returns is an important metric in quant trading. But what is a “good” correlation? Here’s a simple way to think about it.
We’ll use a simple model where future returns y over some time period tau are normally distributed with a mean of beta * x and a daily volatility of sigma (here x is a signal with std deviation 1)
We can easily work out the correlation between signal and returns and use that to express beta as a function of correlation, volatility and forecast horizon.
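Spelling that out in my own notation, following the setup above: with y = beta * x + eps, eps ~ N(0, sigma^2 * tau) and sd(x) = 1, the correlation is rho = beta / sqrt(beta^2 + sigma^2 * tau), which inverts to beta = rho * sigma * sqrt(tau) / sqrt(1 - rho^2), or roughly rho * sigma * sqrt(tau) for the small correlations we actually see. A quick sketch:

```python
import math

def beta_from_correlation(rho: float, daily_vol: float, horizon_days: float) -> float:
    """Expected return per 1-sd move in the signal, implied by a signal/return correlation of rho.

    From rho = beta / sqrt(beta^2 + sigma^2 * tau):
        beta = rho * sigma * sqrt(tau) / sqrt(1 - rho^2)  ~=  rho * sigma * sqrt(tau) for small rho.
    """
    return rho * daily_vol * math.sqrt(horizon_days) / math.sqrt(1.0 - rho**2)

# e.g. 1% daily vol, 5-day horizon, correlation of 0.05:
print(beta_from_correlation(0.05, 0.01, 5))   # ~0.0011, i.e. roughly 11 bps per 1-sd of signal
```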
In quant firms, proprietary signal research can uncover new, idiosyncratic alphas (which causes firms to decorrelate). But over time these ideas diffuse (researchers and PMs move between firms and take ideas with them) which causes them to correlate and crowd into the same names.
Use of the same “alternative” datasets also causes quant firms to converge, even more so now that many firms use data brokers to source new datasets (and the brokers will give little nudges like … “we’re seeing a lot of interest in this dataset, maybe you should take a look”)
Does the profitability of vol selling strategies depend on starting volatility level?
A short story.
We start with front month VIX futures beginning in 2005, shortly after the contract was launched, so ~20 years of data.
For each day, calculate the P&L from shorting one futures contract. By working in price space we avoid any issues from calculating VIX returns.
Every 21 days, sample the starting VIX level, and calculate the P&L from being short one near-term contract, assuming we roll over to the next contract at expiry.
This means that we have a dataset of non-overlapping sample P&Ls with ~1 month holding period.
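Roughly, the bookkeeping looks like this. It's a sketch only: the long-format table of settles, its column names, and the roll convention are assumptions on my part, not a specification of the actual study.

```python
import pandas as pd

def short_front_vix_pnl(futures: pd.DataFrame) -> pd.Series:
    """Daily P&L, in VIX points, of being short one front-month contract,
    rolling into the next contract at expiry.

    futures: long-format DataFrame with columns ['date', 'contract', 'expiry', 'settle']
             (column names are an assumption about how the data is stored).
    """
    rows = []
    for date, day in futures.groupby("date"):
        live = day[day["expiry"] >= date].sort_values("expiry")
        front = live.iloc[0]                     # nearest unexpired contract
        rows.append((date, front["contract"], front["settle"]))
    front = pd.DataFrame(rows, columns=["date", "contract", "settle"]).set_index("date")

    # Short one contract: P&L is minus the settle-to-settle change,
    # counted only within the same contract (no P&L booked across the roll).
    same_contract = front["contract"].eq(front["contract"].shift())
    return -front["settle"].diff().where(same_contract, 0.0)

def monthly_samples(pnl: pd.Series, vix_level: pd.Series, window: int = 21) -> pd.DataFrame:
    """Non-overlapping ~1-month P&L samples, tagged with the starting VIX level."""
    samples = []
    idx = pnl.index
    for start in range(0, len(idx) - window, window):
        block = pnl.iloc[start : start + window]
        samples.append({"start": idx[start],
                        "start_vix": vix_level.loc[idx[start]],
                        "pnl": block.sum()})
    return pd.DataFrame(samples)
```

The question is then whether the distribution of those monthly P&Ls looks different when you bucket them by starting VIX level.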
Many mistakes here, including confusing gross and net returns, and not understanding that the fund mostly paid out profits as a dividend, so you couldn't compound.
So if you invested $10,000 into Medallion at the start of 1988, how would you *really* have done after 30 years?
It's pretty easy to figure out: since the net returns are listed along with the fund size at the end of each year, we can approximately work out how much capital was allowed to remain within the fund and how much was returned.
Assume that if the fund size grew by more than the net return, then all capital remains within the fund. Otherwise assume that the difference was returned as a dividend and invested into treasuries.
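A minimal sketch of that bookkeeping, assuming you have the yearly net returns, fund sizes and a treasury rate lined up; the input tuples below are placeholders, not the actual Medallion figures.

```python
def investor_wealth(years, initial=10_000.0):
    """Wealth of a $10,000 investor given yearly (net_return, fund_start, fund_end, tbill_rate) tuples.

    If the fund grew by at least its net return, assume all capital stayed in the fund;
    otherwise assume the shortfall was paid out pro rata as a dividend and rolled
    into treasuries at that year's rate.
    """
    in_fund, in_tbills = initial, 0.0
    for net_return, fund_start, fund_end, tbill_rate in years:
        in_tbills *= 1.0 + tbill_rate                  # existing treasury pot accrues
        grown = in_fund * (1.0 + net_return)           # stake if fully compounded
        implied_end = fund_start * (1.0 + net_return)  # fund size with no distributions
        if fund_end >= implied_end:
            in_fund = grown                            # everything allowed to compound
        else:
            retained = fund_end / implied_end          # fraction the fund actually kept
            in_fund = grown * retained
            in_tbills += grown * (1.0 - retained)      # the rest paid out, into treasuries
    return in_fund + in_tbills
```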