The paper "Are Transformers Effective for Time Series Forecasting?" (arxiv.org/abs/2205.13504) claims that a simple linear model, DLinear, outperforms Transformer-based models by a large margin.
Why does that happen in their case? Firstly, they compare a tiny linear model against massive Transformer models, and they even acknowledge it. Secondly, they compare their univariate model against multivariate versions of the Transformers, which on such small datasets leads to poor Transformer performance.
If instead you compare against Transformer models with approximately the same number of parameters, in the univariate setting, the Transformer (or in fact any neural model) turns out to be better than the linear model. Why is that the case?
Well, let's look at the prediction of DLinear trained on the Traffic dataset, which has a distributional shift on weekends. Since linear models have no capacity to incorporate covariates, the model does not know where it is in time:
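To make the limitation concrete, here is a minimal sketch of a DLinear-style model: a single linear map from the look-back window to the forecast horizon. The class name and sizes are illustrative, not taken from the paper's code, but the key point holds: there is simply no input through which day-of-week or other time covariates could enter.

```python
import torch
import torch.nn as nn

class LinearForecaster(nn.Module):
    """DLinear-style model (illustrative): one linear map from the
    look-back window to the forecast horizon. There is no input for
    covariates such as day-of-week, so the forecast depends only on
    past values, not on *when* the forecast is being made."""

    def __init__(self, context_length: int, prediction_length: int):
        super().__init__()
        self.proj = nn.Linear(context_length, prediction_length)

    def forward(self, past_values: torch.Tensor) -> torch.Tensor:
        # past_values: (batch, context_length) -> (batch, prediction_length)
        return self.proj(past_values)

model = LinearForecaster(context_length=96, prediction_length=24)
forecast = model(torch.randn(32, 96))  # no way to tell it a weekend is coming
```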
A non-linear model can incorporate such covariates and is able to give better predictions in such cases. For example, the same forecast with Autoformer gives:
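For contrast, here is a hedged sketch (not Autoformer itself, just a small hypothetical non-linear model) of how known future covariates, e.g. the day-of-week of each forecast step, can be fed in alongside the past values, so the model can anticipate the weekend shift:

```python
import torch
import torch.nn as nn

class CovariateForecaster(nn.Module):
    """Illustrative non-linear forecaster: embeds the day-of-week of
    each future time step and concatenates it with the look-back
    window, so the prediction can adapt to weekends."""

    def __init__(self, context_length: int, prediction_length: int, hidden: int = 64):
        super().__init__()
        self.day_embed = nn.Embedding(7, 8)  # one vector per weekday
        self.net = nn.Sequential(
            nn.Linear(context_length + prediction_length * 8, hidden),
            nn.ReLU(),
            nn.Linear(hidden, prediction_length),
        )

    def forward(self, past_values: torch.Tensor, future_day_of_week: torch.Tensor) -> torch.Tensor:
        # past_values: (batch, context_length)
        # future_day_of_week: (batch, prediction_length), integers in [0, 7)
        cov = self.day_embed(future_day_of_week).flatten(start_dim=1)
        return self.net(torch.cat([past_values, cov], dim=-1))

model = CovariateForecaster(context_length=96, prediction_length=24)
forecast = model(torch.randn(32, 96), torch.randint(0, 7, (32, 24)))
```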