The "common usage" definition as of 2019 would be "chains of differentiable parametric layers trained end-to-end with backprop".
But this definition seems overly restrictive to me. It describes *how we do DL today*, not *what it is*.
Is an HMAX model (with learned features) not deep learning?
Is a deep neural network trained greedily layer-by-layer not deep learning?
I say they're all deep learning. What's excluded, then? Two things:
1) Things that are not representation learning (e.g. manual feature engineering like SIFT, or symbolic AI)
2) "Shallow learning", where there is a single feature extraction layer.
What makes something deep learning is the *what* (the nature and structure of the model), not the *how* (the training procedure).
The 2019 flavors of DNNs are DL, of course. So are DNNs trained with backprop alternatives like ES, ADMM, or synthetic gradients.
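To make the "backprop isn't essential" point concrete, here's a minimal sketch (a toy problem; the setup and all names are mine, not any particular library) of a simple evolution strategy training a one-weight model. What makes it DL-adjacent is *what* is being optimized (a parametric model), not the optimizer: perturb the parameters with noise, score each perturbation, and step along the fitness-weighted average of the noise instead of a backprop gradient.

```python
import random

random.seed(0)

def loss(w):
    # Toy objective: fit y = 2x with a single weight.
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

w = 0.0
sigma, lr, pop = 0.1, 0.02, 50  # noise scale, step size, population size
for _ in range(200):
    noise = [random.gauss(0.0, 1.0) for _ in range(pop)]
    scores = [loss(w + sigma * n) for n in noise]
    baseline = sum(scores) / pop  # subtract a baseline to reduce variance
    # Gradient estimate: noise weighted by how much it changed the loss.
    grad = sum(n * (s - baseline) for n, s in zip(noise, scores)) / (pop * sigma)
    w -= lr * grad
# w should end up near 2.0 -- no backprop anywhere.
```

The same loop scales to many parameters; no derivative of the loss is ever computed.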
Genetic programming is not DL. Quicksort is not DL. Neither is an SVM.
K-means is not DL. But stacking k-means feature extractors is DL.
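A minimal sketch of the stacking idea, using only the standard library (the toy data and all function names are mine, purely illustrative): run k-means on raw inputs, re-encode each point as its distances to the learned centroids, then run k-means again on that learned representation. Each layer extracts features from the output of the previous one, which is what makes the stack "deep".

```python
import random

random.seed(0)

def dist2(a, b):
    # Squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    # Plain Lloyd's algorithm: assign points, recompute centroids.
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[i].append(p)
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [sum(xs) / len(xs) for xs in zip(*members)]
    return centroids

def encode(points, centroids):
    # Re-represent each point by its distances to the learned centroids.
    return [[dist2(p, c) for c in centroids] for p in points]

# Toy data: two tight blobs in 2D.
data = [[random.gauss(0, 0.1), random.gauss(0, 0.1)] for _ in range(20)]
data += [[random.gauss(3, 0.1), random.gauss(3, 0.1)] for _ in range(20)]

layer1 = kmeans(data, k=4)        # first feature extractor
features = encode(data, layer1)   # learned 4-d representation
layer2 = kmeans(features, k=2)    # second extractor, trained on layer 1's output
```

No gradients, no backprop: just two layers of learned feature extraction, each building on the last.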
When, in 2011-12, I was doing stacked matrix factorization over matrices of pairwise mutual information between locations in video data, that was deep learning.
You could do symbol manipulation with DL, but it involves lots of extra steps.
Deep learning isn't just end-to-end gradient descent, but not every program is deep learning either. In fact, deep learning models represent only a tiny, tiny slice of program space.
It can't hurt to look beyond it.