VGG, U-Net, TCN, ... CNNs are powerful, but they must be tailored to specific problems, data types, lengths & resolutions.
Can we design a single CNN that works well on all these settings? 🤔 Yes! Meet the 𝐂𝐂𝐍𝐍, a single CNN that achieves SOTA on several datasets, e.g., LRA! 🔥
𝐌𝐚𝐢𝐧 𝐈𝐝𝐞𝐚: Architectural changes, e.g., pooling, depth & kernel sizes, are needed to model long range dependencies in signals of different lengths, resolutions & dimensionalities.
To solve all these tasks with a single CNN, it must model long range dependencies at every layer. Our solution: Continuous Conv. Kernels!
𝐂𝐨𝐧𝐭𝐢𝐧𝐮𝐨𝐮𝐬 𝐂𝐨𝐧𝐯. 𝐊𝐞𝐫𝐧𝐞𝐥𝐬 allow you to create conv. kernels of arbitrary length and dimensionality that generalize across resolutions, by parameterizing them with a neural network.
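Here is a minimal PyTorch sketch of the idea (my own simplification: the class and parameter names are mine, and a plain MLP stands in for the more elaborate kernel network used in the paper). The kernel is a learned function of relative position, so the same weights can be sampled at any length or resolution:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContinuousKernel1d(nn.Module):
    """Conv kernel as a learned function of relative position: the same
    weights can be sampled at any kernel length / input resolution."""
    def __init__(self, in_channels, out_channels, hidden=32):
        super().__init__()
        self.out_channels, self.in_channels = out_channels, in_channels
        self.mlp = nn.Sequential(              # position -> kernel value
            nn.Linear(1, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, out_channels * in_channels),
        )

    def forward(self, kernel_size):
        # Relative positions in [-1, 1]; kernel_size can equal the input length.
        coords = torch.linspace(-1.0, 1.0, kernel_size).unsqueeze(-1)
        weights = self.mlp(coords)             # (kernel_size, out * in)
        return weights.t().reshape(
            self.out_channels, self.in_channels, kernel_size)

# Sampling a kernel as long as the input gives a global receptive field.
x = torch.randn(8, 3, 784)                        # e.g., flattened 28x28 image
kernel = ContinuousKernel1d(3, 16)(x.shape[-1])   # (16, 3, 784)
y = F.conv1d(x, kernel, padding="same")           # (8, 16, 784)
```

Because the kernel is sampled from a function rather than stored as a fixed array, nothing in the architecture has to change when the input gets longer or its resolution changes.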
𝐒𝟒 𝐯𝐬 𝐂𝐂𝐍𝐍. Inspired by the powerful S4 model (arxiv.org/abs/2111.00396), we use a variation of their residual blocks, which we call S4 Blocks.
However, in contrast to S4, which only works with 1D signals, CCNNs easily model long range dependencies in ND.
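For intuition, a plausible sketch of such a block, assuming a norm -> long conv -> GELU -> pointwise conv -> skip ordering (illustrative, not necessarily the paper's exact recipe):

```python
import torch.nn as nn
import torch.nn.functional as F

class S4Block(nn.Module):
    """Residual block in the spirit of S4, with the state-space layer
    swapped for a (continuous-kernel) convolution. Ordering is illustrative."""
    def __init__(self, channels, conv_layer):
        super().__init__()
        self.conv = conv_layer                 # channel-preserving long conv
        self.norm = nn.BatchNorm1d(channels)
        self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):                      # x: (batch, channels, length)
        h = self.pointwise(F.gelu(self.conv(self.norm(x))))
        return x + h                           # residual connection
```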
𝐑𝐞𝐬𝐮𝐥𝐭𝐬 𝟏𝐃. CCNNs obtain SOTA on several sequential benchmarks, e.g., Long Range Arena, Speech Recognition, 1D image classification, all with a single architecture.
CCNNs are often smaller and simpler than other methods.
𝐑𝐞𝐬𝐮𝐥𝐭𝐬 𝟐𝐃. With a single architecture, the CCNN matches & surpasses much deeper CNNs!
𝐋𝐨𝐧𝐠 𝐑𝐚𝐧𝐠𝐞 𝐀𝐫𝐞𝐧𝐚 𝐢𝐧 𝟐𝐃. Some LRA tasks are defined on 2D data. Using this info (not possible for other methods, e.g., S4), CCNNs easily get much better results, faster!
𝐂𝐨𝐧𝐜𝐥𝐮𝐬𝐢𝐨𝐧. The CCNN is a single CNN architecture that works well on several tasks, on data of different lengths, resolutions & dimensionalities.
𝐅𝐮𝐭𝐮𝐫𝐞 𝐖𝐨𝐫𝐤. We plan to extend our results to 3D data & other tasks, e.g., segmentation, generative modelling.
𝐂𝐨𝐥𝐥𝐚𝐛𝐨𝐫𝐚𝐭𝐢𝐨𝐧𝐬. Would you like to collaborate with us on this project? Ping me! We want to build an extensive list of experiments, and we are sure we can use your expertise and use case!
𝐀𝐜𝐤𝐧𝐨𝐰𝐥𝐞𝐝𝐠𝐞𝐦𝐞𝐧𝐭𝐬. This is part of my Qualcomm Innovation Fellowship series. I'd like to thank @QCOMResearch & my mentor @danielewworrall for their support!
𝐅𝐥𝐞𝐱𝐂𝐨𝐧𝐯 - 𝐦𝐚𝐢𝐧 𝐢𝐝𝐞𝐚: We model conv. kernels as continuous functions of compact support. This is done by using an MLP to model the kernel and a mask to determine its size.
By making the mask params. learnable, the network can decide how big each kernel should be.
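A minimal sketch of this, reusing the ContinuousKernel1d from above (a simplification with my own names; a Gaussian mask with learnable center and width stands in here, see the FlexConv paper for the exact formulation):

```python
import torch
import torch.nn as nn

class FlexKernel1d(nn.Module):
    """FlexConv-style kernel sketch: a continuous kernel multiplied by a
    Gaussian mask whose center and width are learned, so the network
    decides each kernel's effective size during training."""
    def __init__(self, in_channels, out_channels, hidden=32):
        super().__init__()
        # ContinuousKernel1d is the MLP-parameterized kernel sketched above.
        self.kernel = ContinuousKernel1d(in_channels, out_channels, hidden)
        self.center = nn.Parameter(torch.zeros(1))     # mask center in [-1, 1]
        self.log_width = nn.Parameter(torch.zeros(1))  # mask width, log-scale

    def forward(self, kernel_size):
        coords = torch.linspace(-1.0, 1.0, kernel_size)
        # Gaussian mask: values far from the center are damped toward zero.
        mask = torch.exp(-0.5 * ((coords - self.center)
                                 / self.log_width.exp()) ** 2)
        return self.kernel(kernel_size) * mask  # broadcasts over (out, in, K)
```

Since the mask's center and width are ordinary parameters, gradient descent shrinks or grows each kernel's support to fit the task, instead of the kernel size being a hand-picked hyperparameter.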