I like the "database layer" developed by DeepMind in their RETRO architecture: deepmind.com/blog/article/l…
It teaches the model to retrieve text chunks from a vast textual database (by nearest-neighbour match on their BERT-generated embeddings) and use them when generating text.
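Conceptually, the retrieval step is just nearest-neighbour search over chunk embeddings. A toy sketch (brute-force NumPy; `embed_chunk` is a hypothetical stand-in for RETRO's frozen BERT encoder, not the real thing):

```python
import numpy as np

def embed_chunk(text: str, dim: int = 64) -> np.ndarray:
    # Toy stand-in for RETRO's frozen BERT encoder: hash character
    # trigrams into a fixed-size vector, then L2-normalize.
    v = np.zeros(dim)
    for i in range(len(text) - 2):
        v[hash(text[i:i + 3]) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-8)

def build_index(chunks: list[str]) -> np.ndarray:
    # Pre-compute one embedding per chunk of the text database.
    return np.stack([embed_chunk(c) for c in chunks])

def retrieve(query: str, chunks: list[str], index: np.ndarray, k: int = 2) -> list[str]:
    # Brute-force k-nearest-neighbour search in embedding space
    # (RETRO uses an approximate-NN index to scale to trillions of tokens).
    dists = np.linalg.norm(index - embed_chunk(query), axis=1)
    return [chunks[i] for i in np.argsort(dists)[:k]]

db = ["Emma Raducanu defeated Leylah Fernandez 6-4 6-3 in the 2021 US Open final.",
      "Depth-wise convolutions convolve each input channel independently."]
print(retrieve("who won the 2021 women's US Open?", db, build_index(db), k=1))
```

In the real model, the retrieved neighbours are fed to the decoder through chunked cross-attention, so generation can pull facts straight out of the database.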
It's a bit different from the "memory layer" I tweeted about previously, which provides a large learnable memory, without increasing the number of learnable weights. (for ref: arxiv.org/pdf/1907.05242…)
This time, the model learns the trick of retrieving relevant pieces of knowledge from a large corpus of text.
The end result is similar: an NLP model that can do what the big guns (Gopher, Jurassic-1, GPT-3) can, with a tenth of their learnable weights.
It gives you a few nice perks: 1) the text database can be tweaked post-training if you find dodgy stuff in it (bad language, factual inaccuracies, ...)
2) explainability: you can see which snippets of text are pulled from the database and used as "references" to generate a given answer!
3) And you can fairly easily add this "database layer" and its associated corpus of text post-training, with a fairly cheap fine-tuning step.
They call it RETRO-fitting an existing pre-trained model 😁.
But the real kicker is: can you add data to the text corpus, for example tennis US Open results it had never seen before, and ask the model "who won the 2021 women's US Open?"
Since the model can be "RETRO-fitted" with new data fairly easily, you can, and the answer it gives is:
"Emma Raducanu - she defeated Leylah Fernandez 6-4 6-3 ..."
Quite impressive, although the authors note that other models (FiD) can beat it at question answering.
This is sweet 🥧! arxiv.org/abs/2202.01197
Finally, a solid way of teaching a neural network to know what it does not know.
(OOD = Out Of Domain, i.e. not one of the classes in the training data.) Congrats @SharonYixuanLin @xuefeng_du @MuCai7
The nice part is that it's a purely architectural change of the detection network, with a new contrastive loss which does not introduce additional hyper-parameters. No additional data required!
The results are competitive with training on a larger dataset manually extended with outliers: "Our method achieves OOD detection performance on COCO (AUROC: 88.66%) that favorably matches outlier exposure (AUROC: 90.18%), and does not require external data."
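If I read the paper right, the trick is to synthesize "virtual outliers" directly in feature space instead of collecting real outlier images: fit a class-conditional Gaussian to the features and sample from its low-likelihood tail. A rough NumPy sketch of that sampling step (names are mine; the uncertainty loss that consumes these samples is omitted):

```python
import numpy as np

def sample_virtual_outliers(feats: np.ndarray, n_candidates: int = 10000,
                            n_keep: int = 100) -> np.ndarray:
    # Fit a Gaussian to one class's penultimate-layer features.
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + 1e-4 * np.eye(feats.shape[1])
    # Sample candidates, then keep only the least likely ones:
    # points on the fringe of the class act as synthetic OOD examples.
    cand = np.random.multivariate_normal(mu, cov, size=n_candidates)
    diff = cand - mu
    logp = -0.5 * np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    return cand[np.argsort(logp)[:n_keep]]

# E.g. 500 in-domain feature vectors of dimension 128 for one class:
outliers = sample_virtual_outliers(np.random.randn(500, 128))
```

During training, an extra loss term pushes these virtual outliers towards high uncertainty, so the detector learns a tight boundary around the in-domain classes without ever seeing external data.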
Here is Mask R-CNN, the most popular architecture used for object detection and segmentation.
The conceptual principle of the R-CNN family is to use a two-step process for object detection: 1) a Region Proposal Network (RPN) identifies regions of interest (ROIs); 2) the ROIs are cut from the image and fed through a classifier.
In fact, the cutting is not done on the original image but directly on the feature maps extracted from the backbone. Since the feature maps are at a much lower resolution than the image, the cropping requires some care: sub-pixel extraction and interpolation, a.k.a. "ROI alignment".
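torchvision ships this op, so a minimal sketch of the cropping step looks like this (random tensors stand in for the backbone's feature maps; the sizes are made up for illustration):

```python
import torch
from torchvision.ops import roi_align

# Fake backbone output: 1 image, 256 channels, 50x50 feature map
# for an 800x800 input image (stride 16 -> spatial_scale = 1/16).
features = torch.randn(1, 256, 50, 50)

# One ROI in *image* coordinates: (batch_index, x1, y1, x2, y2).
rois = torch.tensor([[0, 100.0, 120.0, 300.0, 360.0]])

# Crop each ROI from the feature map with bilinear (sub-pixel)
# interpolation and resample it to a fixed 7x7 grid.
crops = roi_align(features, rois, output_size=(7, 7),
                  spatial_scale=1 / 16, sampling_ratio=2, aligned=True)
print(crops.shape)  # torch.Size([1, 256, 7, 7])
```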
The MobileNet family of convolutional architectures uses depth-wise convolutions where the channels of the input are convolved independently.
Their basic building block is called the "Inverted Residual Bottleneck", compared here with the basic blocks in ResNet and Xception (dw-conv for depth-wise convolution).
Here is MobileNetV2, optimized for low weight count and fast inference.
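For reference, the inverted residual bottleneck from the two tweets above is only a few lines in PyTorch. A simplified sketch of the MobileNetV2 block (1x1 expansion → 3x3 depth-wise → 1x1 linear projection, with a residual connection when shapes allow):

```python
import torch
from torch import nn

class InvertedResidual(nn.Module):
    def __init__(self, c_in: int, c_out: int, stride: int = 1, expand: int = 6):
        super().__init__()
        c_mid = c_in * expand
        self.use_residual = (stride == 1 and c_in == c_out)
        self.block = nn.Sequential(
            # 1x1 "expansion" conv: widen the channel count.
            nn.Conv2d(c_in, c_mid, 1, bias=False),
            nn.BatchNorm2d(c_mid), nn.ReLU6(inplace=True),
            # 3x3 depth-wise conv: groups=c_mid convolves each channel independently.
            nn.Conv2d(c_mid, c_mid, 3, stride=stride, padding=1,
                      groups=c_mid, bias=False),
            nn.BatchNorm2d(c_mid), nn.ReLU6(inplace=True),
            # 1x1 linear "projection" back down (no activation: linear bottleneck).
            nn.Conv2d(c_mid, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.block(x)
        return x + out if self.use_residual else out

# Usage: a stride-1 block with matching channels keeps the residual path.
y = InvertedResidual(32, 32)(torch.randn(1, 32, 56, 56))
```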
Now reading the ARC paper by @fchollet. arxiv.org/abs/1911.01547 “On the measure of intelligence”, where he proposes a new benchmark for “intelligence” called the “Abstraction and Reasoning Corpus”.
Highlights below ->
@fchollet Chess was considered the pinnacle of human intelligence, … until a computer beat Garry Kasparov in 1997. Today, it is hard to argue that a min-max algorithm with optimizations represents “intelligence”.
@fchollet AlphaGo took this a step further: it beat the world's best Go players using deep learning. Still, the program is narrowly focused on playing Go, and solving this task did not lead to breakthroughs in other fields.