Martin Görner
Product Manager for Keras and TensorFlow high-level APIs. Previously worked on Cloud TPUs (Tensor Processing Units). Passionate about democratizing ML.
Jan 16 19 tweets 5 min read
The "Self-Extend" paper promises magic for your LLMs: extending the context window beyond what they were trained on. You can take an LLM trained on 2000 token sequences, feed it 5000 tokens and expect it to work. Thread 🧵
(SWA below=sliding window attn.) arxiv.org/abs/2401.01325
To be fair, some LLMs can already do that if they are trained with a specific positional encoding like ALiBi (arxiv.org/abs/2108.12409). And before LLMs, Recurrent Neural Networks (RNNs) could do this trick as well, but the ability was lost in Transformers.
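The trick in Self-Extend is "grouped" attention for distant tokens: nearby tokens keep their exact relative positions, while far-away ones share bucketed positions so nothing exceeds the training range. A minimal numpy sketch of that position mapping (the window and group sizes here are illustrative, not the paper's settings):

```python
import numpy as np

def self_extend_positions(q_pos, k_pos, neighbor_window=512, group_size=4):
    """Relative positions, Self-Extend style: exact inside the neighbor window,
    floor-divided into groups beyond it, so even a very long prompt produces
    positions that stay within the model's training range."""
    rel = q_pos - k_pos                                             # causal relative distance
    grouped = neighbor_window + (rel - neighbor_window) // group_size
    return np.where(rel < neighbor_window, rel, grouped)

# Token 5000 attending to token 0 gets position ~1634 instead of 5000:
print(self_extend_positions(np.array([5000]), np.array([0])))
```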
Dec 29, 2022 11 tweets 3 min read
Large Language Models are getting good at formal logic:
arxiv.org/abs/2212.13894 LAMBADA: Backward Chaining for Automated Reasoning. The method in this paper is, in part, a traditional algorithm, a "depth-first search algorithm over the facts and the rules", starting from the desired conclusion and trying to logically reach the premises (facts and rules).
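In spirit (this is my own toy sketch, not the paper's algorithm or prompt format), backward chaining is a recursive search from the goal back toward known facts:

```python
# Toy backward chaining: rules map a conclusion to the premises that must hold for it.
def backward_chain(goal, facts, rules, depth=0, max_depth=5):
    """Depth-first search from the goal back toward known facts."""
    if goal in facts:               # the goal is a known fact: proved
        return True
    if depth >= max_depth:          # give up on very deep chains
        return False
    for conclusion, premises in rules:
        if conclusion == goal:      # a rule could produce this goal...
            if all(backward_chain(p, facts, rules, depth + 1, max_depth)
                   for p in premises):   # ...if every premise can itself be proved
                return True
    return False

facts = {"rain"}
rules = [("wet_ground", ["rain"]), ("slippery", ["wet_ground"])]
print(backward_chain("slippery", facts, rules))  # True
```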
Dec 8, 2022 14 tweets 4 min read
How can you probe what a language model knows? If you ask it directly, it might lie (for example because you prefixed your question with untruths, or for many other reasons).
Contrast-Consistent Search (CCS) gives a way:
openreview.net/pdf?id=ETKGuby… It takes advantage of a nice property of True/False statements: they cannot be True and False at the same time.
Take any statement, "The Eiffel Tower is a crab", add True/False to the end and you have two mutually exclusive statements.
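CCS trains a small probe on the model's hidden states for the two versions of the statement, asking only that the two probabilities be consistent (sum to one) and confident. A rough PyTorch sketch of that objective (the probe shape and training details here are my assumptions, not the paper's exact code):

```python
import torch

class CCSProbe(torch.nn.Module):
    """Tiny linear probe mapping a hidden state to a probability of 'True'."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.linear = torch.nn.Linear(hidden_dim, 1)

    def forward(self, h):
        return torch.sigmoid(self.linear(h))

def ccs_loss(probe, h_true, h_false):
    """h_true / h_false: hidden states of 'statement. True' / 'statement. False'."""
    p_t, p_f = probe(h_true), probe(h_false)
    consistency = (p_t - (1 - p_f)) ** 2        # the two probabilities should sum to 1
    confidence = torch.minimum(p_t, p_f) ** 2   # and the probe should commit to one side
    return (consistency + confidence).mean()

probe = CCSProbe(hidden_dim=768)
h_plus, h_minus = torch.randn(8, 768), torch.randn(8, 768)  # stand-in hidden states
ccs_loss(probe, h_plus, h_minus).backward()
```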
Dec 7, 2022 4 tweets 3 min read
Here is @luke_wood_ml explaining Stable Diffusion at #Devoxx.
Stable Diffusion is the first end-to-end model in the new KerasCV library. The code to generate an image:
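(The original tweet showed the code as a screenshot; roughly, the KerasCV call looks like this, though exact arguments may differ across KerasCV versions.)

```python
import keras_cv
from PIL import Image

# Build the end-to-end Stable Diffusion model shipped with KerasCV
# and generate an image from a text prompt.
model = keras_cv.models.StableDiffusion(img_width=512, img_height=512)
images = model.text_to_image("photograph of an astronaut riding a horse",
                             batch_size=1)
Image.fromarray(images[0]).save("astronaut.png")
```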
Here is also the presentation:
lukewood.github.io/devoxx/
and a Colab notebook to try it out: colab.research.google.com/github/lukewoo…
#KerasCV
Dec 5, 2022 14 tweets 8 min read
Thought-provoking new paper from @geoffreyhinton: what if we could replace backpropagation with something better? It seems very unlikely that the human brain uses backpropagation to learn. There is little evidence of backprop mechanics in biological brains (no error derivatives propagating backwards, no storage of neuron activities to use in a backprop pass, ...).
Nov 9, 2022 6 tweets 2 min read
Contrastive Search is the new kid on the block for text generation from language models. Better than greedy or beam search, top-k, nucleus sampling, etc.

Can continue text from a prefix with quality indistinguishable from a human, as judged by humans
paper: arxiv.org/abs/2210.14140
In the paper's experiments, the model continues a given text and human raters evaluate the result.
The raters preferred text generated by contrastive search 60-70% of the time (green box). When comparing to human output, they were undecided (red ellipses).
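The decoding rule itself is simple to state: among the top-k candidates, pick the token the model finds probable and whose hidden state is not too similar to what has already been generated. A toy sketch of that score (alpha is the usual trade-off knob; real decoders batch this over all candidates):

```python
import torch

def contrastive_score(p_candidate, h_candidate, h_previous, alpha=0.6):
    """Score one top-k candidate token: model confidence minus a 'degeneration
    penalty' (max cosine similarity to any previously generated hidden state).
    The candidate with the highest score is emitted."""
    sims = torch.nn.functional.cosine_similarity(
        h_candidate.unsqueeze(0), h_previous, dim=-1)
    return (1 - alpha) * p_candidate - alpha * sims.max()

h_prev = torch.randn(12, 64)   # hidden states of the 12 tokens generated so far
h_cand = torch.randn(64)       # hidden state if this candidate token is appended
print(contrastive_score(0.3, h_cand, h_prev))
```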
Feb 15, 2022 8 tweets 2 min read
Google's LaMDA paper arxiv.org/abs/2201.08239 shows yet another information retrieval strategy: the model has been taught to ask a search engine, or a calculator. The first answer "It [Eiffel Tower] was constructed in 1887" is generated directly, but is also recognized as containing a factual statement. That sends the whole context to LaMDA-Research, which is trained to generate search queries, here "TS, Eiffel Tower, construction date".
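Schematically (my own toy pseudocode, not the paper's actual interface), the loop looks like this: the base model drafts an answer, the research model either emits a toolset query ("TS, ...") or a final answer, and retrieved evidence is fed back in until the answer is grounded:

```python
# Toy sketch of the LaMDA base/research loop described above; the function
# names and the toolset call here are illustrative placeholders.
def grounded_answer(context, base_model, research_model, toolset, max_rounds=3):
    draft = base_model(context)                  # e.g. "It was constructed in 1887"
    for _ in range(max_rounds):
        step = research_model(context + draft)   # either a "TS, ..." query or a final answer
        if not step.startswith("TS,"):           # no tool call: the answer is final
            return step
        evidence = toolset(step)                 # search engine / calculator result
        draft = draft + "\n" + evidence          # ground the next revision on the evidence
    return draft
```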
Feb 4, 2022 6 tweets 4 min read
This is sweet 🥧 !
arxiv.org/abs/2202.01197
Finally a solid way of teaching a neural network to know what it does not know.
(OOD = Out Of Domain, i.e. not one of the classes in the training data.) Congrats @SharonYixuanLin @xuefeng_du @MuCai7. The nice part is that it's a purely architectural change of the detection network, with a new contrastive loss which does not introduce additional hyper-parameters. No additional data required!
Feb 4, 2022 8 tweets 2 min read
I like the "database layer" developed by DeepMind in their RETRO architecture:
deepmind.com/blog/article/l…
It teaches the model to retrieve text chunks from a vast textual database (by nearest-neighbour match of their BERT-generated embeddings) and to use them when generating text. It's a bit different from the "memory layer" I tweeted about previously, which provides a large learnable memory without increasing the number of learnable weights. (for ref: arxiv.org/pdf/1907.05242…)
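A toy sketch of the retrieval step (the embedding function and the chunk database below are stand-ins; RETRO itself uses frozen BERT embeddings and an approximate nearest-neighbour index over a trillion-token database):

```python
import numpy as np

def retrieve_neighbours(query_chunk, embed, db_embeddings, db_chunks, k=2):
    """Return the k database chunks closest to the query chunk in embedding space."""
    q = embed(query_chunk)                                 # query embedding, shape (d,)
    dists = np.linalg.norm(db_embeddings - q, axis=1)      # L2 distance to every chunk
    return [db_chunks[i] for i in np.argsort(dists)[:k]]   # k nearest neighbours

db_chunks = ["The Eiffel Tower was completed in 1889.", "Keras is a deep learning API."]
db_embeddings = np.random.rand(2, 16)        # stand-in for precomputed BERT embeddings
embed = lambda text: np.random.rand(16)      # stand-in embedding function
print(retrieve_neighbours("When was the Eiffel Tower built?", embed, db_embeddings, db_chunks, k=1))
```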
Feb 2, 2022 7 tweets 2 min read
I'm humbled by the recent advances in NLP. I was testing this Keras model on @huggingface (huggingface.co/keras-io/trans…) using the abstract of a random (but good) ML article:
arxiv.org/pdf/2002.09405… Q: "Which examples of simulated environments are given in the text ?"
A: "fluids, rigid solids, and deformable materials"
👍 spot on
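If you want to try the same thing locally, here is a sketch using the standard transformers question-answering pipeline (a different checkpoint than the hosted Keras demo, used here only to illustrate extractive QA):

```python
from transformers import pipeline

# Extractive QA: the model picks the answer span out of the supplied context.
qa = pipeline("question-answering")   # default SQuAD-tuned checkpoint, not the Keras demo model
context = "...paste the paper's abstract here..."
result = qa(question="Which examples of simulated environments are given in the text?",
            context=context)
print(result["answer"], result["score"])
```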
Aug 2, 2021 5 tweets 2 min read
Here is Mask R-CNN, the most popular architecture used for object detection and segmentation. The conceptual principle of the R-CNN family is to use a two-step process for object detection (a rough sketch follows the list):
1) a Region Proposal Network (RPN) identifies regions of interest (ROIs)
2) the ROIs are cut from the image and fed through a classifier.
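In code, the two steps boil down to something like this (a structural sketch with placeholder components, not the actual Mask R-CNN implementation):

```python
# Structural sketch of a two-stage R-CNN detector; each argument (backbone,
# rpn, roi_align, head) is a placeholder for the corresponding sub-network.
def two_stage_detect(image, backbone, rpn, roi_align, head):
    features = backbone(image)               # shared convolutional feature map
    proposals = rpn(features)                # step 1: candidate boxes (ROIs)
    crops = roi_align(features, proposals)   # step 2a: cut each ROI out of the feature map
    classes, refined_boxes = head(crops)     # step 2b: classify and refine each ROI
    return proposals, classes, refined_boxes
```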
Jul 19, 2021 4 tweets 2 min read
The MobileNet family of convolutional architectures uses depth-wise convolutions, where the channels of the input are convolved independently. Its basic building block is called the "Inverted Residual Bottleneck", compared here with the basic blocks in ResNet and Xception (dw-conv = depth-wise convolution).
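A minimal Keras sketch of the inverted residual bottleneck (MobileNetV2-style; batch-norm is omitted and the expansion factor of 6 is just the usual default, not a tuned value):

```python
import tensorflow as tf
from tensorflow.keras import layers

def inverted_residual(x, out_channels, expansion=6, stride=1):
    in_channels = x.shape[-1]
    y = layers.Conv2D(expansion * in_channels, 1, padding="same")(x)  # 1x1 expand
    y = layers.ReLU(6.0)(y)
    y = layers.DepthwiseConv2D(3, strides=stride, padding="same")(y)  # per-channel 3x3 conv
    y = layers.ReLU(6.0)(y)
    y = layers.Conv2D(out_channels, 1, padding="same")(y)             # 1x1 linear projection
    if stride == 1 and in_channels == out_channels:
        y = layers.Add()([x, y])                                      # residual when shapes match
    return y

inputs = tf.keras.Input(shape=(56, 56, 32))
outputs = inverted_residual(inputs, out_channels=32)  # same shape in and out -> residual applies
```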
Jun 28, 2021 4 tweets 2 min read
I made a ton of ML architecture illustrations for an upcoming book. Starting with good old AlexNet.

The book: oreilly.com/library/view/p… by @lak_gcp, Ryan Gillard and myself. And, just as good and old, VGG19:
Nov 28, 2019 32 tweets 15 min read
Now reading the ARC paper by @fchollet.
arxiv.org/abs/1911.01547 “On the measure of intelligence” where he proposes a new benchmark for “intelligence” called the “Abstraction and Reasoning corpus”.
Highlights below -> Chess was considered the pinnacle of human intelligence… until a computer surpassed Garry Kasparov in 1997. Today, it is hard to argue that a min-max algorithm with optimizations represents "intelligence".
Sep 13, 2018 7 tweets 3 min read
Google Cloud Platform now has preconfigured deep learning images with TensorFlow, PyTorch, Jupyter, CUDA and cuDNN already installed. It took me some time to figure out how to start Jupyter on such an instance. Turns out it's a one-liner. Detailed instructions:
1) Go to cloud.google.com/console and create an instance (pick the TensorFlow deep learning image and a powerful GPU)
Jan 19, 2017 8 tweets 5 min read
I believe a dev can get up to speed on neural networks in 3h and then learn by himself. Ready for a crash course? /1 Got 3 more hours? The "TensorFlow without a PhD" series continues. First, a deep dive into modern convolutional architectures: