Latest Twitter Threads by @PetarV_93 on Thread Reader App

Jan 2 • 15 tweets • 6 min read

Recently, @ytz2024 (co-lead of Differential Transformers) provided an insightful analysis of whether their work escapes the constraints of "softmax is not enough".

My thoughts follow below! tl;dr: differentials fix the issue in theory, but I'm not convinced they do in practice.

https://twitter.com/ytz2024/status/1874695567198265787

For starters, a quick primer:

* We prove that the attention coefficients of _all_ global softmax heads _must_ disperse as problem size increases, rendering them incapable of sharp reasoning.
* Diff Transformers employ two softmax heads, using one to subtract the influence of the other!
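To make the two bullets concrete, here is a minimal NumPy sketch. It assumes, for simplicity, that logits stay within a fixed range [-spread, spread]; the names `spread`, `max_coefficient`, `dispersion_bound`, `differential_weights` and the scalar `lam` are illustrative, not taken from either paper. Under that assumption no single weight can exceed 1 / (1 + (n - 1) * exp(-2 * spread)), which tends to 0 as n grows. A differential combination, by contrast, is no longer a probability distribution: entries can go negative and sum to 1 - lam, which gives one possible intuition for why the dispersion argument does not transfer directly.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def max_coefficient(n, spread=4.0, seed=0):
    """Largest softmax weight among n items with logits drawn from [-spread, spread]."""
    rng = np.random.default_rng(seed)
    logits = rng.uniform(-spread, spread, size=n)
    logits[0] = spread  # give one "target" item the largest allowed logit
    return softmax(logits).max()

def dispersion_bound(n, spread=4.0):
    """Upper bound on any single weight when all logits lie in [-spread, spread]."""
    return 1.0 / (1.0 + (n - 1) * np.exp(-2.0 * spread))

def differential_weights(logits1, logits2, lam):
    """Difference of two softmax maps; entries may be negative and sum to 1 - lam."""
    return softmax(logits1) - lam * softmax(logits2)

for n in [16, 256, 4096, 65536]:
    print(n, max_coefficient(n), dispersion_bound(n))
```

The printed maxima shrink towards 0 as n grows and always stay below the bound, regardless of which item is the "target".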

Dec 2, 2024 • 12 tweets • 4 min read

A clear step towards achieving my dream: building AI that assists competitive programmers 🧑‍💻

“This is an exciting approach to combine work of human competitive programmers and LLMs, to achieve results that neither would achieve on their own.” --Petr Mitrichev

Details below! 🧵

There's been a rightful surge of AI-powered competitive programming systems, typically deployed on classical contests such as Codeforces.

While very impressive results have been achieved (Elo ~1,900), they still fall significantly short of the highest percentiles of competitors.

Jun 7, 2024 • 8 tweets • 3 min read

Transformers need glasses! 👓

Read on to see how we expose fundamental weaknesses of decoder-only Transformers on important tasks (e.g. copying & counting) + simple ways to make things a bit easier on the Transformer :)

Work led by @fedzbar for his @GoogleDeepMind placement!

We start by asking a frontier LLM a simple query: copy the first & last token of bitstrings.

Not only does it fail past a certain length, it also fails in a very specific way: it fails when there's repetition (111...10), and it fails to copy the _last_ token, never the first.

Dec 12, 2022 • 5 tweets • 6 min read

If you are at @LogConference, come to the virtual Poster Session in ~20 minutes -- we have _four_ posters on algorithmic alignment, reasoning and over-squashing in GNNs! 🕸️🍾🌐 Several of them are award-winning!

You're welcome to stop by for a chat. 😊
See the 🧵for details... 🔢

🌐 In "Reasoning-Modulated Representations", Matko Bošnjak, @thomaskipf, @AlexLerchner, @RaiaHadsell, Razvan Pascanu, @BlundellCharles and I demonstrate how to leverage arbitrary algorithmic priors for self-supervised learning. It even transfers _across_ different Atari games!

Jul 27, 2022 • 5 tweets • 6 min read

📢 New & improved material to dive into geometric deep learning! 💠🕸️

We (@mmbronstein @joanbruna @TacoCohen) delivered our Master's course on GDL @AIMS_Next once again & we make all materials publicly available!

geometricdeeplearning.com/lectures/

See thread 🧵 for gems 💎 & dragons 🐉!

What to expect in the 2022 iteration?

We made careful modifications to our content, making it more streamlined & accessible!

Featuring a revamped introductory lecture, clearer discussion of Transformers & a new lecture going beyond groups, into the realm of category theory! 🐲

Jun 2, 2022 • 10 tweets • 7 min read

Proud to share our CLRS benchmark: probing GNNs to execute 30 diverse algorithms! ⚡️

github.com/deepmind/clrs
arxiv.org/abs/2205.15659 (@icmlconf'22)

Find out all about our 2-year effort below! 🧵

w/ Adrià @davidmbudden @rpascanu @AndreaBanino Misha @RaiaHadsell @BlundellCharles

Why an algorithmic benchmark?

Algorithmic reasoning has emerged as a very important area of representation learning! Many key works (feat. @KeyuluXu @jingling_li @StefanieJegelka @beabevi_ @brunofmr) explored important theoretical and empirical aspects of algorithmic alignment.

Jun 1, 2022 • 11 tweets • 3 min read

Two years ago, I embarked on an 'engineering' project.

From my perspective (research scientist with 'decent' coding skill), it seemed simple enough. It turned out to be anything but.

In advance of celebrating our @icmlconf acceptance, an appreciation thread for AI engineering! 1/11

https://twitter.com/inoryy/status/1523354236473466882

Why did I class the project as simple at first?

It required no (apparent) novel research (though it could enable lots of new research!), I had the theoretical skills to understand everything that needed to be implemented, and it amounted to standard supervised learning! 2/11

Mar 9, 2022 • 4 tweets • 2 min read

This is a very cool paper!

However, if I understood it correctly, it doesn't invalidate the GNN-DP alignment result of @KeyuluXu et al. [33].

Rather, it shows a very interesting DP unsolvability result over arbitrarily-initialised features. See thread -- happy to discuss. 1/4

https://twitter.com/HamedSHassani/status/1501244879397003268

GNN _computations_ align with DP. If you initialise the node features _properly_ (e.g. identifying the source vertex):

r[s] = 1, r[u] = 0 (for u ≠ s)
d[s] = 0, d[u] = -1

GNNs are then perfectly capable of finding shortest paths. The proof in the paper seems more subtle... 2/4
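With this initialisation, the computation a GNN needs to align with is Bellman-Ford. Below is a minimal synchronous Python sketch, reading r as a "reached" flag and d = -1 as "distance not yet known" (our interpretation; the thread does not spell it out). The function name `bellman_ford` and the edge-list input format are illustrative assumptions. In each round every reached node sends d[u] + w along its edges and each receiver keeps the minimum, i.e. the min-aggregation step a message-passing layer would have to learn. It assumes the graph has no negative cycles.

```python
def bellman_ford(num_nodes, edges, source):
    """Synchronous Bellman-Ford from `source` over directed weighted edges (u, v, w).

    r[u] = 1 once u has been reached; d[u] is its current distance estimate,
    with -1 meaning "not reached yet".
    """
    r = [0] * num_nodes
    d = [-1] * num_nodes
    r[source], d[source] = 1, 0
    for _ in range(num_nodes - 1):
        new_r, new_d = r[:], d[:]  # all nodes update from the previous round's state
        for u, v, w in edges:
            # Each reached node u sends the message d[u] + w to v; v keeps the minimum.
            if r[u] and (not new_r[v] or d[u] + w < new_d[v]):
                new_r[v], new_d[v] = 1, d[u] + w
        r, d = new_r, new_d
    return r, d

edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]  # node 4 is unreachable
r, d = bellman_ford(5, edges, source=0)
print(r, d)
```

Here node 1 is first reached directly (distance 4) and later improved via node 2 (distance 3), while the unreachable node 4 keeps its initial r = 0, d = -1.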

Jan 24, 2022 • 12 tweets • 11 min read

Geometric & Graph ML were a 2021 highlight, with exciting fundamental research & high-profile applications.

@mmbronstein and I interviewed distinguished experts to review this progress & predict 2022 trends. It's our longest post yet! See 🧵 for summary.

michael-bronstein.medium.com/predictions-an…

Trend #1: Geometry becomes increasingly important in ML. Quotes from Melanie Weber (@UniofOxford), @pimdehaan (@UvA_Amsterdam), @Francesco_dgv (@Twitter) and Aasa Feragen (@uni_copenhagen).

Dec 9, 2021 • 4 tweets • 4 min read

Attending @NeurIPSConf #NeurIPS2021 today?
Interested in algorithmic reasoning, implicit planning, knowledge transfer or bioinformatics?
We have 3 posters (1 spotlight!) in the poster session (4:30--6pm UK time) you might find interesting; consider stopping by! Details below: 🧵

(1) "Neural Algorithmic Reasoners are Implicit Planners" (Spotlight); with @andreeadeac22, Ognjen Milinković, @pierrelux, @tangjianpku & Mladen Nikolić.

A Value Iteration-based implicit planner (XLVIN), which successfully breaks the algorithmic bottleneck & yields gains in low-data regimes.
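For readers unfamiliar with the algorithm XLVIN's planner is built around, here is plain tabular value iteration in NumPy. This is a sketch of the classical algorithm only, not XLVIN's neural executor; the tensor layouts `P[a, s, t]` and `R[s, a]`, the discount `gamma` and the toy MDP are illustrative assumptions.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=1000):
    """Tabular value iteration.

    P: transition probabilities, shape [A, S, S], P[a, s, t] = Pr(t | s, a)
    R: expected rewards, shape [S, A]
    Returns the value estimate V (shape [S]) and a greedy policy (shape [S]).
    """
    V = np.zeros(P.shape[1])
    for _ in range(max_iters):
        # Bellman optimality backup: Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q.argmax(axis=1)

# Toy 2-state MDP: in state 0, action 1 moves to absorbing state 1 for reward 1.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay put
              [[0.0, 1.0], [0.0, 1.0]]])  # action 1: go to state 1
R = np.array([[0.0, 1.0],
              [0.0, 0.0]])
V, policy = value_iteration(P, R)
print(V, policy)
```

Each iteration is a local "max over actions of expected neighbour values" update, which is why the algorithm lends itself to being imitated by a graph network over the state space.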

Dec 1, 2021 • 8 tweets • 6 min read

Our work has been published in @Nature!!

(G)NNs can successfully guide the intuition of mathematicians & yield top-tier results -- in both representation theory & knot theory.

dpmd.ai/nature-maths
arxiv.org/abs/2111.15161
arxiv.org/abs/2111.15323

See my 🧵 for more insight...

https://twitter.com/DeepMind/status/1466080533050535940

It’s hard to overstate how happy I am to finally see this come together, after years of careful progress towards our aim -- demonstrating that AI can be the mathematician’s 'pocket calculator of the 21st century'.

I hope you’ll enjoy it as much as I enjoyed working on it!

Nov 5, 2021 • 7 tweets • 3 min read

Delighted to announce two papers we will present at #NeurIPS2021: on XLVIN (spotlight!), and on transferable algorithmic reasoning.

Both summarised in the wonderful linked thread from @andreeadeac22!

I'd like to add a few sentiments on XLVIN specifically... thread time! 🧵1/7

https://twitter.com/andreeadeac22/status/1456636063271821314

You might have seen XLVIN before -- we'd advertised it a few times, and it also featured at great length in my recent talks.

The catch? The original version of XLVIN was rejected twice, at both ICLR (in spite of all-positive scores) and ICML. 2/7

https://twitter.com/PetarV_93/status/1321114783249272832

Nov 5, 2021 • 8 tweets • 3 min read

I've recently been asked the following question:

"Let's say I have two spare days and want to really understand GNNs. What should I do?"

My answers led me to revisit my old 'hints for GNN resources' in light of the new material I've (co)produced. See the thread for a summary!

I'd say it is good to start with something a bit more theoretical before diving into code. Specifically, I've been recommending my @Cambridge_CL talk on Theoretical GNN Foundations:

Why do I recommend this talk, specifically?

Jul 21, 2021 • 4 tweets • 3 min read

We release the full technical report & code for our OGB-LSC entry, in advance of our KDD Cup presentations! 🎉

arxiv.org/abs/2107.09422

See thread 🧵 for our insights gathered while deploying large-scale GNNs!

with @PeterWBattaglia @davidmbudden @andreeadeac22 @SibonLi et al.

For large-scale transductive node classification (MAG240M), we found it beneficial to treat subsampled patches bidirectionally, and go deeper than their diameter. Further, self-supervised learning becomes important at this scale. BGRL allowed training 10x longer w/o overfitting.

Jul 20, 2021 • 6 tweets • 4 min read

Delighted to share our work on reasoning-modulated representations! Contributed talk at @icmlconf SSL Workshop 🎉

arxiv.org/abs/2107.08881

Algo reasoning can help representation learning! See thread👇🧵

w/ Matko @thomaskipf @AlexLerchner @RaiaHadsell @rpascanu @BlundellCharles

We study a very common representation learning setting where we know *something* about our task's generative process. e.g. agents must obey some laws of physics, or a video game console manipulates certain RAM slots. However...

Jul 5, 2021 • 9 tweets • 4 min read

I firmly believe in giving back to the community I came from, as well as paying forward and making (geometric) deep learning more inclusive to underrepresented communities in general.

Accordingly, this summer you can (virtually) find me at several summer schools! A thread (1/9)

At @EEMLcommunity 2021, I will give a lecture on graph neural networks from the ground up, followed by a GNN lab session led by @ni_jovanovic. I will also host a mentorship session with several aspiring mentees!

Based on the 2020 edition, I anticipate a recording will be available! (2/9)

Apr 28, 2021 • 5 tweets • 3 min read

Proud to share our 150-page "proto-book" with @mmbronstein @joanbruna @TacoCohen on geometric DL! Through the lens of symmetries and invariances, we attempt to distill "all you need to build the architectures that are all you need".

geometricdeeplearning.com

More info below! 🧵

We have investigated the essence of popular deep learning architectures (CNNs, GNNs, Transformers, LSTMs) and realised that, given an appropriate set of symmetries we would like them to be resistant to, they can all be expressed using a common geometric blueprint.

But there's more!
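As a minimal illustration of the blueprint for one symmetry group (permutations of a set), here is a NumPy sketch of a permutation-equivariant layer followed by a permutation-invariant readout. The Deep Sets-style layer, the sum aggregation, the ReLU and all names here are our illustrative choices, not code from the proto-book.

```python
import numpy as np

def equivariant_layer(X, W_self, W_agg):
    """Permutation-equivariant layer on a set of n feature vectors X with shape [n, d_in].

    Each element is updated from its own features plus a symmetric (sum) summary of
    the whole set, so permuting the rows of X permutes the rows of the output.
    """
    set_summary = X.sum(axis=0, keepdims=True)  # [1, d_in], unchanged by row permutations
    return np.maximum(0.0, X @ W_self + set_summary @ W_agg)  # ReLU nonlinearity

def invariant_readout(H):
    """Permutation-invariant pooling: sum over set elements."""
    return H.sum(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
W_self, W_agg = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
perm = rng.permutation(5)

H = equivariant_layer(X, W_self, W_agg)
H_perm = equivariant_layer(X[perm], W_self, W_agg)
print(np.allclose(H_perm, H[perm]))                                  # equivariance check
print(np.allclose(invariant_readout(H_perm), invariant_readout(H)))  # invariance check
```

Swapping the permutation group for another (e.g. translations on a grid) changes which layers count as equivariant, but the "equivariant layers, then invariant pooling" recipe stays the same.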

Apr 24, 2021 • 15 tweets • 5 min read

The crowd has spoken! 🙃 A thread with early-stage machine learning research advice follows below. 👇🧵

Important disclaimer before proceeding: these are my personal views only, and likely strongly biased by my experiences and temperament. Hopefully useful nonetheless! 1/15

https://twitter.com/PetarV_93/status/1385932158599114752

During the early stages of my PhD, one problem would often arise: I would come up with ideas that simply weren't suited to the hardware/software/expertise setup I had in my department. 2/15

Nov 16, 2020 • 14 tweets • 5 min read

Over the past weeks, several people have reached out to me for comment on "Combining Label Propagation and Simple Models Out-performs Graph Neural Networks" -- a very cool LabelProp-based baseline for graph representation learning. Here's a thread 👇 1/14

https://twitter.com/cHHillee/status/1323323061370724352

Firstly, I'd like to note that, in my opinion, this is a very strong and important work for representation learning on graphs. It provides us with so many lightweight baselines that often perform amazingly well -- on that, I strongly congratulate the authors! 2/14
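For context on the kind of lightweight baseline discussed here, below is a sketch of classic label propagation: one-hot training labels are repeatedly smoothed over a symmetrically normalised adjacency matrix while being pulled back towards their initial values. It is not the paper's complete method (which also combines propagation with simple base models); `alpha`, `num_iters`, the function name and the toy graph are illustrative assumptions.

```python
import numpy as np

def label_propagation(adj, labels, train_mask, num_classes, alpha=0.9, num_iters=50):
    """adj: [N, N] symmetric 0/1 adjacency; labels: [N] ints (only train entries are used)."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros(n)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    a_norm = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    y0 = np.zeros((n, num_classes))
    y0[train_mask, labels[train_mask]] = 1.0  # one-hot labels on training nodes only
    y = y0.copy()
    for _ in range(num_iters):
        # Smooth over neighbours, then mix back in the original training labels.
        y = alpha * a_norm @ y + (1 - alpha) * y0
    return y.argmax(axis=1)

# Two triangles (0, 1, 2) and (3, 4, 5) joined by the edge 2-3; only nodes 0 and 5 are labelled.
adj = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    adj[u, v] = adj[v, u] = 1.0
labels = np.array([0, -1, -1, -1, -1, 1])
train_mask = np.array([True, False, False, False, False, True])
pred = label_propagation(adj, labels, train_mask, num_classes=2)
print(pred)
```

Each unlabelled node ends up with the label of the triangle it belongs to, with no learned parameters at all, which is the kind of strong, cheap baseline the thread credits the authors for.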

Sep 17, 2020 • 8 tweets • 7 min read

As requested, here are a few non-exhaustive resources I'd recommend for getting started with Graph Neural Nets (GNNs), depending on what flavour of learning suits you best.

Covering blogs, talks, deep-dives, feeds, data, repositories, books and university courses! A thread 👇

For blogs, I'd recommend:
- @thomaskipf's post on Graph Convolutional Networks:
tkipf.github.io/graph-convolut…
- My blog on Graph Attention Networks:
petar-v.com/GAT/
- A series of comprehensive deep-dives from @mmbronstein: towardsdatascience.com/graph-deep-lea…
