As requested, here are a few non-exhaustive resources I'd recommend for getting started with Graph Neural Networks (GNNs), depending on what flavour of learning suits you best.
Covering blogs, talks, deep-dives, feeds, data, repositories, books and university courses! A thread 👇
Read on to see how we expose fundamental weaknesses of decoder-only Transformers on important tasks (e.g. copying & counting) + simple ways to make things a bit easier on the Transformer :)
Work led by @fedzbar for his @GoogleDeepMind placement!
We start by asking a frontier LLM a simple query: copy the first & last token of bitstrings.
Not only does it fail past a certain length, it fails in a very specific way: errors arise when there's repetition (111...10), and always on the _last_ token, never the first.
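To make the probe concrete, here's a minimal sketch of the setup (not our exact harness -- `query_llm`, the prompts, and the pass/fail check are all illustrative stand-ins):

```python
def query_llm(prompt: str) -> str:
    # Placeholder: swap in a call to whichever LLM API you use.
    raise NotImplementedError

def make_bitstring(n_ones: int) -> str:
    # Bitstrings of the form 11...10 -- the repetitive inputs that
    # trigger the failure.
    return "1" * n_ones + "0"

def copy_probe(n_ones: int) -> tuple[bool, bool]:
    s = make_bitstring(n_ones)
    first = query_llm(f"Output only the first character of: {s}")
    last = query_llm(f"Output only the last character of: {s}")
    # Observed pattern: `first` stays correct at all lengths, while
    # `last` starts failing beyond a model-dependent length.
    return first.strip() == s[0], last.strip() == s[-1]
```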
This leads to our first result -- representational collapse.
We prove there must exist pairs of different inputs whose last-token representations cannot be distinguished.
To prove this, we use bitstrings of the form 11...10, where repetitions exacerbate the problem.
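Roughly, in my own notation (paraphrasing the result rather than quoting the paper): let v_n be the final-layer representation of the last token of 1^n 0. Then:

```latex
% Paraphrased statement, notation mine: v_n is the final-layer
% representation of the last token of the bitstring 1^n 0.
\[
  \lVert v_{n+1} - v_n \rVert \longrightarrow 0
  \quad \text{as } n \to \infty,
\]
% so under any fixed floating-point precision there is some n beyond
% which 1^n 0 and 1^{n+1} 0 yield *identical* last-token
% representations -- and the model cannot treat them differently.
```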
If you are @LogConference, come to the virtual Poster Session in ~20 minutes -- we have _four_ posters on algorithmic alignment, reasoning and over-squashing in GNNs! 🕸️🍾🌐 Several of them are award-winning!
You're welcome to stop by for a chat. 😊
See the 🧵 for details... 🔢
🌐 In "Reasoning-Modulated Representations", Matko Bošnjak, @thomaskipf, @AlexLerchner, @RaiaHadsell, Razvan Pascanu, @BlundellCharles and I demonstrate how to leverage arbitrary algorithmic priors for self-supervised learning. It even transfers _across_ different Atari games!
🤖 In "Continuous Neural Algorithmic Planners", @heyu0208, @pl219_Cambridge, @andreeadeac22 and I show how the ideas from XLVIN paper can generalise to continuous-action-space environments (such as MuJoCo!). CNAP won the Best Paper Runner-up Award at GroundedML @ ICLR'22!
We made careful modifications to our content, making it more streamlined & accessible!
Featuring a revamped introductory lecture, clearer discussion of Transformers & a new lecture going beyond groups, into the realm of category theory! 🐲
Algorithmic reasoning has emerged as a very important area of representation learning! Many key works (feat. @KeyuluXu, @jingling_li, @StefanieJegelka, @beabevi_ and @brunofmr) explored important theoretical and empirical aspects of algorithmic alignment.
Critically, each one of these works (incl. mine!) operates over its own datasets, often making it hard to directly compare insights across papers.
Further, generating adequate datasets requires knowledge of theoretical computer science, raising the barrier of entry to the field.
It required no (apparent) novel research (though it could enable lots of new research!), I had the theoretical skills to understand everything that needed to be implemented, and it amounted to standard supervised learning! 2/11
So I started implementing by myself. What could possibly go wrong? Turns out, pretty much everything. :)
Indeed, I understood all I needed to write the data generators. But this didn't mean I knew how to most efficiently extract the data, organise it, and make it accessible! 3/11
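For a flavour of what such a generator involves, here's a minimal sketch for one task -- BFS reachability on random graphs. The function name and output fields are illustrative, not the benchmark's actual API:

```python
import numpy as np
from collections import deque

def bfs_sample(n_nodes: int, p_edge: float, rng: np.random.Generator) -> dict:
    # Sample a random undirected graph as a boolean adjacency matrix.
    adj = rng.random((n_nodes, n_nodes)) < p_edge
    adj = np.triu(adj, 1)
    adj = adj | adj.T
    source = int(rng.integers(n_nodes))

    # Run BFS from the source; the reachability mask is the label.
    reached = np.zeros(n_nodes, dtype=bool)
    reached[source] = True
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in np.flatnonzero(adj[u]):
            if not reached[v]:
                reached[v] = True
                queue.append(int(v))
    return {"adj": adj, "source": source, "reached": reached}

rng = np.random.default_rng(0)
sample = bfs_sample(n_nodes=16, p_edge=0.2, rng=rng)
```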
GNNs are then perfectly capable of finding shortest paths. The proof in the paper seems more subtle... 2/4
Namely, that GNNs are hopeless at solving some DP problems (e.g. path-finding) under _arbitrary, fixed_ (e.g. constant / randomised) initialisations. But that's, in my opinion, a different statement from "GNNs don't align with DP"! 3/4
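To make the distinction concrete, here's a small numpy sketch (my own illustration, not code from the paper): a message-passing layer with min-aggregation is exactly a Bellman-Ford relaxation step, so, given informative initial distances, iterating it computes shortest paths:

```python
import numpy as np

def bellman_ford_step(dist: np.ndarray, weights: np.ndarray) -> np.ndarray:
    # dist: (n,) current distance estimates.
    # weights: (n, n) edge weights, np.inf where there is no edge.
    # One min-aggregation message-passing layer == one relaxation step.
    messages = dist[:, None] + weights      # message u -> v: d[u] + w[u, v]
    return np.minimum(dist, messages.min(axis=0))

# With informative inputs, iterating to a fixed point recovers
# shortest paths from the source:
w = np.array([[0., 1., np.inf],
              [np.inf, 0., 2.],
              [np.inf, np.inf, 0.]])
dist = np.array([0., np.inf, np.inf])       # source node 0
for _ in range(len(dist) - 1):
    dist = bellman_ford_step(dist, w)
# dist is now [0., 1., 3.]
# The negative result above concerns the case where `dist` must start
# from an arbitrary *fixed* vector (e.g. all-constant), rather than
# this informative 0/inf source encoding.
```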