@LakeBrenden opts instead for a qualified "yes" on the general question, and a strong "yes" on the more specific question – Do LLMs need sensory grounding to understand words as people do?
There's a lot of speculation about whether OpenAI's video generation model Sora has a 'physics engine' (bolstered by OAI's own claims about 'world simulation'). Like the debate about world models in LLMs, this question is both genuinely interesting and somewhat ill-defined. 🧵1/
Of course, it's wildly unlikely that Sora literally makes function calls to an external physics engine like UE5 during inference. Note that this has been done before with LLMs: see this Google paper, where the model answers questions by running simulations in a physics engine. 2/
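To make the contrast concrete, here is a minimal runnable sketch of what that kind of explicit LLM-plus-simulator pipeline looks like. All names (llm_complete, run_simulation) are hypothetical stand-ins, not the setup used in the Google paper – and this is precisely what Sora is *not* plausibly doing under the hood.

```python
# Toy sketch of the "LLM calls an external physics engine" setup: the language
# model only translates between text and simulator inputs/outputs, while the
# physics itself runs outside the model. All names here are hypothetical
# placeholders, not the API of any real system.
import json

def llm_complete(prompt: str) -> str:
    """Placeholder for a language model call (stubbed so the sketch runs)."""
    if "simulation parameters" in prompt:
        return '{"drop_height_m": 10.0}'
    return "The ball hits the ground after roughly 1.43 seconds."

def run_simulation(params: dict) -> dict:
    """Placeholder for an external physics engine (here: closed-form free fall)."""
    g = 9.81
    t = (2 * params["drop_height_m"] / g) ** 0.5
    return {"time_to_ground_s": round(t, 2)}

def answer_with_simulation(question: str) -> str:
    # 1. Ask the LM to turn the question into simulator inputs.
    params = json.loads(llm_complete(f"Write simulation parameters for: {question}"))
    # 2. Run the external engine: the physics lives here, not in the LM's weights.
    result = run_simulation(params)
    # 3. Feed the result back to the LM to phrase the final answer.
    return llm_complete(f"Question: {question}\nSimulation result: {result}\nAnswer:")

print(answer_with_simulation("How long does a ball dropped from 10 m take to land?"))
```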
But that's not what most people are speculating about. Rather, the idea is that Sora would acquire an internal model of physics during training, and make use of this internal model to generate temporally and spatially coherent videos. 3/
📝New preprint! What does it take for AI models to have grounded representations of lexical items? There is a lot of disagreement – some verbal, some substantive – about what grounding involves. Dimitri Mollo and I frame this old question in a new light 1/ arxiv.org/abs/2304.01481
Back in 1990, Harnad characterized the "Symbol Grounding Problem" with the following question: How can AI systems designed to process linguistic inputs have internal representations and outputs that are intrinsically meaningful? 2/ sciencedirect.com/science/articl…
Harnad asked this question about classical AI systems manipulating symbols with arbitrary shapes. An analogous issue arises for neural nets, like language models, that compute over vectors rather than symbols: we call it the Vector Grounding Problem as a nod to Harnad's work. 3/
Another day, another opinion essay about ChatGPT in the @nytimes. This time, Noam Chomsky and colleagues weigh in on the shortcomings of language models. Unfortunately, this is not the nuanced discussion one could have hoped for. 🧵 1/
For a start, I'm not sure the melodramatic tone serves the argument: "machine learning will degrade our science and debase our ethics", and "we can only laugh or cry at [LLMs'] popularity"! I know op-eds are often editorialized for dramatic effect, but maybe this is a bit much? 2/
The substantive claims are all too familiar: LLMs learn from co-occurrence statistics without leveraging innate structure; they describe and predict instead of doing causal inference; and they can't balance original reasoning with epistemic and moral constraints. 3/
I don't think lossy compression is a very helpful analogy to convey what (linguistic or multimodal) generative models do – at least if "blurry JPEGs" is the leading metaphor. It might work in a loose sense, but it doesn't tell the whole story. 1/
Generative models can definitely be used for lossy compression (see below), but that's a special case of their generative capabilities. Reducing everything they do to lossy compression perpetuates the idea that they just regurgitate approximations of their training samples. 2/
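At its most basic, the model-compression link is just this: a generative model assigns a probability to data, and that probability fixes the ideal code length (-log2 p bits), so better predictive models yield shorter codes. A toy sketch of that idea, using a stand-in character bigram model rather than an LLM; a real compressor would pair probabilities like these with an arithmetic coder:

```python
# Toy illustration of the generic link between generative modelling and
# compression: the model's predicted probability for each symbol fixes its
# ideal code length (-log2 p bits). The bigram "model" is a stand-in for
# illustration only, not any particular LLM.
import math
from collections import Counter, defaultdict

def train_bigram(text: str):
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def ideal_code_length_bits(model, text: str) -> float:
    """Sum of -log2 p(next | prev) over the text, with add-one smoothing."""
    vocab = set(text)
    bits = 0.0
    for prev, nxt in zip(text, text[1:]):
        total = sum(model[prev].values()) + len(vocab)
        p = (model[prev][nxt] + 1) / total
        bits += -math.log2(p)
    return bits

corpus = "the cat sat on the mat and the cat sat on the hat " * 20
model = train_bigram(corpus)
print(f"raw: {8 * len(corpus)} bits")                           # 1 byte per character
print(f"model-coded: {ideal_code_length_bits(model, corpus):.0f} bits")
```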
This bit about interpolation strikes me as particularly misleading. Inference on generative models involves computations that are way more complex and structured than (say) nearest neighbor pixel interpolation in image decompression. 3/
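To see the structural difference, compare a one-pass nearest-neighbour upscale with the iterative sampling loop of a diffusion-style model. The denoiser below is a made-up stand-in for a trained network, and the update rule is schematic rather than any real sampler; the point is only that one is a fixed local copy operation and the other repeatedly applies a learned, content-dependent function.

```python
# Contrast sketch: nearest-neighbour interpolation is a single fixed local
# operation, while diffusion-style sampling repeatedly applies a learned
# function. `fake_denoiser` is a made-up stand-in for a trained network,
# and the update below is schematic, not a real sampler.
import numpy as np

def nearest_neighbour_upscale(img: np.ndarray, factor: int) -> np.ndarray:
    # Decompression-style interpolation: every output pixel copies a neighbour.
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def fake_denoiser(x: np.ndarray, t: int) -> np.ndarray:
    # Stand-in for a learned noise predictor with billions of parameters.
    return 0.1 * x

def toy_diffusion_sample(shape=(8, 8), steps=50) -> np.ndarray:
    rng = np.random.default_rng(0)
    x = rng.standard_normal(shape)                # start from pure noise
    for t in reversed(range(steps)):
        eps_hat = fake_denoiser(x, t)             # learned, global, content-dependent
        x = x - 0.05 * eps_hat                    # schematic denoising update
        x = x + 0.01 * rng.standard_normal(shape)
    return x

low_res = np.arange(16.0).reshape(4, 4)
print(nearest_neighbour_upscale(low_res, 2).shape)  # (8, 8) after one cheap local pass
print(toy_diffusion_sample().shape)                 # (8, 8) after many learned updates
```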
Can you reliably get image generation models like DALL-E 2 to illustrate specific visual concepts using made-up words? In this new preprint, I show that you can, using new approaches for text-based adversarial attacks on image generation. 1/12
Image generation models are typically trained on multilingual datasets (even if only incidentally). The paper introduces "macaronic prompting", a method that concatenates subword chunks from synonymous words in multiple languages to create nonce strings that can reliably query visual concepts. 2/12
For example, the word for “birds” is “Vögel” in German, “uccelli” in Italian, “oiseaux” in French, and “pájaros” in Spanish. Concatenate subword tokens from these words and you get strings like “uccoisegeljaros”, which reliably prompt DALL-E to generate images of birds. 3/12
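For illustration, here is how a string like that can be spliced together by hand. The real method operates over the model's subword tokenization, so the character-level slicing below is a simplification, and `query_image_model` is a hypothetical placeholder rather than a real API.

```python
# Illustrative sketch of splicing chunks of cross-lingual synonyms into a
# "macaronic" nonce string. Character-level slicing simplifies the paper's
# subword-based method; `query_image_model` (commented out) is hypothetical.

BIRD_SYNONYMS = ["uccelli", "oiseaux", "Vögel", "pájaros"]

def macaronic_string(words, slices):
    """Concatenate hand-picked (start, end) character slices, one per word."""
    return "".join(word[a:b] for word, (a, b) in zip(words, slices))

# "ucc" + "oise" + "gel" + "jaros" -> "uccoisegeljaros"
nonce = macaronic_string(BIRD_SYNONYMS, [(0, 3), (0, 4), (2, 5), (2, 7)])
print(nonce)  # uccoisegeljaros

# Hypothetical usage with an image generation model:
# images = query_image_model(f"a photo of {nonce}")
```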
Are large pre-trained models nothing more than stochastic parrots? Is scaling them up all we need to bridge the gap between humans and machines? In this new opinion piece for @NautilusMag, I argue that the truth lies somewhere between these two extremes. 1/14
While LPT models are undeniably impressive, many researchers have rightly warned that we shouldn't jump to conclusions about how similar they are to human cognition. The recent LaMDA story is yet another cautionary tale about our natural tendency toward anthropomorphism. 2/14