Raphaël Millière
Philosopher of Artificial Intelligence & Cognitive Science @Macquarie_Uni. Past: @Columbia, @UniofOxford. Blog: https://t.co/2hJjfSid4Z
Feb 17 15 tweets 5 min read
There's a lot of speculation about whether OpenAI's video generation model Sora has a 'physics engine' (bolstered by OAI's own claims about 'world simulation'). Like the debate about world models in LLMs, this question is both genuinely interesting and somewhat ill-defined. 🧵1/ Of course, it's wildly unlikely that Sora literally makes function calls to an external physics engine like UE5 during inference. Note that this has been done before with LLMs: see this Google paper where the model answers questions through simulations with a physics engine. 2/ Figure from Liu et al. (2022), "Mind's Eye: Grounded Language Model Reasoning through Simulation" (https://arxiv.org/abs/2210.05359)
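To make the contrast concrete, here is a minimal, purely illustrative sketch of what simulation-augmented reasoning in the spirit of Mind's Eye could look like. The toy free-fall simulator and the hard-coded question parsing below are my own stand-ins, not the paper's actual pipeline (which couples a language model with a real physics engine), and certainly not how Sora works:

```python
# Hypothetical sketch of simulation-augmented reasoning, loosely in the
# spirit of Liu et al. (2022): a physics question is delegated to an
# explicit simulator, and the language model conditions its answer on
# the result. All names here are illustrative stand-ins.

def simulate_fall(height_m: float, dt: float = 0.001) -> float:
    """Toy 'physics engine': time for an object to fall from height_m."""
    g, y, v, t = 9.81, height_m, 0.0, 0.0
    while y > 0:
        v += g * dt   # gravity accelerates the object
        y -= v * dt   # object moves down
        t += dt
    return t

def answer_with_simulation(question: str) -> str:
    # A real system would have the model parse the question into
    # simulator calls; here the parsing is hard-coded for illustration.
    t = simulate_fall(height_m=45.0)
    prompt = (f"{question}\nSimulation result: the object lands after "
              f"{t:.2f} s. Answer using this result.")
    return prompt  # this augmented prompt would be fed to the model

print(answer_with_simulation("How long does a ball dropped from 45 m take to land?"))
```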
Apr 5, 2023 14 tweets 4 min read
📝New preprint! What does it take for AI models to have grounded representations of lexical items? There is a lot of disagreement – some verbal, some substantive – about what grounding involves. Dimitri Mollo and I frame this old question in a new light 1/
arxiv.org/abs/2304.01481 Back in 1990, Harnad characterized the "Symbol Grounding Problem" with the following question: How can AI systems designed to process linguistic inputs have internal representations and outputs that are intrinsically meaningful? 2/
sciencedirect.com/science/articl…
Mar 24, 2023 13 tweets 7 min read
Yann LeCun kicking off the debate with a bold prediction: nobody in their right mind will use autoregressive models 5 years from now. #phildeeplearning @ylecun closing his presentation with some conjectures. #phildeeplearning
Mar 9, 2023 17 tweets 7 min read
Another day, another opinion essay about ChatGPT in the @nytimes. This time, Noam Chomsky and colleagues weigh in on the shortcomings of language models. Unfortunately, this is not the nuanced discussion one could have hoped for. 🧵 1/

nytimes.com/2023/03/08/opi… For a start, I'm not sure the melodramatic tone serves the argument: "machine learning will degrade our science and debase our ethics", and "we can only laugh or cry at [LLMs'] popularity"! I know op-eds are often editorialized for dramatic effect, but maybe this is a bit much? 2/
Feb 10, 2023 8 tweets 3 min read
I don't think lossy compression is a very helpful analogy to convey what (linguistic or multimodal) generative models do – at least if "blurry JPEGs" is the leading metaphor. It might work in a loose sense, but it doesn't tell the whole story. 1/

newyorker.com/tech/annals-of… Generative models can definitely be used for lossy compression (see below), but that's a special case of their generative capabilities. Reducing all they do to lossy compression perpetuates the idea that they just regurgitate approximations of their training samples. 2/

web.archive.org/web/2022092100…
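For the curious, the underlying point can be sketched in a few lines: any model that assigns probabilities to data induces a code whose length is -log2 p(x), so better generative models make better compressors. The unigram model below is a deliberately crude stand-in for a real generative model, and a practical system would pair the model with an arithmetic coder rather than just counting bits:

```python
import math
from collections import Counter

# Toy sketch of the 'generative model as compressor' idea: a model that
# assigns probability p(x) to data can encode it in about -log2 p(x) bits.
# A character unigram model stands in for the generative model here.

def unigram_model(corpus: str) -> dict:
    counts = Counter(corpus)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def code_length_bits(text: str, p: dict) -> float:
    # Shannon code length under the model: better models => shorter codes.
    return sum(-math.log2(p[c]) for c in text)

corpus = "the quick brown fox jumps over the lazy dog " * 50
p = unigram_model(corpus)
raw_bits = 8 * len(corpus)  # naive encoding: 1 byte per character
print(f"raw: {raw_bits} bits, model code: {code_length_bits(corpus, p):.0f} bits")
```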
Aug 9, 2022 12 tweets 5 min read
Can you reliably get image generation models like DALL-E 2 to illustrate specific visual concepts using made-up words? In this new preprint, I show that you can, using new approaches for text-based adversarial attacks on image generation. 1/12

arxiv.org/abs/2208.04135 Image generation models are typically trained on multilingual datasets (even accidentally). The paper introduces "macaronic prompting", a method to concatenate chunks from synonymous words in multiple languages to design nonce strings that can reliably query visual concepts. 2/12
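As a rough illustration of the idea (not the paper's exact procedure, which readers should check in the preprint), here is a sketch that splices chunks of translation-equivalent words into candidate nonce strings. The half-and-half chunking rule is my own simplification:

```python
import itertools

# Illustrative sketch of 'macaronic prompting': combine substrings of
# words that name the same concept in different languages into nonce
# strings that may still activate the concept in the image model.

def macaronic_candidates(words: list) -> list:
    """Join the first half of one word with the second half of another."""
    out = set()
    for a, b in itertools.permutations(words, 2):
        out.add(a[: len(a) // 2] + b[len(b) // 2 :])
    return sorted(out)

# Translation equivalents of 'bird' (EN/FR/DE/ES) as the target concept.
birds = ["bird", "oiseau", "vogel", "pajaro"]
for nonce in macaronic_candidates(birds)[:6]:
    print(nonce)  # candidate nonce strings to feed to the image model
```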
Jul 2, 2022 14 tweets 5 min read
Are large pre-trained models nothing more than stochastic parrots? Is scaling them all we need to bridge the gap between humans and machines? In this new opinion piece for @NautilusMag, I argue that the answer lies somewhere in between. 1/14

nautil.us/moving-beyond-… While large pre-trained models are undeniably impressive, many researchers have rightfully warned that we shouldn't jump to conclusions about how similar they are to human cognition. The recent LaMDA story is yet another cautionary tale about our natural tendency toward anthropomorphism. 2/14
Jun 22, 2022 10 tweets 4 min read
Here we go again! Parti, a new text-to-image model from @GoogleAI, drops contrastive learning and diffusion in favor of good old seq-to-seq autoregression. Results shared in the paper seem state-of-the-art for complex compositional prompts, although some failure modes remain. Example prompt: "A portrait photo of a kangaroo wearing an orange hoodie and blue sunglasses standing on the grass in front of the Sydney Opera House holding a sign on the chest that says Welcome Friends!"
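Schematically, the seq-to-seq recipe is: text tokens in, discrete image tokens out one at a time, then a separate detokenizer turns the token grid into pixels. Everything below is a stub with made-up sizes; Parti's actual components (e.g., its ViT-VQGAN image tokenizer) are described in the paper:

```python
import random

VOCAB_SIZE = 8192    # size of the discrete image-token codebook (illustrative)
IMAGE_TOKENS = 256   # e.g. a 16x16 grid of image tokens (illustrative)

def next_token_distribution(text_tokens: list, image_tokens: list) -> list:
    """Stub for the transformer decoder's next-token probabilities."""
    return [1.0 / VOCAB_SIZE] * VOCAB_SIZE  # uniform stand-in

def generate_image_tokens(text_tokens: list) -> list:
    # Autoregressive decoding: each image token is sampled conditioned
    # on the text and on all previously generated image tokens.
    image_tokens = []
    for _ in range(IMAGE_TOKENS):
        probs = next_token_distribution(text_tokens, image_tokens)
        image_tokens.append(random.choices(range(VOCAB_SIZE), probs)[0])
    return image_tokens  # would be passed to the image detokenizer

tokens = generate_image_tokens(["a", "kangaroo", "in", "a", "hoodie"])
print(len(tokens))
```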
Jun 22, 2022 5 tweets 1 min read
This is a nice and fairly exhaustive overview of the potential harms of LLMs, but in my opinion it's still missing a more indirect risk that pertains to human-human online interactions. 1/ As LLMs become easier and cheaper to use for companies and individuals, LLM-based (chat)bots will become commonplace online. I worry that this could eventually threaten to degrade human communication, by making people increasingly suspicious that they are talking to machines. 2/
Jun 13, 2022 18 tweets 5 min read
I've seen a lot of discussion about the notion of artificial general intelligence (AGI) lately – what it means, if anything. Like many debates regarding AI, this is a topic that invites verbal disputes if people don't have the same definitions in mind. 1/18 Many AI researchers, including @ylecun and @fchollet, have argued that there's no such thing as AGI. I take it that they mean something along the following lines: there is no such thing as universal or maximally general intelligence (artificial or not). 2/18
Jun 2, 2022 4 tweets 3 min read
Maybe someone has already tried this, but "Apoploe vesrreaitais" robustly yields images of seashells in DALL·E mini (100% match out of 36 trials). "Wa ch zod rea" yields specific dogs.
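For what it's worth, the informal evaluation behind that "100% match out of 36 trials" figure amounts to a simple trial loop. Both helper functions below are placeholders (the real steps were calls to DALL·E mini and manual inspection of the outputs), so the printed number is meaningless:

```python
import random

def generate_image(prompt: str) -> dict:
    """Placeholder for the DALL·E mini call; returns a dummy object."""
    return {"prompt": prompt}

def looks_like_seashell(image: dict) -> bool:
    """Placeholder for the manual inspection step; random stand-in."""
    return random.random() < 0.9

def match_rate(prompt: str, n_trials: int = 36) -> float:
    # Generate n_trials images and count how often the target concept appears.
    hits = sum(looks_like_seashell(generate_image(prompt)) for _ in range(n_trials))
    return hits / n_trials

print(match_rate("Apoploe vesrreaitais"))
```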
May 31, 2022 14 tweets 10 min read
Inspired by @giannis_daras & @AlexGDimakis's exploratory study of strange multimodal idioms in #dalle, I conducted a little experiment, starting with the prompt "Two people talking about birds, with subtitles". Let's see if we can find out what they're saying! 1/14 First, I prompted DALL-E with "Bonabiss is bobor ine is ros and in beors witches" a few times. Perplexing: something about bugs, fruits, and witches? The latter is hardly surprising given its presence in the prompt, but it seems out of place. 2/14
May 30, 2022 50 tweets 46 min read
Philosophy thought experiments illustrated with #dalle, a thread 🎨 Mary the color scientist (w.wiki/4wdA)
May 25, 2022 6 tweets 2 min read
Go read this excellent and timely blog post on compositionality and vision-language models. I share the positive sentiment towards recent progress in this area, with some caveats about remaining hurdles. 1/6 I disagree that "it makes no sense to criticise DALL-E (or neural networks in general) for their poor composition", if that simply means pointing out current limitations. I also emphasized DALL-E's strengths, but it clearly struggles with some forms of compositionality. 2/6
May 24, 2022 11 tweets 6 min read
With the release of #Imagen from @GoogleAI yesterday, here's a quick follow-up thread on the progress of compositionality in vision-language models.🧵 1/11 A few weeks ago DALL-E 2 was unveiled. It exhibits both very impressive success cases and clear failure cases – especially when it comes to counting, relative position, and some forms of variable binding. Why? 2/11
May 17, 2022 10 tweets 3 min read
It's become increasingly clear over the past few weeks that the Overton window for "scaling maximalism" – the claim that scaling existing approaches is all we need to build AGI – has shifted, at least in the industry. Some thoughts on this 🧵 1/ Scaling maximalism used to be viewed as a strawman for critics of deep learning like @GaryMarcus. Lately prominent researchers have explicitly endorsed the view (e.g., @NandoDF: "It’s all about scale now! The Game is Over!"; @AlexGDimakis: "Scale is all you need"). 2/
Apr 14, 2022 26 tweets 13 min read
The release of impressive new deep learning models in the past few weeks, notably #dalle2 from @OpenAI and #PaLM from @GoogleAI, has prompted a heated discussion of @GaryMarcus's claim that DL is "hitting a wall". Here are some thoughts on the controversy du jour. 🧵 1/25 One of @GaryMarcus's central claims is that current DL models fail at compositionality. The assessment of this claim is complicated by the fact that people may differ in how they understand compositionality – and what a "test of compositionality" should even look like. 2/25
Jan 15, 2021 19 tweets 7 min read
There is an increasing awareness that digital privacy matters even if you don't have "anything to hide". I've been vocal about this for a while, but people often don't know where to begin. The recent WhatsApp controversy is a good opportunity for a 🧵 with a few privacy tips. 1/n First things first: assuming you're not breaking the law, why should you care? Ask yourself: Would you be fine with a company monitoring your home 24/7 with a surveillance camera? What about someone watching you through a window with binoculars? Probably not. 2/n
Jul 31, 2020 11 tweets 3 min read
I've seen some questions about how I could produce the texts I shared earlier by prompting GPT-3, and whether GPT-3 is capable of producing such convincing output at all, so here's a thread to clarify a few points. My methodology was the following. Since I don't yet have access to the API, I used @AiDungeon with the "Dragon" model (which is GPT-3) and a custom prompt. AFAIK, AID allows for arbitrarily large prompts, but as @MaCroPhilosophy pointed out, these must be automatically truncated.
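A minimal sketch of that truncation point, assuming a fixed context window: with a hard token limit, a naive policy keeps only the most recent text and silently drops the beginning of the prompt. The 2048-token figure and the whitespace "tokenizer" below are simplifications of my own, not AI Dungeon's actual behavior:

```python
# Toy illustration of prompt truncation under a fixed context window.
# Whitespace splitting is a crude stand-in for a real tokenizer.

def truncate_prompt(prompt: str, max_tokens: int = 2048) -> str:
    tokens = prompt.split()                # crude tokenization
    if len(tokens) <= max_tokens:
        return prompt
    return " ".join(tokens[-max_tokens:])  # keep only the most recent context

long_prompt = "word " * 3000
print(len(truncate_prompt(long_prompt).split()))  # -> 2048
```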
Jul 31, 2020 5 tweets 4 min read
I asked GPT-3 to write a response to the philosophical essays written about it by @DrZimmermann, @rinireg, @ShannonVallor, @add_hawk, @AmandaAskell, @dioscuri, David Chalmers, Carlos Montemayor, and Justin Khoo, published yesterday by @DailyNousEditor. It's quite remarkable! The prompt contained the essays themselves, plus a blurb explaining that GPT-3 had to respond to them. Full disclosure: I produced a few outputs and cherry-picked this one, although they were all interesting in their own way.