Fun experiment: you can now automate content production with Claude Cowork.
I gave Claude a browser and asked it to: 1) Find a NeurIPS best paper 2) Write a thread on it 3) Generate graphics on Krea 4) Validate its work with ChatGPT.
Shockingly, it worked! How to do it 👇
You start in the "Cowork" tab in the desktop app.
Claude will open a browser and search on its own, so your prompt can be vague or very specific (like mine) - it will find the context it needs.
I mentioned I wanted image gen on Krea because I got sick of watching Claude search random sites 😂
In terms of the actual written content - answer Claude's Qs about who your audience is + how long it should be.
This will heavily influence results!
It will then browse the relevant sources and open an internal scratchpad to save insights and start drafting the thread.
My favorite paper this year: "Video models are zero-shot learners and reasoners"
It shows that video models develop emergent visual reasoning at scale - they can solve vision tasks they were never explicitly trained on.
This may be the "GPT moment" for vision. Let's break it down 👇
To start - why believe that video models might develop visual reasoning?
The same thing happened in text. We used to train a separate model for each task - but now, LLMs have general language understanding and can tackle many tasks they were never explicitly trained for.
It's plausible that video models will do the same at scale.
The paper evaluated 18k+ videos generated by Veo 3 across both qualitative and quantitative tasks.
It found that Veo can perceive, modify, and manipulate the visual world (starting from image + text prompts) - showing early reasoning skills it wasn't explicitly trained for.