Introducing Act-One, a new way to generate expressive character performances inside Gen-3 Alpha using a single driving video and character image. No motion capture or rigging required.
Learn more about Act-One below.
(1/7)
Act-One allows you to faithfully capture the essence of an actor's performance and transpose it to your generation. Where traditional pipelines for facial animation involve complex, multi-step workflows, Act-One works with a single driving video that can be shot on something as simple as a cell phone.
(2/7)
Without the need for motion-capture or character rigging, Act-One is able to translate the performance from a single input video across countless different character designs and in many different styles.
(3/7)
One of the model's strengths is producing cinematic and realistic outputs across a wide range of camera angles and focal lengths, allowing you to generate emotional performances with previously impossible character depth and opening new avenues for creative expression.
(4/7)
A single video of an actor is used to animate a generated character.
(5/7)
With Act-One, eye-lines, micro expressions, pacing and delivery are all faithfully represented in the final generated output.
(6/7)
Access to Act-One will begin gradually rolling out to users today and will soon be available to everyone.
Today we're sharing our first research work exploring diffusion for language models: Autoregressive-to-Diffusion Vision Language Models
We develop a state-of-the-art diffusion vision language model, Autoregressive-to-Diffusion (A2D), by adapting an existing autoregressive vision language model for parallel diffusion decoding. Our approach makes it easy to unlock the speed-quality trade-off of diffusion language models without training from scratch, by leveraging existing pre-trained autoregressive models.
Standard vision-language models (VLMs) reason about images and videos through language, powering a wide variety of applications from image captioning to visual question answering.
Autoregressive VLMs generate tokens sequentially, which prevents parallelization and limits inference throughput. Diffusion decoders are emerging as a promising alternative to autoregressive decoders in VLMs by enabling parallel token generation for faster inference.
We trained a state-of-the-art diffusion VLM, A2D-VL 7B, for parallel generation by finetuning an existing autoregressive VLM on the diffusion language modeling task, using the masked diffusion framework, which "noises" tokens by masking them and "de-noises" them by predicting the original tokens.
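To make the masked diffusion objective concrete, here is a minimal sketch of how such noising and de-noising can be set up for discrete tokens, assuming a generic transformer denoiser. The names `denoiser`, `noise_tokens`, and `MASK_ID` are illustrative assumptions, not part of the A2D release.

```python
import torch
import torch.nn.functional as F

MASK_ID = 0  # hypothetical id reserved for the [MASK] token

def noise_tokens(tokens: torch.Tensor, noise_level: float):
    """Forward ("noising") process: independently replace each token
    with [MASK] with probability `noise_level`."""
    mask = torch.rand(tokens.shape, device=tokens.device) < noise_level
    noisy = torch.where(mask, torch.full_like(tokens, MASK_ID), tokens)
    return noisy, mask

def denoising_loss(denoiser, tokens: torch.Tensor, noise_level: float):
    """Reverse ("de-noising") objective: predict the original tokens at
    the masked positions from the partially masked sequence."""
    noisy, mask = noise_tokens(tokens, noise_level)
    logits = denoiser(noisy)  # (batch, seq_len, vocab_size)
    return F.cross_entropy(logits[mask], tokens[mask])
```

At inference, the same denoiser can fill in many masked positions per step, which is what enables parallel decoding.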
We develop novel adaptation techniques that gradually increase task difficulty during finetuning, annealing both the block size and the noise level to smoothly transition from sequential to parallel decoding while preserving the base model's capabilities.
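As an illustration of the idea only (the specific schedule and ranges here are assumptions, not values from the paper), such a curriculum could linearly grow the decoding block size and the masking noise level over the course of finetuning:

```python
def annealed_curriculum(step: int, total_steps: int,
                        start_block: int = 1, final_block: int = 8,
                        start_noise: float = 0.1, final_noise: float = 1.0):
    """Illustrative curriculum: small blocks and light masking early in
    finetuning (close to sequential decoding), larger blocks and heavier
    masking later (fully parallel decoding)."""
    t = min(step / total_steps, 1.0)
    block_size = round(start_block + t * (final_block - start_block))
    noise_level = start_noise + t * (final_noise - start_noise)
    return block_size, noise_level
```

Early in training the task looks almost autoregressive; by the end, the model is asked to reconstruct large, heavily masked blocks in parallel.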
Runway Aleph is a new way to edit, transform and generate video. Its ability to perform a wide range of generalized tasks means it can reimagine ordinary footage in endless new ways, allowing you to turn images and videos you already have into anything you want.
See below for a quick breakdown on how Aleph can effortlessly remove the subject from these scenes, just by asking it to.
To remove the subject, just ask Aleph to “remove the man”.
Aleph can retain complex scenes and fine details without the need for tedious masking.
Today we are releasing Frames, our most advanced base model for image generation, offering unprecedented stylistic control and visual fidelity. Learn more below.
(1/10)
With Frames, you can begin to define worlds that represent your own artistic points of view. Styles, compositions, subject matter and more. Anything you can imagine, you can begin to bring to life with Frames.
Today we’re sharing an early video keyframing prototype that treats creative exploration as a search over the space of latent artistic possibilities, one that lets you navigate this vast space with both precise control and serendipitous nonlinear discovery.
(1/8)
Graph Structure: A Window into Latent Space
The Graph structure is the foundation of the prototype. Images are represented as nodes, serving as waypoints in the model's latent space. Nodes can be connected to one another to create an edge: a video that transitions from the first frame to the last frame across latent space and time.
(2/8)
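As a rough sketch of the structure described above (illustrative only, not the prototype's actual implementation), nodes hold images and edges hold the videos that transition between them:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ImageNode:
    """A waypoint in latent space, represented by a generated image."""
    image_path: str

@dataclass
class VideoEdge:
    """A transition between two nodes: a video whose first and last
    frames are the two endpoint images."""
    start: ImageNode
    end: ImageNode
    video_path: Optional[str] = None  # set once the transition is generated

@dataclass
class KeyframeGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_node(self, image_path: str) -> ImageNode:
        node = ImageNode(image_path)
        self.nodes.append(node)
        return node

    def connect(self, start: ImageNode, end: ImageNode) -> VideoEdge:
        edge = VideoEdge(start, end)
        self.edges.append(edge)
        return edge
```

Each call to `connect` defines a transition to be generated, so the graph doubles as a map of the latent space explored so far.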
Balancing Control and Serendipity
Precise controls help limit the vast space of possibilities, but at the same time, variation and unpredictability can result in "happy accidents": possibilities that we would not have considered given precise control. To balance this tradeoff, we provide two affordances for users to manipulate images in a "relational" manner that allows unpredictability in consistent dimensions.
Introducing Frames: An image generation model offering unprecedented stylistic control.
Frames is our newest foundation model for image generation, marking a big step forward in stylistic control and visual fidelity. With Frames, you can begin to architect worlds that represent very specific points of view and aesthetic characteristics.
See below for examples.
World 1089: Mise-en-scène
(1/11)
Frames allows you to precisely design the look, feel and atmosphere of the world you want to create.
Expand Video is a new feature that transforms videos into new aspect ratios by generating new areas around your input video. It has begun gradually rolling out and will soon be available to everyone.
See below for more examples and results.
(1/6)
Use Expand Video to help shape your story. Seamlessly extend your frame beyond its original boundaries while maintaining visual consistency to create stories with new compositions.
(2/6)
Expand into the unexpected with text prompts or guiding images.