Introducing 🔥InstructBLIP🔥 - our new multimodal foundation models, built by instruction-tuning BLIP-2, achieving new SOTA results on a wide range of vision-language benchmarks and offering several advantages over GPT-4.
InstructBLIP unlocks a diverse range of multimodal capabilities for building next-generation AI agents, including complex visual scene understanding and reasoning, knowledge-grounded image description, multi-turn visual conversation, etc.
(2/n)
Built on the success of #BLIP2, InstructBLIP proposes a general instruction-tuning framework in which the Q-Former extracts instruction-aware visual features from the output embeddings of a frozen image encoder and feeds them as soft-prompt input to a frozen LLM (sketch below).
(3/n)
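A minimal conceptual sketch of that instruction-aware soft-prompting step, assuming placeholder module names, dimensions, and a simplified attention layout - this is illustrative only, not the LAVIS implementation:

```python
# Conceptual sketch of InstructBLIP's forward pass (illustrative assumptions,
# not the released code): queries + instruction tokens attend to frozen image
# features, and the result is projected into the frozen LLM's embedding space.
import torch
import torch.nn as nn

class InstructionAwareQFormer(nn.Module):
    """Stand-in for the Q-Former: learnable queries extract visual features
    conditioned on the tokenized instruction."""
    def __init__(self, num_queries=32, dim=768):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, num_queries, dim))
        self.self_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, image_embeds, instruction_embeds):
        b = image_embeds.size(0)
        q = self.queries.expand(b, -1, -1)
        # Instruction tokens interact with the queries via self-attention,
        # which makes the extracted visual features instruction-aware.
        qi = torch.cat([q, instruction_embeds], dim=1)
        qi, _ = self.self_attn(qi, qi, qi)
        q = qi[:, : q.size(1)]
        # Queries cross-attend to the frozen image encoder's output embeddings.
        q, _ = self.cross_attn(q, image_embeds, image_embeds)
        return q  # instruction-aware visual features

# Frozen image encoder output (e.g., ViT patch embeddings) and instruction tokens.
image_embeds = torch.randn(2, 257, 768)       # [batch, patches, dim]
instruction_embeds = torch.randn(2, 16, 768)  # [batch, instruction tokens, dim]

qformer = InstructionAwareQFormer()
proj_to_llm = nn.Linear(768, 4096)            # project to the frozen LLM's hidden size

visual_soft_prompt = proj_to_llm(qformer(image_embeds, instruction_embeds))
# visual_soft_prompt ([2, 32, 4096]) is prepended to the LLM's input embeddings
# of the instruction; only the Q-Former and projection are trained.
print(visual_soft_prompt.shape)
```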
#InstructBLIP consistently improves over our prior #BLIP2 models and significantly outperforms DeepMind's much larger #Flamingo-80B on a variety of zero-shot vision-language benchmarks.
(4/n)
InstructBLIP addresses the fundamental challenges of vision-language instruction tuning and conducts a systematic study across a comprehensive set of datasets and tasks to improve the models' generalization to unseen data and tasks.
(5/n)
Find out more from our paper: arxiv.org/abs/2305.06500
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
📢Introducing 🔥#CodeTF🔥, a one-stop Transformer library for Code Large Language Models (CodeLLMs), with a unified interface for training & inference on code tasks (code generation, summarization, translation, etc.)
The CodeTF library supports both the development and deployment of Code LLMs for code intelligence tasks: it provides training and serving of Code LLMs, code utilities for processing code data, and popular research benchmarks for evaluating model performance.
(2/n)
CodeTF is designed to provide a user-friendly platform for code intelligence tasks. It follows a modular architecture that enhances extensibility, allowing seamless integration of additional programming languages, models & utilities.
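A hypothetical usage sketch of what a unified CodeLLM interface in the spirit of CodeTF looks like - the entry point, parameters, and model identifiers below are assumptions for illustration, not a confirmed API; please check the CodeTF README for the actual usage:

```python
# Hypothetical sketch (names and parameters are assumptions, not the verified
# CodeTF API): one interface to load and run a code LLM across code tasks.
from codetf.models import load_model_pipeline  # assumed entry point

# Load a pretrained code LLM behind one unified interface.
model = load_model_pipeline(
    model_name="codet5",      # assumed model identifier
    model_type="plus-220M",   # assumed size/variant tag
    task="pretrained",
    is_eval=True,
)

# The same pipeline object serves inference across code tasks
# (generation, summarization, translation, ...).
result = model.predict(["def bubble_sort(arr):"])
print(result)
```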
BLIP-Diffusion learns a pre-trained subject representation that unlocks a range of zero-shot and few-step-tuned image generation and editing capabilities, e.g., subject-driven generation, zero-shot subject-driven image manipulation, controllable subject-driven image editing, etc.
(2/n)
Two-stage pretraining strategy: 1) multimodal representation learning with BLIP-2 produces text-aligned visual features for an input image; 2) subject representation learning trains the diffusion model to use the BLIP-2 features to generate novel subject renditions.
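A minimal conceptual sketch of how the two stages connect, assuming stand-in modules and shapes (not the released BLIP-Diffusion code): the subject features from stage 1 are appended to the prompt embeddings that condition the diffusion denoiser in stage 2.

```python
# Conceptual sketch of BLIP-Diffusion's two-stage conditioning (illustrative
# only; module names and shapes are assumptions, not the released code).
import torch
import torch.nn as nn

dim_txt, dim_vis = 768, 768

# Stage 1 (frozen after pretraining): BLIP-2 produces text-aligned visual
# features for the subject image -- stand-in modules below.
blip2_subject_encoder = nn.Linear(dim_vis, dim_txt)   # subject image -> text-aligned features
text_encoder = nn.Embedding(50000, dim_txt)           # prompt tokens -> text embeddings

# Stage 2: the diffusion model's text conditioning is the prompt embeddings
# concatenated with the subject features, so the denoiser learns to render
# the specific subject they describe.
subject_feats = blip2_subject_encoder(torch.randn(1, 32, dim_vis))  # [1, 32, 768]
prompt_embeds = text_encoder(torch.randint(0, 50000, (1, 20)))      # [1, 20, 768]
conditioning = torch.cat([prompt_embeds, subject_feats], dim=1)     # [1, 52, 768]

# A denoising U-Net would be trained on `conditioning` with the usual diffusion
# loss; at inference, swapping the prompt while keeping subject_feats gives
# zero-shot subject-driven generation.
print(conditioning.shape)
```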
Introducing 🔥CodeT5+🔥, a new family of open-source code LLMs for both code understanding and generation, achieving new SoTA code generation performance on HumanEval and surpassing all open-source code LLMs.
CodeT5+ proposes a flexible encoder-decoder architecture trained with a mixture of pretraining tasks, allowing it to operate in different modes (i.e., encoder-only, decoder-only, and encoder-decoder) for a wide range of code understanding and generation tasks.
(2/n)
The CodeT5+ family is trained on permissively licensed code, with model sizes ranging from 220M to 16B parameters, and can be initialized from frozen off-the-shelf LLMs (e.g., CodeGen or any GPT-style model) to train large models efficiently and save substantial compute (inference sketch below).
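A short sketch of running a small CodeT5+ checkpoint for code generation via Hugging Face transformers - the checkpoint name Salesforce/codet5p-220m and the sentinel-style prompt are assumptions; see the CodeT5+ repo and model cards for the recommended usage:

```python
# Minimal sketch (checkpoint name and prompt format are assumptions): load a
# CodeT5+ checkpoint as a seq2seq model and generate a completion.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "Salesforce/codet5p-220m"   # smallest member of the family
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Span-denoising-style prompt: ask the model to fill in the function body.
inputs = tokenizer("def hello_world():<extra_id_0>", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```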