Discover and read the best of Twitter Threads about #blip2

Most recent (2)

Introducing 🔥InstructBLIP🔥 - our new Multimodal Foundation Models built on BLIP-2 with instruction tuning, achieving new SOTA results on various vision-language (VL) benchmarks and offering several advantages over GPT-4.

Paper: arxiv.org/abs/2305.06500
Code: github.com/salesforce/LAV…
(1/n)
InstructBLIP unlocks a range of diverse multimodal capabilities for building next-generation AI agents, including complex visual scene understanding and reasoning, knowledge-grounded image description, multi-turn visual conversation, etc.

(2/n) [image]
Built on the success of #BLIP2, InstructBLIP proposes a general instruction-tuning framework in which the Q-Former extracts instruction-aware visual features from the output embeddings of a frozen image encoder and feeds those features as a soft prompt input to a frozen LLM (a schematic sketch of this pipeline follows below).
(3/n) [image]
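For readers who want to map that description onto code, here is a minimal, schematic sketch of the pipeline: a frozen image encoder, a Q-Former that also sees the instruction, and a frozen LLM that receives the projected query outputs as soft prompts. The module names, dimensions, and the Q-Former call signature are illustrative assumptions, not the actual LAVIS implementation.

```python
# Schematic sketch of the InstructBLIP pipeline described above (illustrative only;
# module names, shapes, and the Q-Former call signature are assumptions, not LAVIS code).
import torch
import torch.nn as nn

class InstructBlipSketch(nn.Module):
    def __init__(self, image_encoder, qformer, llm, d_qformer=768, d_llm=4096, num_queries=32):
        super().__init__()
        self.image_encoder = image_encoder      # frozen vision transformer
        self.qformer = qformer                  # trainable Q-Former (sees image + instruction)
        self.llm = llm                          # frozen language model
        self.llm_proj = nn.Linear(d_qformer, d_llm)  # maps query outputs into the LLM embedding space
        self.query_tokens = nn.Parameter(torch.zeros(1, num_queries, d_qformer))
        # Only the Q-Former, the query tokens, and the projection are updated during instruction tuning.
        for p in self.image_encoder.parameters():
            p.requires_grad = False
        for p in self.llm.parameters():
            p.requires_grad = False

    def forward(self, image, instruction_ids, instruction_embeds):
        # 1. Frozen image encoder -> patch embeddings.
        with torch.no_grad():
            image_embeds = self.image_encoder(image)              # (B, num_patches, d_vit)
        # 2. The Q-Former attends to the image embeddings *and* the instruction tokens,
        #    so the extracted visual features are instruction-aware.
        queries = self.query_tokens.expand(image.size(0), -1, -1)
        visual_feats = self.qformer(
            queries, image_embeds, instruction_ids)               # (B, num_queries, d_qformer)
        # 3. Project into the LLM space and prepend as soft prompts to the instruction embeddings.
        soft_prompts = self.llm_proj(visual_feats)                # (B, num_queries, d_llm)
        llm_inputs = torch.cat([soft_prompts, instruction_embeds], dim=1)
        # 4. The frozen LLM conditions on the soft prompts to produce the response.
        return self.llm(inputs_embeds=llm_inputs)
```

The key design point this sketch tries to capture is that only the Q-Former (with its query tokens and projection layer) is tuned, while both the image encoder and the LLM stay frozen.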
Experience the power of LLMs that understand images with the BLIP-2 live demo, now running in a 🤗 Hugging Face Space! Check it out and share your favorite examples with #BLIP2 #LLM!

Demo: huggingface.co/spaces/Salesfo…
Project Page: github.com/salesforce/LAV…
Useful tips when interacting with BLIP-2 (a minimal usage sketch follows these tips):
1. Try prompting the model, e.g. "tell the story behind the photo:" or "Question: what is the name of the person? Answer:";
2. Use proper punctuation in your inputs (?, ., :);
3. Try different generation options, e.g. sampling methods.
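To make the tips concrete, below is a minimal sketch using the Hugging Face transformers port of BLIP-2 rather than the hosted demo; the checkpoint name (Salesforce/blip2-opt-2.7b) is an assumption about which release you want to run, and the image URL is a placeholder to replace with your own.

```python
# Minimal BLIP-2 prompting sketch with Hugging Face transformers (not the hosted demo).
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to(device)

# Placeholder image URL; substitute any photo you want to query.
image = Image.open(requests.get("https://example.com/photo.jpg", stream=True).raw)

# Tip 1: use an explicit "Question: ... Answer:" prompt; tip 2: mind the punctuation.
prompt = "Question: what is shown in the photo? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device, torch.float16)

# Tip 3: experiment with generation options, e.g. nucleus sampling instead of greedy decoding.
out = model.generate(**inputs, do_sample=True, top_p=0.9, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```

Swapping `do_sample=True, top_p=0.9` for the default greedy decoding (or for beam search via `num_beams`) is exactly the kind of generation-option experiment tip 3 suggests.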