aifunhouse
Sep 30 • 19 tweets • 6 min read
It's Friday and that means it's time for the @aifunhouse Week in Review!

As always, it's been a wild week in #AI!

DreamBooth, Instant NeRF, Make-A-Video, and more ... let's get into it!

🤖🧵👇🏽
1. First up: DreamBooth, a technique from Google Research originally applied to their Imagen model but generalizable to others. It lets you fine-tune a text-to-image network to generate consistent characters across new contexts and styles. dreambooth.github.io
2. The results are wild - take a look:
3. Here's our thread from earlier this week:
4. Want to dig into DreamBooth for your own (likely questionable) purposes? GitHub repos and Colabs yonder:
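Before you fire up the Colabs, here's the gist of what the fine-tune is optimizing - a rough, conceptual Python sketch with toy tensors (not Google's code or any specific repo): a reconstruction loss on your handful of subject photos tagged with a rare identifier, plus a "prior preservation" loss on class images sampled from the frozen model so it doesn't forget what the broader class looks like.

```python
# Conceptual sketch of the DreamBooth objective with toy stand-ins; a real run
# fine-tunes a full text-to-image diffusion model on latents of your photos.
import torch
import torch.nn.functional as F


class ToyDenoiser(torch.nn.Module):
    """Stands in for the diffusion model's noise-prediction network."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = torch.nn.Linear(dim, dim)

    def forward(self, x, t):
        # A real denoiser conditions on the timestep and the text prompt
        # ("a photo of sks dog" vs. "a photo of a dog"); this toy ignores both.
        return self.net(x)


def dreambooth_loss(model, x_inst, noise_inst, x_cls, noise_cls, t, prior_weight=1.0):
    # Instance loss: reconstruct the noise added to the 3-5 subject photos.
    instance_loss = F.mse_loss(model(x_inst, t), noise_inst)
    # Prior-preservation loss: same objective on class images sampled from the
    # frozen model, which keeps the fine-tune from overwriting the whole class.
    prior_loss = F.mse_loss(model(x_cls, t), noise_cls)
    return instance_loss + prior_weight * prior_loss


model = ToyDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Random tensors standing in for noisy latents and the noise that was added.
x_inst, noise_inst = torch.randn(4, 16), torch.randn(4, 16)
x_cls, noise_cls = torch.randn(4, 16), torch.randn(4, 16)
t = torch.zeros(4)

loss = dreambooth_loss(model, x_inst, noise_inst, x_cls, noise_cls, t)
loss.backward()
opt.step()
print(float(loss))
```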
5. Next - in a rare departure from large language models and text-to-image, NVIDIA's Instant NGP (with Instant NeRF) dramatically reduces the time required to infer a 3D scene from a set of 2D images. Think of it as uber-photogrammetry. github.com/NVlabs/instant…
6. Great how-to on getting that Instant NGP installed, compiled, and running here:
developer.nvidia.com/blog/getting-s…
7. Additional tools in the repo allow for mesh generation, signed distance functions (SDFs), gigapixel image approximation, volume rendering, camera moves, interactive rendering with multisample DoF, slicing, and rad visualizations of what's happening under the hood in the neural net.
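If you're curious what a NeRF actually computes when it "infers a 3D scene": the network predicts a density and a color at sample points along each camera ray, and the pixel is an alpha-composite of those samples. Here's that generic compositing step as a NumPy sketch (Instant NGP's speedup comes from its multiresolution hash-grid encoding and CUDA kernels, which aren't shown here).

```python
# Generic NeRF volume rendering for a single camera ray (illustrative only).
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """sigmas: (N,) densities, colors: (N, 3) RGB, deltas: (N,) sample spacings."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                          # opacity of each sample
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))   # light surviving to each sample
    weights = trans * alphas                                         # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0)                   # final pixel RGB

# Toy example: 64 random samples along one ray.
n = 64
pixel = composite_ray(np.random.rand(n), np.random.rand(n, 3), np.full(n, 1.0 / n))
print(pixel)  # an RGB triple in [0, 1]
```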
8. Next up, coming in hot from Meta AI is Make-A-Video, a paper and (perhaps) a set of hosted tools (sign up if you're interested - shocker, thanks Zuck 🙄) capable of text-to-video, image tweening, and video variation, with pretty decent results.
makeavideo.studio
9. Subjectively, Make-A-Video's output quality is reminiscent of GAN image output from ~2 years ago, which is no small feat. The frames are stable, with decent detail and resolution and plausible lighting and subjects.
10. They do have some GAN-like undesirable artifacts as well, including harsh edges, lack of definition in detailed areas, and a crushed color palette. Lots of room for improvement, but an impressive set of early results in what is sure to be the next frontier for image generation.
11. Bonus sample from Make-A-Video: "A golden retriever eating ice cream on a beautiful tropical beach at sunset, high resolution"
12. Next up - in a similar vein to DreamBooth we have Re-Imagen, which achieves new SoTA results in text-to-image generation "even for rare or unseen entities", including on the challenging COCO and WikiImages datasets:
13. According to the paper, Re-Imagen outperforms Stable Diffusion and DALL-E 2 on faithfulness and photorealism with human raters, mostly for low-frequency entities.
14. You can think of this as a one-shot take on the sort of thing DreamBooth and Textual Inversion are capable of, and the results are indeed impressive.
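The recipe, roughly as the paper describes it: before generating, retrieve a few (image, caption) neighbors for the entities in your prompt from an external knowledge base, and let the diffusion model condition on them while denoising. Here's a toy sketch of that flow - the entries, file paths, and word-overlap retriever below are invented for illustration (Re-Imagen uses learned multimodal retrieval, not string matching):

```python
# Toy sketch of retrieval-augmented generation a la Re-Imagen (illustrative only).
from dataclasses import dataclass

@dataclass
class KBEntry:
    caption: str
    image_path: str  # hypothetical paths standing in for a real image database

KNOWLEDGE_BASE = [
    KBEntry("ugglan chair, a (fictional) rare designer chair", "kb/ugglan_chair_01.jpg"),
    KBEntry("ugglan chair in a bright living room", "kb/ugglan_chair_02.jpg"),
    KBEntry("golden retriever puppy on a lawn", "kb/golden_01.jpg"),
]

def retrieve(prompt: str, k: int = 2):
    """Rank entries by word overlap with the prompt (a stand-in for learned retrieval)."""
    words = set(prompt.lower().split())
    return sorted(
        KNOWLEDGE_BASE,
        key=lambda e: len(words & set(e.caption.lower().split())),
        reverse=True,
    )[:k]

def generate(prompt: str):
    neighbors = retrieve(prompt)
    # A real model conditions its denoising steps on these retrieved pairs;
    # here we just show what would get passed along.
    print(f"prompt: {prompt}")
    for n in neighbors:
        print(f"  conditioning on: {n.caption!r} ({n.image_path})")

generate("an ugglan chair on a tropical beach at sunset")
```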
15. That's it for the @aifunhouse Week in Review!

What were your favorite announcements, demos, or papers this week? What did we miss?
16. Did you love this thread? Of course you did! You're no dummy and you have a nice smile!

Follow @aifunhouse for more tips, tutorials, tricks, and roundups from this Cambrian Explosion of AI crazy!
17. RT this thread to let your friends know where the best roundups can be found (it's here BTW).

• • •

More from @aifunhouse

Oct 2
1. Hey there young Spielbergs!

Curious about how AI can be used for film making?

It's still early days, but between text-to-image, text-to-audio and AI-driven animation, building blocks are starting to appear.

Today's thread provides an overview.
🧵👇
2. First off, some highlights!

Way back in July @mrjonfinger used @midjourney to produce a coherent short film.

Very solid visuals, but the voices and animation are a bit stilted. We had to rewatch to grok the plot, but it's 100% there once you get it.

3. Another early AI success story is @adampickard's use of DALL-E to recreate the famous short film "Powers of Ten" by Ray and Charles Eames.

There's no dialog here, but the narrative of the original definitely comes through.

Oct 1
1. Last week @OpenAI finally removed the waitlist for DALL-E!

Of the AI image creation tools out there, DALL-E is arguably the most accessible.

Today's thread provides a hands-on tutorial for new DALL-E users looking to generate visually consistent assets.

Sticker Time!
2. If you've never used DALL-E, head on over and sign up now!

First announced in April, DALL-E is the OG text-to-image ML model. For months, access was extremely limited, but now everyone, including you, can log in and take this dream machine for a spin.

labs.openai.com
3. If you need inspiration, OpenAI's Instagram is filled with images that a mere 6 months ago would have knocked our collective socks off.

Since then of course, we've all become jaded as new mind-bending tech comes out weekly.

Ah to be young again.

instagram.com/openaidalle/?h…
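And if you'd rather script your generations than click around the web UI, the same model is available through OpenAI's API. A minimal sketch with the openai Python package (the prompt is just an example, and generations are billed to your account):

```python
# Minimal sketch: generate sticker-style candidates with DALL-E via the OpenAI API.
# Requires `pip install openai` and an API key from your OpenAI account.
import openai

openai.api_key = "sk-..."  # your key here

response = openai.Image.create(
    prompt="a die-cut sticker of a smiling avocado wearing sunglasses, "
           "white background, vector art",
    n=4,               # number of candidates
    size="512x512",    # 256x256, 512x512, or 1024x1024
)

for item in response["data"]:
    print(item["url"])  # temporary URLs you can download the images from
```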
Sep 27
1. Last week @OpenAI released Whisper, an open source model for transcribing audio.

Let's see how you can use Whisper + GPT-3 to quickly summarize text-heavy YouTube vids.

If you're new to ML this is a great tutorial to get hands on and play along.

openai.com/blog/whisper/

🧡
2. First we'll use Whisper to transcribe the video we want to summarize.

Whisper is open source and there are already multiple UIs on the web that allow you to run it on video and audio.

For YouTube videos, @jeffistyping created a super simple UI.

huggingface.co/spaces/jeffist…
3. To transcribe a video, just paste in its YouTube URL and wait for the title and preview image to appear.

In this example, we chose a 10-minute video of @garrytan discussing Adobe's recent $20B acquisition of Figma.
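Prefer to run the whole pipeline yourself in Python instead of the hosted UI? It's only a few lines. A sketch, with some assumptions: the audio file path is a placeholder for audio you've downloaded yourself, Whisper needs ffmpeg installed, and long transcripts would need to be chunked to fit GPT-3's context window.

```python
# Sketch: transcribe audio locally with Whisper, then summarize with GPT-3.
# Requires the `openai` package, Whisper installed from github.com/openai/whisper,
# and ffmpeg available on your system.
import openai
import whisper

openai.api_key = "sk-..."  # your OpenAI API key

# 1. Transcribe with the open source Whisper model (runs locally).
model = whisper.load_model("base")  # larger checkpoints are slower but more accurate
transcript = model.transcribe("video_audio.mp3")["text"]  # placeholder audio file

# 2. Summarize the transcript with GPT-3.
completion = openai.Completion.create(
    model="text-davinci-002",
    prompt="Summarize the following transcript in five bullet points:\n\n" + transcript,
    max_tokens=300,
    temperature=0.3,
)
print(completion["choices"][0]["text"].strip())
```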
Sep 26
Text-to-image networks generate amazing imagery, but out of the box, it's hard to define characters and styles that remain consistent across outputs.

The crew at @CorridorDigital has a great video showcasing how tools like DreamBooth can solve this.
🧵👇
DALL-E and Stable Diffusion (SD) have taken Twitter by storm because of the high quality visuals they generate from text prompts.

Within a prompt, you can refer to well-known people/objects (e.g. "Eiffel Tower") and place them in specific situations, or stylize them in new ways.
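Want to try that yourself? Prompting a well-known subject with Stable Diffusion takes a few lines via Hugging Face diffusers. A sketch - the model ID and prompt are just examples, and you'll need a GPU and to accept the model license on the Hugging Face Hub first:

```python
# Sketch: generate an image of a well-known subject with Stable Diffusion.
# Requires `pip install diffusers transformers` and a CUDA GPU; log in with
# `huggingface-cli login` after accepting the model license on the Hub.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # example model ID
    torch_dtype=torch.float16,
).to("cuda")

prompt = "the Eiffel Tower rebuilt out of gingerbread, golden hour, photograph"
image = pipe(prompt).images[0]
image.save("gingerbread_eiffel.png")
```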
Sadly, if you want to create an image with a not-so-well-known object (e.g. you), you're out of luck.

You can try to describe the object, but this won't cut it for specific people or products. It also won't result in great consistency across images.
