1. Generate Picture Books with AI for free (code open-source👇) with @OpenAI Function Calling, @LangChainAI, #DeepLake, & @StabilityAI.
- Prompt -> a PDF storybook with illustrations.
- Stores image & text pairs in the multimodal #DeepLake VectorDB for model training/finetuning!
2. Read the 🧵 to learn how @OpenAI Function Calling & @LangChainAI helped.
FableForge is built by @ethanjdev & handles:
1. Prompt -> text & images
2. PDF creation
3. Deep Lake DB to view the multimodal images + text dataset or stream it in real-time to train/fine-tune an LLM.
3. But first... What's @OpenAI's function calling update?
In essence, it's bridging the gap between unstructured language input and structured, actionable output that other systems, tools, or services can use.
4. The chat models can now detect if a function needs to be called based on the user's input and respond with JSON that conforms to the described function's signature.
5. Effectively, you now can:
- Create QA chatbots with external tools (e.g., Plugins)
- Extract structured data from text
- Convert natural language into API calls or database queries (this is what we've used).
6. The first approach of instructing the language model to generate prompts didn't work, since Stable Diffusion was released in 2022, and teaching #GPT4 how to properly prompt was difficult.
7. For our prompts, we need structured data to adhere to specific rules. Here's one of the functions we've used.
We'll send the chat model a page from our book, the function, & instructions to infer the details from the page. In return, we get structured data to form a prompt!
8. @LangChainAI has recently added even better support for using functions (attend this webinar to learn more!)
9. Once we obtain the prompts, we then generate the texts and images with #StableDiffusion and #GPT4. This part of the code takes care of the following steps:
10. Then we put the PDF together.
1. Text Addition and Image Conversion
2. Cover Generation
3. PDF Assembly
11. Now that we have finalized our picture book, we want to store the images and prompts in Deep Lake. Deep Lake is multimodal, which means we can store embeddings, images, text, etc - all within one 'database'. This allows for some cool stuff -
12. For instance - visualizing the image and text pairs, as well as streaming the entire dataset for further fine-tuning based on user feedback!
13. (13/14) Give the open-source repo a star! github.com/e-johnstonn/Fa…. Also, accepting PRs for local models, user feedback buttons in the UI, etc.
14. (14/14) Full writeup here:
activeloop.ai/resources/ai-s…
Share this Scrolly Tale with your friends.
A Scrolly Tale is a new way to read Twitter threads with a more visually immersive experience.
Discover more beautiful Scrolly Tales like this.