We just released Transformers' boldest feature: Transformers Agents.
It lowers the barrier to entry for machine learning:
Control 100,000+ HF models by talking to Transformers and Diffusers
Fully multimodal agent: text, images, video, audio, docs...🌎
huggingface.co/docs/transform…
Create an agent backed by an LLM (OpenAssistant, StarCoder, OpenAI, ...) and start talking to Transformers and Diffusers.
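A minimal sketch of setting up an agent, assuming transformers >= 4.29 with the Agents API. The endpoint URL is illustrative; any hosted LLM endpoint works, and `OpenAiAgent` is the OpenAI-backed alternative.

```python
# Illustrative StarCoder inference endpoint (swap in any hosted LLM).
STARCODER_ENDPOINT = "https://api-inference.huggingface.co/models/bigcode/starcoder"

def build_agent(endpoint: str = STARCODER_ENDPOINT):
    # Imported lazily so this sketch stays importable without transformers installed.
    from transformers import HfAgent
    return HfAgent(endpoint)
```

Once built, `agent.run("Draw me a picture of rivers and lakes")` executes a single task, while `agent.chat(...)` keeps state across turns for chat mode.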
It handles complex queries and offers a chat mode. Generate images from your words, have the agent read a website summary out loud, or ask questions about a PDF.
How does it work in practice?
It's straightforward prompt-building:
• Tell the agent what it should do
• Give it tools
• Show examples
• Give it a task
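The steps above can be sketched as plain string assembly. The structure below is illustrative; the exact wording lives in the library's prompt templates, and `build_prompt`, `image_generator`, and the example text are assumptions for the sketch.

```python
# Sketch: how the agent's prompt is assembled from tools, examples, and a task.
def build_prompt(tools, examples, task):
    tool_lines = "\n".join(f"- {name}: {desc}" for name, desc in tools)
    example_lines = "\n\n".join(examples)
    return (
        "You are an assistant that solves tasks by writing Python code "
        "using the tools below.\n"
        f"Tools:\n{tool_lines}\n\n"
        f"Examples:\n{example_lines}\n\n"
        f"Task: {task}\nAnswer:"
    )

prompt = build_prompt(
    tools=[("image_generator", "generates an image from a text prompt")],
    examples=["Task: draw a cat\nAnswer: image = image_generator('a cat')"],
    task="Draw me a picture of rivers and lakes",
)
```

The LLM completes this prompt with Python code, which the agent then executes.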
The agent uses chain-of-thought reasoning to break the task down, then outputs Python code that calls the tools.
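For a query like "Caption this image, then read the caption out loud", the emitted code might look like the following. The stub functions stand in for the real captioning and text-to-speech tools so the sketch runs anywhere; the tool names are assumptions.

```python
# Stubs standing in for the agent's real tools.
def image_captioner(image):
    return f"a caption for {image}"

def text_reader(text):
    return f"<audio of: {text}>"

# The kind of two-step program the agent might emit.
caption = image_captioner("photo.png")
audio = text_reader(caption)
```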
It comes with built-in tools:
• Document QA
• Speech-to-text and Text-to-speech
• Text {classification, summarization, translation, download, QA}
• Image {generation, transforms, captioning, segmentation, upscaling, QA}
• Text to video
It is EXTENSIBLE by design.
Tools are elementary: a name, a description, a function.
Designing a tool and pushing it to the Hub takes only a few lines of code.
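A sketch of such a tool, assuming the three-part interface the thread describes (a name, a description, a function). The class name, tool name, and endpoint are hypothetical, and the `transformers.Tool` base class is omitted so the sketch runs standalone; in practice you would subclass it and call the tool's `push_to_hub` method to share it.

```python
# Hypothetical custom tool: name + description + callable.
class CatImageTool:
    name = "cat_image_fetcher"  # hypothetical tool name
    description = "Returns the URL of a random cat image."

    def __call__(self):
        return "https://cataas.com/cat"  # illustrative endpoint

tool = CatImageTool()
url = tool()
```

The description matters: it is what the LLM reads when deciding which tool to call for a task.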
The agent's toolkit serves as a base: extend it with your own tools, or with community-contributed ones:
huggingface.co/docs/transform…
Please play with it, add your tools, and let's create *super-powerful agents* together.
Here's a notebook to get started: colab.research.google.com/drive/1c7MHD-T…