If there's anything you'd like to see, let me know!
Let's dive in:
High level overview of what's happening:
1️⃣ Load your audio file
2️⃣ Speech-to-text with Whisper
3️⃣ Split the transcript into chunks
4️⃣ Summarize the transcript
Let's dive into the code ⬇️
1. Open your audio file.
Whisper works on files with a maximum duration of approximately 20 minutes.
Longer file?
Split it up by using PyTube and handle each chunk seperately.
2. Call the Whisper API
Of course, you can use any speech-to-text API you prefer. I like Whisper for its accuracy and ease of use.
Any open-source alternatives with quality output?
3. Splitting the transcript into chunks
With the new OpenAI GPT 16k model, you can fit a large amount of context into one chunk. This is amazing for the model to 'understand' the full context and make connections.
Use some overlap in order for context to not be lost.
4. Prompting
Prompting is key. It determines your output more than anything else.
Be as concise as you can be. Instruct the model on how you want the output to look. You could include bullet points, main takeaways, follow-up actions and much more.
5. Initialize and run the summary chain
I'm using the refine summarization chain. This is great when you're working with large files.
It generates an initial summary based on the first chunk and updates it with the subsequent chunks.
6. Export the summary to a text file
Your meeting is summarized and you're ready to take action!
Once again, try to play around with the prompts you're using. This will greatly impact the resulting summarization.
7. Possible future implementations
You can go wild with this. You could use a Zapier integration to send the summarized meeting in an email, create appointments in your schedule and much more.
Meta AI has unveiled Voicebox, a groundbreaking generative model for voice synthesis tasks.
This model can generate speech from text and perform tasks like editing, noise removal, and style transfer.
Let's dive into the details! 🧵
Voicebox is a generative model that can synthesize speech in six languages.
It has been trained on a general task of mapping voice audio samples to their transcripts, enabling it to perform various text-guided speech generation tasks seamlessly.
🔬 The researchers at Meta developed a unique training method called "Flow Matching" for Voicebox.
This technique allows the model to learn from diverse speech data without the need for careful labeling.
Trained on 50,000 hours of speech and transcripts from audiobooks.
If there's anything you'd like to see, let me know!
Let's dive in:
A high-level overview:
1️⃣ Load the YouTube transcript
2️⃣ Split the transcript into chunks
3️⃣ Use a summarization chain to create a strategy based on the content of the video
4️⃣ Use a simple LLM Chain to create a detailed plan based on the strategy.
Language models have transformed natural language processing across industries, and now they're making waves in finance.
Enter FinGPT: An open-source Financial Large Language Model
Let's dive in 🧵
Extracting financial data can be daunting, spanning web platforms to PDFs.
While proprietary models like BloombergGPT have specialized data, the need for an open and inclusive alternative is clear.
Introducing FinGPT:
Developed by researchers from Columbia University and NYU Shanghai, FinGPT is an end-to-end open-source framework for economical large language models (FinLLMs).
Its mission: democratize financial data access and foster open finance. 📈
One step closer to human-level intelligence in AI:
A year ago, Meta's Chief AI Scientist, Yann LeCun, proposed a groundbreaking architecture that could revolutionize AI systems as we know them.
Today, the first implementation is here: I-JEPA.
A deeper dive 🧵
1/13 The goal?
To create machines that can learn internal models of how the world works, enabling them to learn faster, plan complex tasks, and adapt to new situations.
Let's dive into the details! 👇
2/13 📚 Introducing the Image Joint Embedding Predictive Architecture (I-JEPA).
The first AI model based on LeCun's vision. I-JEPA learns by creating an internal model of the world, comparing abstract representations of images instead of pixels themselves. 🖼️
@owasp has released a list of the top 10 most critical vulnerabilities found in artificial intelligence applications based on large language models (LLMs).
These vulnerabilities include prompt injections, data leakage, and unauthorized code execution.
This involves bypassing filters or manipulating the LLM using carefully crafted prompts that make the model ignore previous instructions or perform unintended actions.
2. Data Leakage:
Data leakage occurs when an LLM accidentally reveals sensitive information through its responses. #cybersecurity
The power of natural language interaction is taking over!
Companies are bringing AI applications to life with large language models (LLMs). The adoption of language model APIs is creating a new tech stack in its wake.