Newsletter exploring AI & ML
- AI 101
- ML techniques
- AI Business insights
- Global dynamics
- ML History
Led by @kseniase_
Save hours of research 👇🏼
Sep 22 • 6 tweets • 3 min read
.@NVIDIA's new NVLM multimodal models use:
- A powerful vision encoder kept frozen during training
- Image tiling and pixel shuffle for efficient processing
NVLM architectures:
• Decoder-only
• Cross-attention based
• Hybrid
Let's explore their differences and find the best 👇

1. NVLM-D (Decoder-only):
It uses a vision encoder to convert images into visual tokens and an LM to handle text. The two are connected by an MLP module that aligns the image and text representations.
It divides large images into smaller tiles and uses tile tags to preserve the image's spatial structure.
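The tiling idea can be sketched in a few lines. This is a hypothetical illustration of the general scheme (the tile size and tag format are assumptions, not NVLM's exact values): cut a large image into fixed-size tiles and prefix each with a text tag so the LLM can recover the 2-D layout.

```python
# Hypothetical sketch of tile tagging: a large image is cut into
# fixed-size tiles, each labeled with a tag like <tile_1> so the
# model can track where each tile sits in the original image.

def tile_image(width, height, tile=448):
    """Return (tag, x_offset, y_offset) entries covering the image."""
    tiles = []
    idx = 1
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            tiles.append((f"<tile_{idx}>", x, y))
            idx += 1
    return tiles

tiles = tile_image(896, 448)  # a 2x1 grid of 448-px tiles
```

Each tagged tile is then encoded separately, which keeps high-resolution inputs tractable.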
Sep 8 • 6 tweets • 2 min read
That's really interesting! Mini-Omni LLM demonstrates parallel processing in real-time conversations. It can hear, process and talk at the same time.
How does Mini-Omni achieve this?
🧵

1. Audio generation with text instruction:
The model converts text instructions into real-time speech. It generates a text response and quickly transforms it into spoken words, using text-to-speech technology.
This allows faster and smoother conversations.
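The streaming idea can be sketched with a toy pipeline. Everything here is a stand-in (the `fake_tts` function is invented for illustration, not Mini-Omni's real decoder): each text token is handed to the speech step as soon as it is generated, instead of waiting for the full response.

```python
# Toy sketch of streaming generation: text tokens are produced
# incrementally and each one is converted to "speech" immediately,
# so audio output can begin before the full sentence is finished.

def generate_text():
    """Stand-in for an LLM emitting tokens one at a time."""
    for tok in ["Hello", "there", "friend"]:
        yield tok

def fake_tts(token):
    """Stand-in for a text-to-speech step."""
    return f"audio({token})"

stream = [fake_tts(t) for t in generate_text()]  # speech starts per-token
```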
Jul 10 • 7 tweets • 3 min read
Let's dive into one of the newest concepts in synthetic data generation:
active inheritance.
Proposed by @CohereForAI, it's a strategy used in ML to intentionally design synthetic data to achieve specific goals.
Here's how active inheritance works:

1. What's in the base?
The base is a knowledge distillation technique, where a smaller LLM (student) learns from a larger, more powerful model (teacher).
The student tries to mimic the teacher's outputs for the same input prompts by learning from the data the teacher generates.
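The distillation base can be sketched minimally. This is an assumed toy setup, not Cohere's code: a "teacher" labels a set of prompts, producing the synthetic pairs the student is then fine-tuned on.

```python
# Minimal distillation sketch: the teacher generates completions
# for a prompt set, and the resulting (prompt, completion) pairs
# become the student's training data.

def teacher(prompt):
    """Stand-in for a large, more capable LLM."""
    return prompt.upper()

prompts = ["a cat", "a dog"]
synthetic_data = [(p, teacher(p)) for p in prompts]
# the student model would now be fine-tuned on synthetic_data
```

Active inheritance then shapes *which* synthetic examples are kept, steering the student toward desired attributes.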
Jun 3 • 11 tweets • 3 min read
A new model, Meteor, leverages multifaceted information and a Mamba architecture to enhance comprehension and response capabilities in vision-language tasks.
Let's explore its architecture and training strategy👇

1. Meteor's architecture includes:
- a vision encoder (CLIP-L/14)
- vision and tor (traversal-of-rationale) projectors (MLP modules with GELU activation)
- the Mamba-130M architecture for computational efficiency
- InternLM2-7B as the backbone LLM
Apr 11 • 10 tweets • 2 min read
TimeGPT is the first foundation model specifically designed for time series analysis.
It excels at generating precise forecasts across a diverse range of datasets and domains.
Here's what you need to know about it:
1/8
The model leverages a Transformer-based architecture, optimized for time series data, with self-attention mechanisms that facilitate the handling of temporal dependencies and patterns across varied frequencies and characteristics.
2/8
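The self-attention step that underlies such a Transformer forecaster can be sketched on a toy series. This is purely illustrative (identity projections, made-up values), not TimeGPT's actual architecture:

```python
import numpy as np

# One self-attention step over a toy time series: each time step
# attends to every other step, letting the model mix information
# across temporal positions regardless of distance.

series = np.array([[1.0], [2.0], [3.0]])   # 3 time steps, 1 feature
q = k = v = series                          # identity projections (sketch only)
scores = q @ k.T / np.sqrt(k.shape[1])      # pairwise temporal affinity
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
context = weights @ v                       # attention-mixed representation
```

Stacking such layers (with learned projections, positional encodings, and causal masking) is what lets a Transformer capture dependencies across varied frequencies.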
Mar 3 • 9 tweets • 3 min read
8 Free Courses to Master Large Language Models:
1. @cohere LLM University
2. @huggingface NLP course
3. @databricks courses
and more!
🧵
1. @cohere LLM University
The course offers insights into how LLMs work and their practical applications, and guides participants on using LLMs to build and deploy applications. docs.cohere.com/docs/llmu
Feb 19 • 7 tweets • 2 min read
DoRA (Weight-Decomposed Low-Rank Adaptation) sets a new standard for optimizing AI models.
It combines the benefits of full model fine-tuning and LoRA.
How does it do that? Let's see 👇🏼
1/7
The genius of DoRA lies in its unique handling of pre-trained weights.
It separates these weights into two parts:
1. one that determines the size (magnitude)
2. one that determines the orientation (direction) of the weight vectors
2/7
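The decomposition itself is simple to verify numerically. A minimal sketch, using a tiny made-up matrix: split a weight matrix into per-column magnitudes and unit-norm directions, and check that their product recovers the original weights.

```python
import numpy as np

# DoRA-style decomposition sketch: W = m * V, where m holds each
# column's magnitude and V holds unit-norm direction vectors.
# During fine-tuning, DoRA updates the two parts separately.

W = np.array([[3.0, 0.0],
              [0.0, 4.0]])                     # pre-trained weights (toy)
m = np.linalg.norm(W, axis=0, keepdims=True)   # per-column magnitudes
V = W / m                                      # unit-norm directions
reconstructed = m * V                          # recovers W exactly
```

In DoRA proper, the direction part is then adapted with a LoRA-style low-rank update while the magnitude is trained directly.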
Dec 27, 2023 • 18 tweets • 7 min read
Want to understand foundation models, generative AI models, and transformers?
Here is your FREE list of 15+ resources to do that:
1. Efficient Transformers: A Survey explores the evolution of Transformer models in various domains. It provides a comprehensive overview of different Transformer variants (X-formers) to guide researchers. arxiv.org/abs/2009.06732
1. Zero-shot, few-shot, and chain-of-thought prompting techniques explained.
Check our article detailing various prompt engineering techniques at: turingpost.com/p/cot
Dec 8, 2023 • 9 tweets • 4 min read
8 free courses to master large language models:
- Cohere LLM University
- Hugging Face NLP course
- DeepLearning AI courses
- Weights & Biases course
- Introduction to LLMs course by Google Cloud
...
🧵
1. @cohere LLM University
The course offers insights into how LLMs work and their practical applications, and guides participants on using LLMs to build and deploy applications.
Vector embeddings turn complex data into numerical forms, a crucial step for Foundation Models & LLMs.
Let's dive into how they redefine AI's capabilities:

1. Semantic Information Capture:
Vector embeddings are adept at encoding both semantic & syntactic information. This allows models to grasp context and meaning, a fundamental aspect for understanding natural language.
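A toy example makes the idea concrete. The vectors below are invented for illustration (real embeddings have hundreds of dimensions): semantically related words get nearby vectors, so cosine similarity reflects meaning.

```python
import numpy as np

# Made-up 2-D "embeddings": related concepts point in similar
# directions, so cosine similarity captures semantic closeness.

emb = {
    "cat":    np.array([1.0, 0.9]),
    "kitten": np.array([0.9, 1.0]),
    "car":    np.array([-1.0, 0.1]),
}

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_close = cos(emb["cat"], emb["kitten"])  # high: related meanings
sim_far = cos(emb["cat"], emb["car"])       # low: unrelated meanings
```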
Dec 2, 2023 • 11 tweets • 6 min read
10 surveys about transfer learning and domain adaptation you need to read.
Domain: Computer vision
🧵
1. A Survey on Transfer Learning (2010)
The survey categorizes and reviews transfer learning's progress for classification, regression, and clustering, discussing its relationship with domain adaptation, multitask learning, and covariate shift.
Let's dive into each of them:

1. Prompt engineering
Designing specific input prompts that guide the model to apply its general knowledge in a way that's relevant to the task.
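Prompt design is mostly careful string construction. A minimal few-shot sketch (the task and examples are made up for illustration): labeled examples steer the model toward the task without any weight updates.

```python
# Few-shot prompt construction sketch: demonstrations in the
# prompt show the model the input/output format it should follow.

examples = [
    ("great movie!", "positive"),
    ("boring plot", "negative"),
]
query = "loved the soundtrack"

prompt = "\n".join(f"Review: {t}\nSentiment: {s}" for t, s in examples)
prompt += f"\nReview: {query}\nSentiment:"
# `prompt` is now sent to the model, which completes the last line
```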
Nov 22, 2023 • 24 tweets • 6 min read
Quantization is a technique used to reduce the size and increase the efficiency of deep learning models.
Here is a list of 23 LLM quantization techniques you need to know about:

1. LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models arxiv.org/abs/2206.09557
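Before the specific techniques, the core idea is worth one sketch. This is generic symmetric 8-bit quantization (not LUT-GEMM itself): map float weights to int8 with a single scale, then dequantize back with small error.

```python
import numpy as np

# Symmetric int8 quantization sketch: one scale maps the weight
# range onto [-127, 127]; dequantizing recovers an approximation.

w = np.array([0.5, -1.0, 0.25, 1.0])        # toy float weights
scale = np.abs(w).max() / 127.0              # one scale for the tensor
q = np.round(w / scale).astype(np.int8)      # 8-bit integer weights
w_hat = q.astype(np.float32) * scale         # dequantized approximation
```

Storing `q` instead of `w` cuts memory 4x versus float32; the techniques in the list refine how the scales, groupings, and lookup tables are chosen.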
Nov 21, 2023 • 7 tweets • 2 min read
Understanding how large models can be optimized into smaller yet efficient versions is key to AI advancements.
Let’s delve into the critical characteristics of a model.
These can be tweaked to maintain performance while reducing size and resource demand:

1. Number of Parameters: This is the total count of learnable weights in a model. Generally, more parameters mean greater expressiveness, but they also demand more computational resources and memory during both training and inference phases.
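Parameter counting is simple arithmetic. A back-of-the-envelope sketch for a tiny MLP (the layer sizes are invented, not any real model's): each linear layer contributes in×out weights plus out biases.

```python
# Counting learnable parameters in a toy two-layer MLP:
# layer 1: 784 -> 256, layer 2: 256 -> 10.

layers = [(784, 256), (256, 10)]             # (inputs, outputs) per layer
n_params = sum(i * o + o for i, o in layers)  # weights + biases
```

The same arithmetic, scaled up, is how billion-parameter model sizes are estimated.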
Sep 4, 2023 • 13 tweets • 7 min read
Hugging Face's Chief of Science @Thom_Wolf shared the resources he used to get into NLP, AI, and ML!
Here is the list with the links he shared. 🧵

1. "Deep Learning" by @goodfellow_ian, Yoshua Bengio and Aaron Courville
Provides a great overview of current deep learning techniques.
It includes:
▪️ ChatMessages: text + a user, where the user can be System, Human, or AI
▪️ Examples: input/output pairs
▪️ Document: a piece of unstructured data
1/8