TuringPost
Jun 26, 2021
The Adversarial Robustness Toolbox (ART) = a framework for defending machine learning models against adversarial (security) attacks and for evaluating their robustness

Thread⬇️
Adversarial attacks (not to be confused with GANs, the most popular form of generative models) come in two broad threat models:

+White-box attacks: the adversary has access to the model and training environment, including knowledge of the training algorithm
+Black-box attacks: the adversary can only query the model and has no additional knowledge of its internals
2/⬇️
The goal of ART = to provide a framework to evaluate the robustness of a neural network.

The current version of ART focuses on four types of adversarial attacks:
+evasion
+inference
+extraction
+poisoning
3/⬇️
ART is a generic Python library. It provides native integration with several deep learning frameworks such as @TensorFlow, @PyTorch, #Keras, @ApacheMXNet

@IBM open-sourced ART at github.com/IBM/adversaria….
4/⬇️
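To make that concrete, here is a minimal sketch of an evasion attack with ART, wrapping a small PyTorch classifier and perturbing inputs with the Fast Gradient Method. The toy model and the random placeholder data are assumptions for the example, not something from the thread.

```python
import numpy as np
import torch
import torch.nn as nn

from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod

# Toy MNIST-style classifier (assumption: any trained torch.nn.Module works the same way)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Wrap the model so ART can run attacks (and defenses) against it
classifier = PyTorchClassifier(
    model=model,
    loss=criterion,
    optimizer=optimizer,
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(0.0, 1.0),
)

# Evasion attack: craft small perturbations that change the model's predictions
x_test = np.random.rand(8, 1, 28, 28).astype(np.float32)  # placeholder inputs
attack = FastGradientMethod(estimator=classifier, eps=0.1)
x_adv = attack.generate(x=x_test)

# Compare clean vs. adversarial predictions to gauge robustness
clean = classifier.predict(x_test).argmax(axis=1)
adv = classifier.predict(x_adv).argmax(axis=1)
print("flipped predictions:", int((clean != adv).sum()), "of", len(x_test))
```

The wrapper pattern is what the framework integration mentioned above refers to: swap PyTorchClassifier for the TensorFlow or Keras estimator and the attack code stays the same.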
If you'd like a more concentrated coverage of ART, click the link below. It will take you to TheSequence Edge#7, our educational newsletter.
thesequence.substack.com/p/edge7
5/5

More from @TheTuringPost

Feb 18
3 models to pay attention to:

▪️ LM2: Large Memory Models

- Uses a Transformer architecture with a memory module to improve long-context reasoning.
- Outperforms RMT by 37.1% and excels in multi-hop inference.

▪️ NatureLM:

- Trained across scientific domains.
- Enhances tasks like SMILES-to-IUPAC translation and CRISPR RNA design for cross-domain applications.

▪️ Goedel-Prover:

- Advances formal proof generation
- Achieves 57.6% Pass@32 on miniF2F using expert iteration and statement formalizers.

Find the links below👇
1. LM2: Large Memory Models by Convergence Labs Ltd.

huggingface.co/papers/2502.06…
2. NatureLM: Deciphering the Language of Nature for Scientific Discovery from @MSFTResearch

huggingface.co/papers/2502.07…
Feb 18
The freshest AI/ML research of the week:

Our top 7
▪️ Matryoshka Quantization
▪️ LLM Pretraining with Continuous Concepts
▪️ LLMs can easily learn to reason from demonstrations
▪️ Forget what you know about LLMs evaluations – LLMs are like a chameleon
▪️ Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
▪️ Hephaestus
▪️ SynthDetoxM Dataset

▪️ The Curse of Depth in LLMs
▪️ InfiniteHiP
▪️ Distillation Scaling Laws
▪️ TransMLA: Multi-Head Latent Attention
▪️ Logical reasoning in LLMs: A survey
▪️ ReasonFlux
▪️ How Stanford’s s1 surpasses DeepSeek-R1
▪️ The Stochastic Parrot on LLM’s Shoulder
▪️ Training LMs for Social Deduction with Multi-Agent RL
▪️ Towards Internet-scale training for agents
▪️ WorldGUI
▪️ CoSER: Coordinating LLM-Based Persona Simulation
▪️ Scaling Pre-training to One Hundred Billion Data for VLMs
▪️ Adapting Language-Specific LLMs to Reasoning Models

🧵
1. Matryoshka Quantization from @GoogleDeepMind

Introduces MatQuant, a multi-scale quantization method that mixes int2, int4, and int8 layers for efficient model deployment

huggingface.co/papers/2502.06… x.com/12714828789589…
2. LLM Pretraining with Continuous Concepts from @AIatMeta

Presents CoCoMix, which mixes token embeddings with abstract concept representations to improve training efficiency.

huggingface.co/papers/2502.08…
Code: github.com/facebookresear…
Feb 16
Free, useful guides on model distillation:

1. Model Distillation guide from @OpenAI
2. Knowledge Distillation tutorial by @PyTorch
3. Jetson Introduction to Knowledge Distillation by @nvidia
4. Tutorial on Knowledge Distillation with @kerasteam
5. @huggingface's guides:
- Knowledge Distillation
- Knowledge Distillation for Computer Vision

Save this thread and check out the links below 👇
1. Model Distillation guide from @OpenAI

Explains the process step by step, including:
- storing outputs from a large model
- evaluating both the large and small models
- creating training data for the small model
- assessing the fine-tuned small model

platform.openai.com/docs/guides/di…
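As a rough sketch of the first step (storing outputs from a large model), the flow looks roughly like the snippet below. The store and metadata parameters are recalled from the OpenAI Python SDK and should be treated as assumptions; the model name and prompts are placeholders.

```python
# Sketch only: storing teacher-model outputs for later distillation.
# Assumes the OpenAI Python SDK's chat.completions endpoint supports
# store=True and metadata tags, as described in the linked guide.
from openai import OpenAI

client = OpenAI()

prompts = ["Summarize attention in one sentence.", "Explain dropout briefly."]

for prompt in prompts:
    client.chat.completions.create(
        model="gpt-4o",  # large "teacher" model (placeholder choice)
        messages=[{"role": "user", "content": prompt}],
        store=True,      # keep the completion so it can become training data
        metadata={"task": "distillation-demo"},  # tag it for easy filtering later
    )

# The stored completions are then filtered, exported as a fine-tuning dataset for a
# smaller "student" model, and both models are evaluated on the same held-out prompts.
```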
2. Knowledge Distillation tutorial by @PyTorch covers:

• Extracting hidden representations for further calculations
• Modifying PyTorch training loops to include additional losses
• Enhancing lightweight models using complex models as teachers

pytorch.org/tutorials/begi…
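The "additional losses" point usually boils down to something like the minimal sketch below: a KL-divergence term between temperature-softened teacher and student logits added to the ordinary cross-entropy. This is the generic pattern, not the tutorial's exact code.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Hard-label cross-entropy plus KL divergence on temperature-softened logits."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # conventional T^2 scaling keeps gradient magnitudes comparable
    return alpha * hard + (1 - alpha) * soft

# In the training loop (teacher frozen, student trainable):
#   with torch.no_grad():
#       teacher_logits = teacher(x)
#   loss = distillation_loss(student(x), teacher_logits, y)
#   loss.backward(); optimizer.step()
```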
Feb 15
Distillation involves using a large teacher model to train a smaller student one.

But can we predict a distilled model’s performance based on teacher quality, student size, data volume, etc.?

@Apple and @UniofOxford explored this and developed distillation scaling laws.

Here are the key takeaways👇
1. A good teacher doesn’t always mean a better student:

If a teacher is too strong, the student might struggle to learn from it, leading to worse performance.
This is called the capacity gap — when the student isn’t powerful enough to properly mimic the teacher.
2. Distillation scaling law predicts how well a student model will perform based on three key factors:

- Student model's size
- The number of training tokens
- The teacher’s size and quality

This law follows a power-law relationship: performance improves in a predictable way, but only up to a point, after which adding more resources doesn't help.
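For intuition only, a generic power-law shape can be sketched as below; the exponents, coefficients, and the teacher-set floor are made-up placeholders, not the values fitted in the paper.

```python
# Illustrative toy, not the paper's fitted law: student loss as a function of
# student size N and distillation tokens D, with a floor set by the teacher.
def toy_student_loss(N, D, teacher_loss, a=10.0, alpha=0.2, b=100.0, beta=0.2):
    return teacher_loss + a / (N ** alpha) + b / (D ** beta)

# Growing N or D shrinks the power-law terms predictably, but the loss never drops
# below the teacher-set floor, which is why extra resources eventually stop helping.
print(toy_student_loss(N=1e8, D=1e10, teacher_loss=2.0))  # larger
print(toy_student_loss(N=1e9, D=1e11, teacher_loss=2.0))  # smaller, approaching the floor
```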
Feb 10
The freshest AI/ML research of the week:

Our top 4
▪️ AlphaGeometry2
▪️ ZebraLogic
▪️ Limo: Less is More for Reasoning
▪️ Great Models Think Alike and this Undermines AI Oversight

▪️ Activation-Informed Merging of LLMs
▪️ Content-Format Integrated Prompt Optimization (CFPO)
▪️ BOLT: Bootstrapping Long Chain-of-Thought
▪️ Token Assorted: Mixing Latent & Text Tokens
▪️ ScoreFlow
▪️ The Jumping Reasoning Curve?
▪️ Demystifying Long Chain-of-Thought Reasoning in LLMs
▪️ MAGA
▪️ ParetoQ: Scaling Laws in Extremely Low-Bit LLM Quantization
▪️ Analyze Feature Flow to Enhance Interpretation and Steering in LMs
▪️ PILAF
▪️ DuoGuard
▪️ Limitations of LLMs in Clinical Problem-Solving
▪️ AI and Legal Analysis
▪️ HackerRank-ASTRA
▪️ The Open-Source Advantage in LLMs
▪️ UltraIF: Advancing Instruction-Following

🧵
1. AlphaGeometry2 (Olympiad Geometry Solver) from @GoogleDeepMind

Enhances AlphaGeometry to solve IMO-level geometry problems with a broader formal language

huggingface.co/papers/2502.03… x.com/12714828789589…
2. ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

Evaluates LLMs on logic grid puzzles, revealing how complexity diminishes accuracy despite enhanced inference strategies

huggingface.co/papers/2502.01…
Benchmark: huggingface.co/spaces/WildEva…
Feb 10
Sliding Tile Attention (STA) speeds up video generation by up to 3.53x.

It focuses only on small, relevant regions at a time and moves across the video in a sliding pattern.

STA processes larger chunks (tiles) at once, making it faster and more hardware-efficient.

Here's how it works:
Firstly, what's wrong with current methods?

3D attention, which is generally used in Diffusion Transformers (DiTs), processes all video frames at once and treats every pixel separately, which takes up a huge amount of computing power (about 70% of the total effort).

The problem with traditional Sliding Window Attention (SWA) is that it creates "mixed blocks," which are inefficient for GPUs.

That's why researchers proposed Sliding Tile Attention (STA) method.
STA method:

STA organizes the video into structured "tiles" (small 3D blocks), ensuring that each tile interacts with only a few nearby tiles.

It works like a smart sliding window that moves across the video, focusing on the most relevant areas.

This results in a GPU-friendly design that works efficiently with existing tools like FlashAttention.
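As a rough 1D sketch of the masking idea (a simplification for intuition, not the paper's 3D implementation or its FlashAttention kernels): tokens are grouped into tiles, and a query tile attends only to key tiles within a small window, so every tile-pair block is either fully kept or fully dropped.

```python
import torch

def tile_sliding_mask(num_tokens: int, tile_size: int, window_tiles: int) -> torch.Tensor:
    """Toy 1D tile-level sliding attention mask.

    Tokens are grouped into tiles; a query tile attends to key tiles whose index
    differs by at most `window_tiles`. The mask is constant inside each tile-pair
    block, so there are no "mixed blocks", which is what makes the pattern
    hardware-friendly for block-wise kernels like FlashAttention.
    """
    num_tiles = num_tokens // tile_size
    tile_idx = torch.arange(num_tiles)
    tile_mask = (tile_idx[:, None] - tile_idx[None, :]).abs() <= window_tiles
    # Expand the tile-level mask back to token resolution
    return tile_mask.repeat_interleave(tile_size, 0).repeat_interleave(tile_size, 1)

mask = tile_sliding_mask(num_tokens=16, tile_size=4, window_tiles=1)
print(mask.int())  # dense 4x4 blocks on and next to the diagonal, zeros elsewhere
```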
