TuringPost
Jun 26, 2021
The Adversarial Robustness Toolbox (ART) = a framework for evaluating the robustness of deep learning models and defending them against adversarial security attacks

Thread⬇️
Note: despite the similar name, these "adversarial" attacks are not about GANs (the most popular form of generative models). They are inputs or training manipulations crafted to fool a model.

Adversarial attacks are usually grouped by how much the attacker knows:
+White-box attacks: The adversary has access to the model, the training environment, and knowledge of the training algorithm
+Black-box attacks: The adversary can only query the model and has no additional knowledge
2/⬇️
The goal of ART = to provide a framework to evaluate the robustness of a neural network.

The current version of ART focuses on four types of adversarial attacks:
+evasion
+inference
+extraction
+poisoning
3/⬇️
ART is a generic Python library. It provides native integration with several deep learning frameworks such as @TensorFlow, @PyTorch, #Keras, @ApacheMXNet

@IBM open-sourced ART at github.com/IBM/adversaria….
4/⬇️
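To make that concrete, here's a minimal sketch of an evasion-attack evaluation using ART's PyTorch wrapper. It follows the library's documented pattern (wrap the model in an estimator, instantiate an attack, generate adversarial examples, compare accuracy); the tiny model and random data below are placeholders, and exact signatures may vary across ART versions.

```python
import numpy as np
import torch
import torch.nn as nn
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod

# Placeholder model and data: a tiny classifier on 28x28 grayscale inputs.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x_test = np.random.rand(16, 1, 28, 28).astype(np.float32)
y_test = np.random.randint(0, 10, size=16)

# Wrap the model so ART can attack it regardless of the underlying framework.
classifier = PyTorchClassifier(
    model=model,
    loss=criterion,
    optimizer=optimizer,
    input_shape=(1, 28, 28),
    nb_classes=10,
)

# Evasion attack: perturb inputs at inference time to flip the model's predictions.
attack = FastGradientMethod(estimator=classifier, eps=0.2)
x_adv = attack.generate(x=x_test)

# Robustness = how much accuracy drops on the adversarial inputs.
clean_acc = (classifier.predict(x_test).argmax(axis=1) == y_test).mean()
adv_acc = (classifier.predict(x_adv).argmax(axis=1) == y_test).mean()
print(f"clean accuracy: {clean_acc:.3f}, adversarial accuracy: {adv_acc:.3f}")
```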
If you'd like concentrated coverage of ART, click the link below. It will take you to TheSequence Edge#7, our educational newsletter.
thesequence.substack.com/p/edge7
5/5

More from @TheTuringPost

May 29
Latent reasoning lets a model do more of its "thinking" internally.

That internal information is continuous, unlike the discrete output text.

To mix the two efficiently, researchers from @UofIllinois proposed HRPO (Hybrid Reasoning Policy Optimization), an RL-based hybrid latent reasoning framework.

Here's how it works:
1. HRPO uses reinforcement learning (RL) to train LLMs to reason internally without needing CoT training data.

It integrates hidden states into token sampling using a learnable gating mechanism.
2. A gating mechanism "decides" how much to use internal hidden states vs. regular token info.

At first, the model sticks mostly to word-level input. Over time, it learns to include more of the hidden state features.
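As a toy illustration of that gating step (not the authors' implementation; the class name, shapes, and numbers are made up), the blending can be pictured like this:

```python
import torch
import torch.nn as nn

class HybridGate(nn.Module):
    """Toy gate that mixes a token embedding with the model's latent hidden state."""
    def __init__(self, d_model: int):
        super().__init__()
        # Learnable gate: looks at both signals, outputs a per-dimension weight in [0, 1].
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, token_emb: torch.Tensor, hidden_state: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([token_emb, hidden_state], dim=-1)))
        # g near 0 -> stick to the discrete token embedding; g near 1 -> lean on the hidden state.
        return (1 - g) * token_emb + g * hidden_state

# Usage with made-up shapes: batch of 4 tokens, hidden size 1024.
gate = HybridGate(d_model=1024)
mixed = gate(torch.randn(4, 1024), torch.randn(4, 1024))
print(mixed.shape)  # torch.Size([4, 1024])
```

In HRPO the gate's weighting is shaped by RL rewards rather than supervised CoT data, which is what lets the model gradually shift from token-level input toward latent features.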
May 26
A new recipe for training multimodal models

👉 Mix various data types together: text next to images, video frames after captions, then webpages, etc. This way the model learns to connect what it reads with what it sees.

ByteDance proposed and implemented this idea in BAGEL, a new open-source multimodal model.

Here's how it works:
Architecture:

BAGEL is one giant Transformer with two separate experts inside:

- Understanding expert handles text and ViT image tokens.
- Generation expert handles the VAE image-creation tokens.

These experts are placed side-by-side in every layer and "look" at the same sequence, but each focuses on its own job.
There are 2 image pipelines:

- Vision Transformer (ViT) for understanding pictures turns raw pixels into tokens the model can reason about.
- VAE + diffusion for generating pictures compresses an image to a small latent grid, then refines noise into a final image.
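As a rough schematic of that layout (not BAGEL's actual code; the class, dimensions, and routing rule are illustrative), every token attends over the same shared sequence, and a per-token type mask then routes it to either the understanding expert or the generation expert:

```python
import torch
import torch.nn as nn

class TwoExpertLayer(nn.Module):
    """Schematic: shared self-attention, token-type-routed feed-forward experts."""
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Two parallel feed-forward "experts" living side by side in the same layer.
        self.understanding_ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
        self.generation_ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))

    def forward(self, x: torch.Tensor, is_gen_token: torch.Tensor) -> torch.Tensor:
        # Every token (text, ViT, VAE) "looks" at the same sequence.
        attn_out, _ = self.attn(x, x, x)
        h = x + attn_out
        # Route each token to its expert: True -> generation (VAE) tokens, False -> understanding.
        und = self.understanding_ffn(h)
        gen = self.generation_ffn(h)
        return h + torch.where(is_gen_token.unsqueeze(-1), gen, und)

# Usage with made-up shapes: one sequence of 6 tokens, the last 2 are VAE image tokens.
layer = TwoExpertLayer()
x = torch.randn(1, 6, 512)
mask = torch.tensor([[False, False, False, False, True, True]])
print(layer(x, mask).shape)  # torch.Size([1, 6, 512])
```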
May 24
.@sama's interview at @sequoia AI Ascent offers a lot of insights on:

- How OpenAI came to ChatGPT
- Its aim to be the “core AI subscription”
- AI as an operating system
- What the ideal smart model is
- Main future goals

Here is an outline of his talk with the key ideas:
1. Past milestones and directions

- The first consumer product was the DALL·E API
- OpenAI also tried building a robot hand
- One person, and then a whole team, became excited about building LLMs with unsupervised learning. That work started with GPT-1 and GPT-2; then GPT-3 showed something cool.
2. The hint to ChatGPT:

The transition from pure research to a sustainable business model needed massive funding to scale from GPT-3 to GPT-4.

This led OpenAI to release GPT-3 via an API, which had limited commercial success but revealed a key insight:

👉 People enjoyed chatting with the model, even when it wasn’t great at conversation.

This inspired the creation of ChatGPT
May 20
What is the Agentic Web?

8 important updates from #MSBuild

1. Agents as first-class business & M365 entities.

2. Microsoft Entra Agent ID for knowing your agents.

3. NLWeb, MCP, Open Protocols as the foundation layer for an open agent ecosystem.

4. Agentic DevOps revolutionizes software development with GitHub Copilot’s new coding agent.

5. Azure AI Foundry with 1,900+ models & Copilot Studio

6. Collaboration: Human-Agent & Agent-Agent with Teams as a “multiplayer” agent hub.

7. Windows AI Foundry, Foundry Local (for macOS) and open-sourced WSL, NLWeb, and Copilot in VS Code

8. Microsoft Discovery — AI for science

Read more about these updates in our free weekly newsletter: turingpost.com/p/fod101
1. Agents as first-class business & M365 entities:

The new Microsoft 365 Copilot unifies chat, search, notebooks, and tools like “Researcher” and “Analyst.” With Copilot Tuning, businesses can tailor agents to their own knowledge, language, and brand voice.
2. Know your agents

Microsoft Entra Agent ID gives every AI agent a unique, verifiable identity — so you know what access and actions they’re allowed.
May 20
The freshest research of the week:

Our top 9:
▪️ Beyond 'Aha!'
▪️ J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
▪️ The CoT Encyclopedia
▪️ System Prompt Optimization with Meta-Learning
▪️ Parallel Scaling Law for LMs
▪️ Insights into DeepSeek-V3
▪️ QuXAI: Explainers for Hybrid Quantum Machine Learning Models
▪️ AttentionInfluence
▪️ MLE-Dojo

▪️ Learning from Peers in Reasoning Models
▪️ WorldPM
▪️ Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent
▪️ Learning Dynamics in Continual Pre-Training for LLMs
▪️ Memorization-Compression Cycles Improve Generalization
▪️ DanceGRPO
▪️ Unified Continuous Generative Model
▪️ Depth Anything with Any Prior
▪️ MetaUAS

🧵
1. Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models

Proposes aligning models with meta-reasoning abilities (deduction, induction, abduction) to improve reasoning reliability and performance

arxiv.org/abs/2505.10554
Code: github.com/zhiyuanhubj/Me…
2. J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning

Trains evaluators to produce better judgment through structured reward for thoughtful assessment

arxiv.org/abs/2505.10320
May 20
Designing models and hardware together: is this the new path to the most cost-efficient models?

This idea is used in DeepSeek-V3, which was trained on just 2,048 NVIDIA H800 GPUs.

New research from @deepseek_ai clarifies how DeepSeek-V3 works through its key innovations:

- Multi-head Latent Attention (MLA)
- Mixture of Experts (MoE)
- FP8 mixed-precision training
- Multi-Plane Network Topology

🧵
1. Multi-head Latent Attention (MLA)

MLA compresses the KV cache down to 70 KB per token, while other models like LLaMA-3.1 and Qwen2.5 need 7x more.

Thanks to this, DeepSeek-V3:
- Handles long conversations
- Runs on limited hardware
- Makes inference cheaper and more scalable
2. Apart from MLA, there are other tricks to reduce the size of the KV cache (a rough size comparison is sketched after this list):

- Shared KV (GQA/MQA): Multiple heads share a single set of KV pairs.
- Windowed KV: Keeps only recent info and drops the old stuff, at the cost of long-range memory.
- Quantization: Stores data in lower-bit formats with minimal accuracy loss.
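As a back-of-the-envelope sketch (the layer count, head sizes, and latent dimension below are illustrative placeholders, not official DeepSeek numbers), here is how per-token KV-cache size changes with the caching scheme:

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int, bytes_per_value: int = 2) -> int:
    """Per-token KV cache: one K and one V vector per KV head, per layer."""
    return n_layers * n_kv_heads * head_dim * 2 * bytes_per_value

# Made-up configs, FP16/BF16 values (2 bytes each).
mha = kv_cache_bytes_per_token(n_layers=61, n_kv_heads=128, head_dim=128)  # every head keeps its own KV
gqa = kv_cache_bytes_per_token(n_layers=61, n_kv_heads=8, head_dim=128)    # Shared KV: 8 KV groups
print(f"MHA: {mha / 1024:.0f} KB/token, GQA: {gqa / 1024:.0f} KB/token")

# MLA instead caches one small compressed latent per token per layer,
# which is how a model can land near the ~70 KB/token quoted above.
latent_dim = 576  # placeholder latent size
mla = 61 * latent_dim * 2
print(f"MLA-style latent cache: {mla / 1024:.0f} KB/token")

# Quantization is the same formula with bytes_per_value lowered (e.g. 1 for 8-bit storage).
```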
