My new paper with @NicolasPapernot and @GeoffreyHinton is out on arXiv today. It’s about the similarity structure of representation space, outlier data (e.g. adversarial attacks) and generative models. Don’t have time to read the paper? Read this instead! arxiv.org/abs/1902.01889
Our paper focused on a loss we call Soft Nearest Neighbor Loss (SNNL). It measures the entanglement of labeled data points. Data with high SNNL has muddled up classes, while the classes of a data set with low SNNL are easy to separate.
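Concretely, here's a rough PyTorch-style sketch of the idea behind the loss (the function name, default temperature, and epsilon are placeholders of mine, not values from the paper): each point's entanglement is the fraction of its soft-neighbor probability mass that falls on points of the same class.

```python
import torch

def soft_nearest_neighbor_loss(x, y, temperature=100.0, eps=1e-9):
    """Soft Nearest Neighbor Loss over a batch (rough sketch).

    x: (b, d) batch of points (e.g. flattened hidden activations)
    y: (b,)  integer class labels
    Higher values mean the classes are more entangled.
    """
    b = x.shape[0]
    # Pairwise squared Euclidean distances, shape (b, b).
    dists = torch.cdist(x, x, p=2) ** 2
    # Similarity kernel; mask out self-similarity on the diagonal.
    eye = torch.eye(b, device=x.device)
    sims = torch.exp(-dists / temperature) * (1.0 - eye)
    # Same-class mask, excluding the point itself.
    same = (y.unsqueeze(0) == y.unsqueeze(1)).float() * (1.0 - eye)
    # For each point: share of neighbor mass that lands on its own class.
    num = (sims * same).sum(dim=1)
    den = sims.sum(dim=1)
    return -torch.log(eps + num / (eps + den)).mean()
```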
We can measure the SNNL of the data in the hidden layers of a ResNet during training and show that each layer separates the data slightly more than the previous one. The last layer learns a representation that separates the classes, so it has the lowest SNNL value.
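Measuring this is just a matter of evaluating the sketch above on each layer's activations. A minimal sketch, assuming you can iterate over hidden layers (here via a hypothetical `model.hidden_layers`; forward hooks work too):

```python
# Track how entangled each layer's representation is during training by
# evaluating the SNNL sketch above on the flattened activations.
def per_layer_snnl(model, x, y, temperature=100.0):
    snnl_per_layer = []
    h = x
    for layer in model.hidden_layers:  # hypothetical attribute
        h = layer(h)
        snnl_per_layer.append(
            soft_nearest_neighbor_loss(h.flatten(1), y, temperature).item()
        )
    return snnl_per_layer  # expect values to drop in deeper layers
```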
But entanglement can be desirable! You want the output of a GAN to be entangled with real data. If we measure the SNNL between real and generated data, we can see that SNNL increases over training. It serves as a good tool for understanding GAN training.
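One way to get that real-vs-generated number, as a sketch: treat "real" and "generated" as the two labels and feed both batches through the same loss. (The helper name and the flattening are my assumptions.)

```python
import torch

# A healthy GAN should see this value rise over training, as generated
# samples become harder to separate from real data.
def real_fake_entanglement(real, fake, temperature=100.0):
    x = torch.cat([real, fake], dim=0).flatten(1)
    y = torch.cat([
        torch.zeros(real.shape[0], dtype=torch.long),
        torch.ones(fake.shape[0], dtype=torch.long),
    ]).to(x.device)
    return soft_nearest_neighbor_loss(x, y, temperature)
```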
What happens if we learn a classifier by maximizing the SNNL of each hidden layer in addition to minimizing cross-entropy? We call these *Entangled Models* because their internal class representations are entangled. Surprisingly, this marginally increases performance!
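The combined objective looks roughly like this sketch: cross-entropy minus a weighted sum of per-layer SNNL, so gradient descent maximizes the entanglement term. The weight `alpha` and the way activations are collected are assumptions here (the paper also treats the temperature more carefully than a fixed constant).

```python
import torch.nn.functional as F

def entangled_loss(logits, activations, y, alpha=0.01, temperature=100.0):
    ce = F.cross_entropy(logits, y)
    snnl = sum(
        soft_nearest_neighbor_loss(h.flatten(1), y, temperature)
        for h in activations  # list of hidden-layer activations
    )
    return ce - alpha * snnl  # minus sign = maximize entanglement
```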
Entangled models are better at detecting adversarial attacks using the DkNN (Deep k-Nearest Neighbors). We estimate the uncertainty of each classification and find that entangled models project outlier data away from the expected manifold, making adversarial attacks easier to detect.
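The DkNN itself is richer than this, but a very simplified proxy for the intuition is to flag inputs whose hidden representation sits far from its nearest training representations (the function, `k`, and any threshold are hypothetical, not from the paper):

```python
import torch

def knn_outlier_score(test_feats, train_feats, k=5):
    dists = torch.cdist(test_feats, train_feats)        # (n_test, n_train)
    knn_dists, _ = dists.topk(k, dim=1, largest=False)  # k smallest distances
    return knn_dists.mean(dim=1)  # large score = likely outlier / adversarial
```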
Entangled models are less vulnerable to black box attacks based on transferability. If we visualize the adversarial gradients of a targeted FGSM attack for normal models, we see shared class clusters. This enables transferability. These clusters don't exist with entangled models!
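For reference, the gradients being visualized are the targeted FGSM directions, roughly as in this sketch (the step toward the target class uses a minus sign; `eps` and the function name are mine). Projecting these gradient directions, e.g. with t-SNE, is what reveals or fails to reveal shared class clusters.

```python
import torch
import torch.nn.functional as F

def targeted_fgsm_gradient(model, x, target, eps=0.1):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), target)
    loss.backward()
    grad = x.grad.sign()
    x_adv = x - eps * grad  # step *toward* the target class
    return grad.detach(), x_adv.detach()
```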
Entangled models aren't trained with a specific attack in mind, so they should be good at distinguishing all outlier data from real data. If we train a model on MNIST and test it on notMNIST, we see that entangled models project the outlier data far away from the real test data.
Read the paper for a more thorough investigation of this exciting loss, the effects of entangling classes in classification networks, adversarial examples, and SNNL in GAN settings :) thanks for reading :)
arxiv.org/abs/1902.01889