My new paper with @NicolasPapernot and @GeoffreyHinton is out on arXiv today. It’s about the similarity structure of representation space, outlier data (e.g. adversarial attacks) and generative models. Don’t have time to read the paper? Read this instead! arxiv.org/abs/1902.01889
Our paper focused on a loss we call Soft Nearest Neighbor Loss (SNNL). It measures the entanglement of labeled data points. Data with high SNNL has muddled up classes, while the classes of a data set with low SNNL are easy to separate.
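Concretely, here's a rough PyTorch-style sketch of the idea behind the loss (the function name, default temperature, and epsilon are placeholders of mine, not values from the paper): each point's entanglement is the fraction of its soft-neighbor probability mass that falls on points of the same class.

```python
import torch

def soft_nearest_neighbor_loss(x, y, temperature=100.0, eps=1e-9):
    """Soft Nearest Neighbor Loss over a batch (rough sketch).

    x: (b, d) batch of points (e.g. flattened hidden activations)
    y: (b,)  integer class labels
    Higher values mean the classes are more entangled.
    """
    b = x.shape[0]
    # Pairwise squared Euclidean distances, shape (b, b).
    dists = torch.cdist(x, x, p=2) ** 2
    # Similarity kernel; mask out self-similarity on the diagonal.
    eye = torch.eye(b, device=x.device)
    sims = torch.exp(-dists / temperature) * (1.0 - eye)
    # Same-class mask, excluding the point itself.
    same = (y.unsqueeze(0) == y.unsqueeze(1)).float() * (1.0 - eye)
    # For each point: share of neighbor mass that lands on its own class.
    num = (sims * same).sum(dim=1)
    den = sims.sum(dim=1)
    return -torch.log(eps + num / (eps + den)).mean()
```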
We can measure the SNNL of the data in the hidden layers of a ResNet during training and show that each layer separates the data slightly more than the previous one. The last layer learns a representation that separates the classes, so it has the lowest SNNL value.
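Measuring this is just a matter of evaluating the sketch above on each layer's activations. A minimal sketch, assuming you can iterate over hidden layers (here via a hypothetical `model.hidden_layers`; forward hooks work too):

```python
# Track how entangled each layer's representation is during training by
# evaluating the SNNL sketch above on the flattened activations.
def per_layer_snnl(model, x, y, temperature=100.0):
    snnl_per_layer = []
    h = x
    for layer in model.hidden_layers:  # hypothetical attribute
        h = layer(h)
        snnl_per_layer.append(
            soft_nearest_neighbor_loss(h.flatten(1), y, temperature).item()
        )
    return snnl_per_layer  # expect values to drop in deeper layers
```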
But entanglement can be desirable! You want the output of a GAN to be entangled with real data. If we measure the SNNL between real and generated data, we can see that SNNL increases over training. It serves as a good tool for understanding GAN training.
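One way to get that real-vs-generated number, as a sketch: treat "real" and "generated" as the two labels and feed both batches through the same loss. (The helper name and the flattening are my assumptions.)

```python
import torch

# A healthy GAN should see this value rise over training, as generated
# samples become harder to separate from real data.
def real_fake_entanglement(real, fake, temperature=100.0):
    x = torch.cat([real, fake], dim=0).flatten(1)
    y = torch.cat([
        torch.zeros(real.shape[0], dtype=torch.long),
        torch.ones(fake.shape[0], dtype=torch.long),
    ]).to(x.device)
    return soft_nearest_neighbor_loss(x, y, temperature)
```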
What happens if we learn a classifier by maximizing the SNNL of each hidden layer in addition to minimizing cross-entropy? We call these *Entangled Models* because their internal class representations are entangled. Surprisingly, this marginally increases performance!
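The combined objective looks roughly like this sketch: cross-entropy minus a weighted sum of per-layer SNNL, so gradient descent maximizes the entanglement term. The weight `alpha` and the way activations are collected are assumptions here (the paper also treats the temperature more carefully than a fixed constant).

```python
import torch.nn.functional as F

def entangled_loss(logits, activations, y, alpha=0.01, temperature=100.0):
    ce = F.cross_entropy(logits, y)
    snnl = sum(
        soft_nearest_neighbor_loss(h.flatten(1), y, temperature)
        for h in activations  # list of hidden-layer activations
    )
    return ce - alpha * snnl  # minus sign = maximize entanglement
```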
Entangled models are better at detecting adversarial attacks using the DkNN (Deep k-Nearest Neighbors). We estimate the uncertainty of each classification and find that entangled models project outlier data away from the expected manifold, making adversarial attacks easier to detect.
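The DkNN itself is richer than this, but a very simplified proxy for the intuition is to flag inputs whose hidden representation sits far from its nearest training representations (the function, `k`, and any threshold are hypothetical, not from the paper):

```python
import torch

def knn_outlier_score(test_feats, train_feats, k=5):
    dists = torch.cdist(test_feats, train_feats)        # (n_test, n_train)
    knn_dists, _ = dists.topk(k, dim=1, largest=False)  # k smallest distances
    return knn_dists.mean(dim=1)  # large score = likely outlier / adversarial
```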
Entangled models are less vulnerable to black box attacks based on transferability. If we visualize the adversarial gradients of a targeted FGSM attack for normal models, we see shared class clusters. This enables transferability. These clusters don't exist with entangled models!
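For reference, the gradients being visualized are the targeted FGSM directions, roughly as in this sketch (the step toward the target class uses a minus sign; `eps` and the function name are mine). Projecting these gradient directions, e.g. with t-SNE, is what reveals or fails to reveal shared class clusters.

```python
import torch
import torch.nn.functional as F

def targeted_fgsm_gradient(model, x, target, eps=0.1):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), target)
    loss.backward()
    grad = x.grad.sign()
    x_adv = x - eps * grad  # step *toward* the target class
    return grad.detach(), x_adv.detach()
```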
Entangled models aren't trained with a specific attack in mind, so they should be good at distinguishing all outlier data from real data. If we train a model on MNIST and test it on notMNIST, we see that entangled models project the outlier data far away from the real test data.
Read the paper for a more thorough investigation of this exciting loss, the effects of entangling classes in classification networks, adversarial examples, and SNNL in GAN settings :) thanks for reading :)
arxiv.org/abs/1902.01889