CleverHans blog post with @nickfrosst: we explain how the Deep k-Nearest Neighbors (DkNN) algorithm and the soft nearest-neighbor loss (SNNL) help recognize data that is not from the training distribution. The post includes an interactive figure (credit goes to Nick): cleverhans.io/security/2019/…
Models are deployed with little input validation, which boils down to expecting the classifier to correctly classify any input. This goes against one of the fundamental assumptions of ML: models should be presented at test time with inputs that fall on their training manifold.
If we deploy a model on inputs that may fall outside of this data manifold, we need mechanisms for figuring out whether a specific input/output pair is acceptable for a given ML model. In security, we sometimes refer to this as admission control (see arxiv.org/abs/1811.01134)
The DkNN breaks the black-box myth around deep learning. Patterns extracted by hidden layers on a test input are compared to those found during training to ensure that when a label is predicted, the patterns that led to this prediction can be found in the training data for this label.
This allows us to measure uncertainty in a way different from how neural nets typically compute class scores (we argue that the softmax is not ideal at test time). When patterns found in the training data agree with test-time patterns, the prediction has high credibility.
Adversarial examples typically turn a small change in the input domain into a large change in the model’s output space, and this change builds up gradually across layers. As a result, layers closer to the model’s input have representations that remain close to the input’s correct class, while layers towards the output have representations that are closer to the wrong class. Credibility helps distinguish legitimate data from outlier data by requiring a prediction to be supported consistently across the network’s layers.
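Roughly, the mechanism can be sketched as follows. This is a simplified illustration only (the variable and function names are made up, and the real DkNN additionally calibrates neighbor agreement with conformal prediction over a held-out set), but it conveys how per-layer nearest neighbors in the training data support, or fail to support, a prediction:

```python
# Simplified sketch of the DkNN idea: compare a test input's hidden
# representations to those of the training set at every layer, and measure
# how consistently the retrieved neighbors support the predicted label.
# Illustrative names only; the actual DkNN turns this agreement into a
# calibrated credibility score via conformal prediction.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def dknn_agreement(test_reps, train_reps_per_layer, train_labels, k=75):
    """test_reps: list of per-layer representations (1-D arrays) for one input.
    train_reps_per_layer: list of (n_train, d_layer) arrays, one per layer.
    train_labels: (n_train,) integer array.
    Returns the predicted label and the fraction of neighbor votes (over all
    layers) supporting it -- a rough stand-in for credibility."""
    neighbor_labels = []
    for test_rep, train_reps in zip(test_reps, train_reps_per_layer):
        nn = NearestNeighbors(n_neighbors=k).fit(train_reps)
        _, idx = nn.kneighbors(test_rep.reshape(1, -1))
        neighbor_labels.append(train_labels[idx[0]])
    neighbor_labels = np.concatenate(neighbor_labels)
    votes = np.bincount(neighbor_labels)
    prediction = votes.argmax()
    agreement = votes[prediction] / len(neighbor_labels)
    return prediction, agreement
```

A legitimate input tends to retrieve neighbors with the same label at every layer, so the agreement is high; an adversarial or out-of-distribution input tends to retrieve conflicting labels across layers, so the agreement (and hence credibility) drops.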
We note that evaluations that do not take into account the credibility metric (as done by Sitawarin and Wagner, arxiv.org/abs/1903.08333, to be presented at #sp19's DLS workshop) are not sufficient to draw conclusions on the robustness of the DkNN.
Finally, with the SNNL, we modify the training objective of our neural net to improve the similarity structure of its hidden representations (which are then analyzed with the DkNN). This led us to a surprising observation:
encouraging hidden layers to entangle data (to bring points from different classes closer together) improved the similarity search performed by the DkNN more than encouraging representations to disentangle data, which would help achieve a large (SVM-like) margin between classes.

Figure: Logits of a normal model trained with cross-entropy (left) and of an entangled model trained with both cross-entropy and the soft nearest-neighbor loss (right). The entangled model better distinguishes outlier data (in blue) from legitimate data (in green) than the normal model.
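For readers who want the loss itself, here is a minimal numpy sketch of the soft nearest-neighbor loss over one mini-batch of hidden representations. Names and the fixed temperature are illustrative assumptions, not our released implementation:

```python
# Soft nearest-neighbor loss (sketch) for a batch of representations `x`
# (shape: batch x features) with integer labels `y`. For each point it scores
# the probability of sampling a same-class neighbor when neighbors are drawn
# with weights exp(-||x_i - x_j||^2 / T), then averages the negative log.
import numpy as np


def soft_nearest_neighbor_loss(x, y, temperature=100.0, eps=1e-12):
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    weights = np.exp(-sq_dists / temperature)
    np.fill_diagonal(weights, 0.0)                  # a point is not its own neighbor
    same_class = (y[:, None] == y[None, :]) & ~np.eye(len(y), dtype=bool)
    numerator = (weights * same_class).sum(axis=1)  # mass on same-class neighbors
    denominator = weights.sum(axis=1)               # mass on all neighbors
    return -np.mean(np.log(numerator / (denominator + eps) + eps))
```

Minimizing this quantity pulls same-class points together (disentangles classes); the surprising observation above is that training with the opposite sign, i.e. encouraging entanglement, is what helps the DkNN's similarity search.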