Latest Twitter Threads by @kirill_bykov on Thread Reader App

Jun 16, 2022 • 12 tweets • 8 min read

Have you ever wondered whether your network has learned malicious abstractions?

We are announcing DORA arxiv.org/abs/2206.04530 – the first automatic data-agnostic method to find outlier representations in #NeuralNetworks.

Here are watermark detectors in the pre-trained ResNet18!

How does DORA work?

DORA unveils the self-explaining capabilities of DNNs: it extracts the semantic information contained in synthetic Activation Maximisation Signals (s-AMS) and then uses this information to identify outlier (and potentially infected) representations.
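
To make the idea concrete, here is a minimal toy sketch of the "compare representations, then flag the odd ones out" step. It is not the DORA implementation. It assumes each representation has already been summarised by a vector of responses to s-AMS, and it swaps in a plain cosine distance with a nearest-neighbour score as the outlier criterion. The names `profiles`, `cosine_distance` and `outlier_scores` are hypothetical, and the numbers are made up for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an all-zero vector as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def outlier_scores(profiles, n_neighbors=2):
    """Score each profile by its mean distance to its nearest neighbours.

    Assumption: profiles[i] is a hypothetical response vector for
    representation i (for example, its activations on a set of s-AMS).
    A higher score means the representation is further from its peers.
    """
    scores = []
    for i, profile in enumerate(profiles):
        distances = sorted(
            cosine_distance(profile, other)
            for j, other in enumerate(profiles)
            if j != i
        )
        nearest = distances[:n_neighbors]
        scores.append(sum(nearest) / len(nearest))
    return scores


# Toy, made-up response vectors: three similar representations and one
# that responds to a different signal.
profiles = [
    [1.0, 0.9, 0.8, 0.0],
    [0.9, 1.0, 0.7, 0.1],
    [0.8, 0.7, 1.0, 0.0],
    [0.0, 0.1, 0.0, 1.0],
]

scores = outlier_scores(profiles)
most_unusual = max(range(len(scores)), key=scores.__getitem__)
print("outlier candidate:", most_unusual)
```

In this toy setup the last profile gets the highest score because it is nearly orthogonal to the other three. A flagged representation is only a candidate: it still needs a human look at its s-AMS to decide whether it is actually infected.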
