Kirill Bykov
Explainable AI Machine Learning PhD student @UMI_Lab_AI, @bifoldberlin, @TUBerlin; We must know, we will know
Jun 16, 2022 12 tweets 8 min read
Have you ever wondered whether your network has learned malicious abstractions?

We are announcing DORA arxiv.org/abs/2206.04530 – the first automatic data-agnostic method to find outlier representations in #NeuralNetworks.

Here are watermark detectors in the pre-trained ResNet18!

How does DORA work?

DORA unveils the self-explaining capabilities of DNNs: it extracts the semantic information contained in the synthetic Activation Maximisation Signals (s-AMS) and then uses that information to identify outlier (and potentially infected) representations.
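
To make the "identify outlier representations" step a bit more concrete, here is a toy sketch in plain Python. It is not the DORA implementation. It assumes each representation has already been summarised as a numeric profile (for instance, how strongly it responds to the s-AMS of the other representations), and the numbers below are made up for illustration. The names `cosine_distance`, `knn_outlier_scores` and `profiles` are hypothetical, and the scoring rule (mean cosine distance to the k nearest neighbours) is a simple stand-in for whatever distance and outlier detector you prefer.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def knn_outlier_scores(profiles, k=2):
    """Mean distance from each profile to its k nearest other profiles.

    A larger score means the profile sits further from its neighbours.
    """
    scores = []
    for i, p in enumerate(profiles):
        dists = sorted(
            cosine_distance(p, q) for j, q in enumerate(profiles) if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


# Made-up profiles: four representations that behave alike, one that does not.
profiles = [
    [1.00, 0.80, 0.10, 0.00],
    [0.95, 0.85, 0.05, 0.02],
    [1.05, 0.75, 0.12, 0.01],
    [0.98, 0.82, 0.08, 0.03],
    [0.02, 0.10, 0.20, 1.00],  # the odd one out
]

scores = knn_outlier_scores(profiles, k=2)
most_isolated = max(range(len(scores)), key=scores.__getitem__)
print(most_isolated, [round(s, 3) for s in scores])
```

In this toy setup the last profile gets by far the largest score, so it is the one you would inspect first. In practice a high score only flags a candidate; whether the representation is actually "infected" still needs a look at its s-AMS.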