One example: keeping spam email content, to prevent it from being reused in spam campaigns.
- mostly-transient processing and data minimization
- semantic annotations and granular access controls
- siloed data-processing exceptions.
Fighting abuse often requires purpose-limited processing, and keeping everything inside a silo gives you much more control over its APIs.
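As a rough illustration of that silo idea (all names here are hypothetical, not from the talk): raw content stays private to the silo, and the only public API returns a narrow, purpose-limited verdict rather than the email itself.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a siloed anti-abuse store: raw spam content is
# ingested once and never exposed; callers only get narrow verdicts.
@dataclass
class AbuseSilo:
    _raw_content: dict = field(default_factory=dict)  # message_id -> raw email body
    _verdicts: dict = field(default_factory=dict)     # message_id -> "spam" / "ham"

    def ingest(self, message_id: str, content: str, verdict: str) -> None:
        # Raw content never leaves the silo after this point.
        self._raw_content[message_id] = content
        self._verdicts[message_id] = verdict

    def verdict(self, message_id: str) -> str:
        # Narrow, purpose-limited output: a label, not the email body.
        return self._verdicts.get(message_id, "unknown")
```

The design choice being sketched: the silo's API surface is the control point, so keeping it narrow is what makes the purpose limitation enforceable.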
- golden datasets
- abuse verdicts
- trust scores
- appeals data
First, you add semantic annotations to your data, so that you know what each field contains: things like "this int64 is a pseudonymous ID".
(Whoa, this talk is making many more things public than I expected!)
This allows you to check, for example, that analysts only use the data they need when running queries.
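A minimal sketch of how such a check could work, assuming a simple mapping from field names to semantic types (the field names and type labels here are invented for illustration):

```python
# Hypothetical semantic annotations: each field is labeled with what it
# semantically contains, e.g. "this int64 is a pseudonymous ID".
FIELD_ANNOTATIONS = {
    "user_id": "pseudonymous_id",
    "subject": "message_content",
    "received_at": "timestamp",
}

def disallowed_fields(requested_fields, allowed_semantic_types):
    """Return the requested fields whose semantic type the analyst
    is not allowed to use; an empty list means the query may run."""
    return [
        f for f in requested_fields
        if FIELD_ANNOTATIONS.get(f, "unknown") not in allowed_semantic_types
    ]
```

A query gatekeeper could then refuse any query for which this list is non-empty, enforcing access at the granularity of semantic types rather than whole tables.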
The intake process can be event-driven, like with automated scanning or user reports.
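To make "event-driven" concrete, here is one possible shape for such an intake, assuming user reports and automated scans both emit events into a shared review queue (the event names and queue are assumptions, not from the talk):

```python
# Hypothetical event-driven intake: both intake paths produce events
# that funnel into the same review queue for downstream processing.
review_queue = []

KNOWN_SOURCES = {"user_report", "automated_scan"}

def on_intake_event(event_type: str, payload: dict) -> None:
    # Only recognized intake sources are queued; anything else is dropped
    # rather than stored, in the spirit of data minimization.
    if event_type in KNOWN_SOURCES:
        review_queue.append({"source": event_type, **payload})
```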
When designing exceptional processing, narrow the output as much as you can.
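One way to picture that narrowing (a sketch, with invented field names): the exceptional processing sees the full record, but only a minimal summary is allowed out.

```python
# Hypothetical sketch of narrowing the output of exceptional processing:
# the analysis has access to the full record, but the function's return
# value is deliberately restricted to the few fields consumers need.
def narrow_output(full_analysis: dict) -> dict:
    return {
        "verdict": full_analysis["verdict"],
        "confidence": full_analysis["confidence"],
        # Everything else (raw content, internal features, ...) stays behind.
    }
```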