Dor

@dorvonlevi

May 14 • 1 tweet • 5 min read

Building an AI-native @Coinbase means rebuilding everything, especially the hardest parts. We've put a lot of time into redefining compliance, where the stakes are incredibly high, and we have to be extremely thoughtful about implementation.

We have invested heavily in rebuilding our compliance ops around AI with that reality as our starting constraint, not an afterthought. Here is an overview of what we've learned and what we built.

Most people assume compliance work is mostly checking whether a name appears on a sanctions list. That is the easy 5%. The other 95% is interpretive judgment under uncertainty: a customer claims their wealth came from real estate. Do the property records actually support it? Does the timeline hold? Is the documentation legitimate, or does it feel too polished? You need compliance staff and investigators who understand what “suspicious” actually looks like in context.
That's part of why compliance is so hard to automate—and so expensive.

The first obvious AI approach is to hand the model the existing procedures and ask it to run them faster. That approach misunderstands what procedures are for. Good procedures are not bad investigations; they are deliberately incomplete investigations. Their job is to create consistency, auditability, and a minimum standard across thousands of cases. They excel at saying what must happen. They are far worse at capturing everything a strong analyst actually notices: which sources they trust, when they widen the search, when a document feels off, when an explanation technically fits but still does not feel earned.

Procedures also carry the shape of the old operating model: fragmented systems, time pressure, queue pressure, and the hard limit of how much one human analyst can read, cross-reference, and hold in working memory at once. That is not a flaw in the procedure. It is how you design a process for humans.

AI changes the constraint set. Reading, searching, comparing documents, and tracing inconsistencies no longer have to be treated as scarce analyst time. Done carefully, with proper controls and human review, models can explore more context, test more hypotheses, and surface more inconsistencies than any single analyst could reasonably do case by case.

So if you simply automate the procedure exactly as written, you may gain efficiency. You will not unlock the full value of AI. You will just make the old bottleneck run faster.

The better question is not “Can AI follow the analyst playbook?”

It is: once the cost of reading, cross-referencing, and testing hypotheses collapses, what should the investigation become?

A second tempting approach: feed the model historical Suspicious Activity Reports (SARs) and let it learn from outcomes. This breaks down too. You rarely have the full state of what the analyst actually saw during the investigation. A case that looks straightforward today might only look that way because of information that surfaced later: a fraud indictment that didn't exist when the original analyst made the call, or news articles that hadn't been published yet. Hindsight can contaminate your training data. Regulators themselves also acknowledge that SAR decisions can be subjective.
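
One way to keep hindsight out of training or evaluation data is to rebuild each historical case from only the evidence that existed when the original analyst decided. The sketch below is a minimal illustration, not Coinbase's pipeline: it assumes every piece of evidence carries a hypothetical `available_at` timestamp, and the `Evidence`, `point_in_time_view`, and `leaked_evidence` names are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Evidence:
    source: str             # hypothetical label, e.g. "transactions", "adverse_media"
    summary: str
    available_at: datetime  # assumed field: when this evidence first became visible


def point_in_time_view(evidence: list[Evidence], decided_at: datetime) -> list[Evidence]:
    """Keep only evidence that existed when the original analyst made the call."""
    return [e for e in evidence if e.available_at <= decided_at]


def leaked_evidence(evidence: list[Evidence], decided_at: datetime) -> list[Evidence]:
    """Evidence that surfaced after the decision; learning from it injects hindsight."""
    return [e for e in evidence if e.available_at > decided_at]


if __name__ == "__main__":
    decided = datetime(2023, 3, 1)
    case = [
        Evidence("transactions", "Large inbound wires", datetime(2023, 2, 10)),
        Evidence("adverse_media", "Fraud indictment", datetime(2023, 9, 5)),
    ]
    print([e.summary for e in point_in_time_view(case, decided)])  # ['Large inbound wires']
    print([e.summary for e in leaked_evidence(case, decided)])     # ['Fraud indictment']
```

Labeling a case against the point-in-time view asks what the right call was given what was knowable then, which is closer to the judgment you want the model to learn.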

The architecture has four layers. The first is data: continuously enhancing the coverage, quality, and architecture of the signals the system depends on. The second is classical machine learning models that cluster and classify alerts to determine what type of investigation needs to run. The third is the investigation agent itself: a multi-agent system that orchestrates specialized agents to execute the investigation end to end. The fourth is a safety filter that runs independently of typology, ensuring no risk vector is missed regardless of how the alert is classified. Each layer is independently auditable and learns from the feedback provided by human reviewers.
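
A rough sketch of how the four layers could compose, written only to make the flow concrete. Everything here is hypothetical: `Alert`, the signal names, and the rule-based `classify_typology` (a stand-in for the classical ML layer) are assumptions, and the investigation agent is passed in as a plain function. What it illustrates is that the safety filter checks every rule no matter which typology the classifier picked, and that each layer appends to an audit log.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Alert:
    alert_id: str
    signals: dict[str, float]  # layer 1 output: enriched signals (hypothetical names)


@dataclass
class CaseResult:
    alert_id: str
    typology: str
    findings: str
    safety_flags: list[str]
    audit_log: list[str] = field(default_factory=list)


def classify_typology(alert: Alert) -> str:
    """Layer 2 stand-in; a real system would use trained classifiers/clustering."""
    return "account_takeover" if alert.signals.get("new_device", 0.0) > 0.5 else "general_fraud"


def safety_filter(alert: Alert, rules: dict[str, Callable[[Alert], bool]]) -> list[str]:
    """Layer 4: evaluate every risk rule, independent of the assigned typology."""
    return [name for name, rule in rules.items() if rule(alert)]


def run_pipeline(
    alert: Alert,
    investigate: Callable[[Alert, str], str],  # layer 3: the investigation agent
    rules: dict[str, Callable[[Alert], bool]],
) -> CaseResult:
    log = [f"data: {sorted(alert.signals)}"]
    typology = classify_typology(alert)
    log.append(f"classify: {typology}")
    findings = investigate(alert, typology)
    log.append(f"investigate: {findings}")
    flags = safety_filter(alert, rules)
    log.append(f"safety: {flags}")
    return CaseResult(alert.alert_id, typology, findings, flags, log)


if __name__ == "__main__":
    rules = {
        "sanctions_hit": lambda a: a.signals.get("sanctions_score", 0.0) > 0.9,
        "high_velocity": lambda a: a.signals.get("velocity", 0.0) > 0.8,
    }
    alert = Alert("A-1", {"new_device": 0.9, "velocity": 0.95})
    result = run_pipeline(alert, lambda a, t: f"{t} investigation complete", rules)
    print(result.typology, result.safety_flags)  # account_takeover ['high_velocity']
```

Keeping the safety filter outside the typology branch means a misclassified alert still gets every risk check.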

Inside the investigation agent, specialized sub-agents run across the full case surface: alert context, customer and identity signals, access patterns, risk indicators, transaction behavior, source-of-funds, onchain activity, and public adverse media. Each writes its findings into a shared case memory. A coordinator agent reconciles and challenges them. When sub-agents disagree, such as when source-of-funds marks activity as “explained” while adverse media surfaces a recent indictment, the coordinator attempts to resolve the disagreement using known common patterns. A narrative agent then prepares the final report with all collected evidence and a suggested resolution. A final self-validation agent acts as a guardrail: if the system cannot support its conclusion with sufficient confidence or data quality, the case is routed to manual investigation instead of being surfaced as an automated result.
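
The sub-agent, coordinator, and self-validation flow can be sketched with a shared memory and a confidence threshold. This is an illustration under assumed interfaces: `Finding`, `CaseMemory`, the "explained"/"suspicious" verdicts, the coordinator's single reconciliation rule, and the 0.8 threshold are all hypothetical, and the narrative agent is omitted. A real coordinator would apply learned patterns rather than one hard-coded rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    agent: str         # e.g. "source_of_funds", "adverse_media"
    verdict: str       # hypothetical labels: "explained" or "suspicious"
    confidence: float  # 0..1, as reported by the sub-agent (assumption)


class CaseMemory:
    """Shared memory that every sub-agent writes its findings into."""

    def __init__(self) -> None:
        self.findings: list[Finding] = []

    def write(self, finding: Finding) -> None:
        self.findings.append(finding)


def coordinate(memory: CaseMemory) -> tuple[str, float]:
    """Reconcile findings with one placeholder rule: any 'suspicious' finding
    outranks 'explained' ones, so an explanation cannot mask a risk signal."""
    if not memory.findings:
        return "unknown", 0.0
    suspicious = [f for f in memory.findings if f.verdict == "suspicious"]
    if suspicious:
        return "suspicious", max(f.confidence for f in suspicious)
    # No suspicious findings: the conclusion is only as strong as the weakest one.
    return "explained", min(f.confidence for f in memory.findings)


def self_validate(verdict: str, confidence: float, threshold: float = 0.8) -> str:
    """Guardrail: route to a human unless the conclusion is well supported."""
    if verdict == "unknown" or confidence < threshold:
        return "manual_investigation"
    return f"automated:{verdict}"


if __name__ == "__main__":
    memory = CaseMemory()
    memory.write(Finding("source_of_funds", "explained", 0.9))
    memory.write(Finding("adverse_media", "suspicious", 0.6))
    verdict, confidence = coordinate(memory)
    print(verdict, self_validate(verdict, confidence))  # suspicious manual_investigation
```

In this toy case the source-of-funds explanation does not mask the adverse-media hit, and because that hit is only moderately confident, the guardrail sends the case to a human instead of producing an automated result.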

Before any of this touched a real customer case, we built what we call a “Golden Set”: historical cases with known right answers. “Known right answers” in compliance is harder than it sounds. It meant re-investigating old cases, having multiple senior analysts independently decide what the right call would have been, then debating the disagreements until we reached consensus. It took months of work before we could even start measuring.
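
The Golden Set workflow comes down to keeping cases where independent senior calls agree, sending the rest to debate, and then scoring the system against the agreed labels. A minimal sketch, assuming hypothetical `file_sar`/`close` labels; reporting missed SARs separately from overall accuracy is an illustrative choice here, on the assumption that a missed SAR is costlier than an extra review.

```python
def build_golden_set(
    labels_by_case: dict[str, list[str]],
) -> tuple[dict[str, str], list[str]]:
    """Split cases into unanimous golden labels and disagreements to debate.

    labels_by_case maps a case ID to independent calls from several senior analysts.
    """
    golden: dict[str, str] = {}
    to_debate: list[str] = []
    for case_id, labels in labels_by_case.items():
        if labels and len(set(labels)) == 1:
            golden[case_id] = labels[0]
        else:
            to_debate.append(case_id)  # resolved by discussion before entering the set
    return golden, to_debate


def evaluate(predictions: dict[str, str], golden: dict[str, str]) -> dict[str, float]:
    """Agreement with the golden set, plus the rate of missed 'file_sar' cases."""
    scored = [c for c in golden if c in predictions]
    correct = sum(predictions[c] == golden[c] for c in scored)
    positives = [c for c in scored if golden[c] == "file_sar"]
    missed = sum(predictions[c] != "file_sar" for c in positives)
    return {
        "accuracy": correct / len(scored) if scored else 0.0,
        "missed_sar_rate": missed / len(positives) if positives else 0.0,
    }


if __name__ == "__main__":
    labels = {
        "c1": ["file_sar", "file_sar", "file_sar"],
        "c2": ["close", "close", "close"],
        "c3": ["file_sar", "close", "file_sar"],
    }
    golden, to_debate = build_golden_set(labels)
    print(to_debate)  # ['c3']
    print(evaluate({"c1": "close", "c2": "close"}, golden))
    # {'accuracy': 0.5, 'missed_sar_rate': 1.0}
```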

Here's an important part (for now): cases currently get BOTH the AI's full investigation AND a senior human review. We didn't reduce scrutiny. In fact, we added more of it, and we'll keep it until it no longer proves valuable. Cases resolve significantly faster AND get more eyes than they ever did before. Every human correction feeds back into the model as a training signal. It gets better because it's wrong in front of people who know how to fix it.
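
Running both the AI investigation and a senior review on every case produces paired decisions, and the disagreements are exactly the corrections that can feed back as training signal. A minimal sketch with a hypothetical `ReviewLog` class; treating the correction rate as one signal for when the extra review stops adding value is an assumption, not a description of how that call is actually made.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewLog:
    """Pairs each AI conclusion with the senior reviewer's final call."""

    records: list[tuple[str, str, str]] = field(default_factory=list)  # (case_id, ai, human)

    def record(self, case_id: str, ai_call: str, human_call: str) -> None:
        self.records.append((case_id, ai_call, human_call))

    def corrections(self) -> list[dict[str, str]]:
        """Cases the reviewer overrode; the human call becomes the training label."""
        return [
            {"case_id": case_id, "ai": ai, "label": human}
            for case_id, ai, human in self.records
            if ai != human
        ]

    def correction_rate(self) -> float:
        """Share of reviewed cases where the human changed the AI's call."""
        return len(self.corrections()) / len(self.records) if self.records else 0.0


if __name__ == "__main__":
    log = ReviewLog()
    log.record("c1", "close", "close")
    log.record("c2", "close", "file_sar")
    print(log.correction_rate())  # 0.5
    print(log.corrections())  # [{'case_id': 'c2', 'ai': 'close', 'label': 'file_sar'}]
```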

None of this would have shipped without clearing structural blockers most financial institutions are still stuck on. Security and privacy sign-off to send customer data to LLMs at all. Senior compliance officer alignment on AI-assisted human decision-making. A Model Governance team embedded since December: they observed the entire Golden Set evaluation process and are now running a formal validation review with our Internal Audit team.

Today this handles roughly 55% of our US fraud case volume with significantly less analyst time per case. The freed-up time goes to the harder cases AI can't yet handle, and to teaching it.

Our internal compliance and quality teams are building this system alongside the engineers, training it, validating it, and continuing to shape how it improves. In the process, they've developed skills that are becoming some of the most valuable at any company: how to design evals, how to think about model bias and human bias, and how to architect human-in-the-loop systems.

This entire project started ~6 months ago with a whiteboarding session between @galpa42 and me, and was built by an AI-pilled cross-functional team. It's just the first pod: there's a multi-month roadmap for rebuilding compliance from the ground up with AI. Huge thanks to everyone involved, and congratulations to @galpa42 for shipping two babies to production this month :)

The future of high-stakes work is not AI replacing judgment. It is AI making judgment scalable, auditable, and continuously improvable.
