We just had a serious outage of @AdGuard DNS, but it was actually caused by @Facebook. What happened and how on earth can @AdGuard depend on FB? Let me try to explain. (1/9)
Everything started with Facebook name servers going down today. AdGuard DNS connects to them in order to find out the addresses of Facebook domains. So, they went down and AdGuard DNS started responding with an error to every request for FB domains’ addresses. (2/9)
This caused a considerable spike in the overall number of requests. What happened? Every app, every device was now repeatedly requesting FB domains as if they couldn’t live without them. (3/9)
The high number of requests is not much of a problem for us: we’re ready for higher load, so this went almost unnoticed. Everything was working well until one crucial moment when Facebook engineers decided to null-route their name servers. (4/9)
What does this mean? From then on, requests to FB name servers did not just fail, they TIMED OUT. Now we could not respond quickly with an error and had to wait for a few seconds until we were sure there would be no response. (5/9)
The worst part is that we weren’t doing any negative caching. It means that if we could not resolve a domain, we kept trying again and again until it finally succeeded (it never did) instead of caching the negative result at least for a few seconds. (6/9)
So we had an overwhelming number of incoming queries that timed out and simply exhausted the servers’ resources. This all led to one of the worst outages we ever had with AG DNS. (7/9)
At some point we almost hit 1M queries per second (our normal load is about 250-300k). Most of the queries are encrypted (DoT/DoH/DoQ), so this is like 10x the load of regular DNS. (8/9)
It took us about an hour to figure all that out, implement a fix (negative caching it is) and deploy it to every AG DNS server. Everything works well now, but we learned a couple of very useful lessons. Thanks, FB! (9/9)
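For readers wondering what negative caching looks like in practice, here is a minimal sketch. It is not the actual AdGuard DNS code: the `NegativeCachingResolver` class, the `UpstreamError` exception, the `upstream` callable and the 5-second TTL are all assumptions made up for illustration. The idea is only that once a lookup fails, the failure is remembered for a few seconds, so repeated queries for the same name get an immediate error instead of each one waiting for an upstream timeout.

```python
import time
from typing import Callable, Dict, Optional


class UpstreamError(Exception):
    """Hypothetical error: the upstream name server failed or timed out."""


class NegativeCachingResolver:
    """Illustrative resolver wrapper that remembers failed lookups briefly."""

    def __init__(
        self,
        upstream: Callable[[str], str],
        negative_ttl: float = 5.0,  # assumed value, "at least a few seconds"
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._upstream = upstream
        self._negative_ttl = negative_ttl
        self._clock = clock
        # Domain name -> time until which the failure is considered fresh.
        self._failed_until: Dict[str, float] = {}

    def resolve(self, name: str) -> Optional[str]:
        """Return an address, or None if the name cannot be resolved."""
        expires = self._failed_until.get(name)
        if expires is not None:
            if self._clock() < expires:
                # Cached negative result: fail fast, no upstream query.
                return None
            # The negative entry expired, so allow one fresh attempt.
            del self._failed_until[name]
        try:
            return self._upstream(name)
        except UpstreamError:
            # Remember the failure so the next queries do not wait again.
            self._failed_until[name] = self._clock() + self._negative_ttl
            return None
```

With a wrapper like this, a burst of queries for an unreachable domain costs one slow upstream attempt per TTL window rather than one per query.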
• • •
Firefox will soon get Manifest V3 support and I've got a couple of things to say about that. First of all, Manifest V3 is not an inherently bad idea, as some news outlets assumed. The spec can be improved, that's why the W3C group exists.
The problem is that Google refuses to adopt the most important bits of the developers' feedback. But I am very glad to see that @mozilla on the contrary does listen to us and their version will be better in every way.
(2/8)
One of the benefits of their way is that in @firefox content blockers will continue to exist in their original, more "powerful" form. Does it mean that a considerable share of users will replace Chrome with FF now? I doubt it, but there's another thing.
Here's another story about @AdGuard. Our iOS app is open source, the code is published on GitHub. Sometimes, we spot clones that just copy the code and pop up on @AppStore under a different name. But today I've encountered probably the most blatant rip-off in my life.
🧵 (1/7)
Yesterday our server started receiving strange "check in-app subscription" requests. You see, the requests looked like they were sent from AdGuard apps, but there was actually no subscription with that token, and the requests looked a bit off.
(2/7)
The only clue was the "vn.visafe0" string in the check request. It appears that ViSafe is a Vietnamese app; its description says "Visafe is researched and developed by the National Cyber Security Monitoring Center (NCSC)". Hmm, interesting...
First of all, the CWS team was trying to solve a really serious issue - tons of malware and adware extensions that were overwhelming the CWS. The obvious solution was to introduce a manual moderation process and that's exactly what they did.
(2/9)
This has led to increased review times, so now we can wait for up to 3 weeks until the extension update is reviewed and allowed into CWS. There's no way to ask for an expedited review, so if you shipped a bug, you are f*cked.