When you wield God-like abilities, people come to you with humble requests like this and within seconds you just blow their mind -- alternate introduction: "what can you do with bpftrace on Linux?" Let's go …
We're sunsetting an NFS server, replacing it with Ceph. Problem: There are hundreds of systems using the NFS server. Here is the request that has come-down from on-high:
Using a bit of kit called bpftrace on Linux (or dtrace on BSD), we can run this tool on the NFS server to tap into the kernel and provide statistics on NFS activity github.com/FrauBSD/nfsdtop
You shove this into a cron-job (adjusting the "-iSEC" and/or first "*" to something like "*/5" to determine how much sampling you want to do ... "*" and "-i60" is 100% sampling, "*/5" with "-i30" would be 10% sampling):
* * * * * root nfsdtop -JN1 -i60 >> /var/log/nfsd_stats.log
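To sanity-check those sampling percentages before committing to a schedule, here is a minimal sketch of the arithmetic. It assumes each cron run samples for exactly the number of seconds given to "-i" and then exits; the function name and parameters are made up for illustration and are not part of nfsdtop.

```python
def sampling_coverage(cron_every_minutes: int, sample_seconds: int) -> float:
    """Fraction of wall-clock time covered by sampling.

    cron_every_minutes: 1 for "*" in the minute field, 5 for "*/5", etc.
    sample_seconds:     the value passed as -iSEC (assumed to be how long
                        each run samples before exiting).
    """
    period_seconds = cron_every_minutes * 60
    if not 0 < sample_seconds <= period_seconds:
        # Sampling longer than the cron period would make runs overlap.
        raise ValueError("sample window must fit inside the cron period")
    return sample_seconds / period_seconds


if __name__ == "__main__":
    # The two schedules mentioned above.
    print(f'"*"   with -i60: {sampling_coverage(1, 60):.0%}')
    print(f'"*/5" with -i30: {sampling_coverage(5, 30):.0%}')
```

Lower coverage means less log volume and less load on the server, at the cost of possibly missing short bursts of activity between samples.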
You then create a database, run telegraf configured to shove the information into said database, and now you've got time-series data representing all NFS activity centrally derived from the NFS server itself in real-time
You now have a very slick database with very slick contents from which you can create slick dashboards
So once you've got that sweet sweet data flowing, we can answer the original ask with time-series data. As we try to sunset this server, we want to see the data come to a halt. Clearly some people need to be told to stop ... they did this morning, evidenced by the graphs
That's it. That's all there is to it. Now, I wouldn't necessarily call it over-engineering. You saw what the manager asked. It just happens that most people when faced with such an ask would probably berate the manager with "you don't computer, do you?"
Because let's be honest. "No" was a perfectly acceptable answer here. I just happened to have a wartime arsenal of tricks up my sleeves, so "Yes" was a possibility as well as getting it done in less than 1H. But guess what? I gave the tools away for free …
So while I have not YET released the dashboards (wait until this sunsetting operation is complete), you've got access to the tool that generates the data from the kernel plumbing. So have it. Become God-like to your peers and indispensable to managers …
Managers that, 12 years after you have left, are still telling your replacements "well (insert name) could do it"
So what are you waiting for? If you run an NFS server, you should be using dtrace/bpftrace to plumb some JSON stats for database injection so you can look like a hero github.com/FrauBSD/nfsdtop
Colleague: hey, your dashboard says this one user is pounding the NFS server we are trying to sunset, can you help us find the rogue Linux process that is writing to this NFS server?
Me: Why, sure!
(60 seconds later)
Colleague: Holy Shit!
<let us explore what just happened> 🧵
Me, armed with the user's name, logs into the NFS server on the command-line. Executes "nfsdtop -cwU <user>" and now I have a realtime display of the machines the user is using to access the NFS server. I then use "ssh" to log into the node where the activity is coming from
I then pull out this bit of kit to dump real-time NFS activity on the client-side, filtered on the user, and it shows me everything I need for the next step. Process name, PID, user, VFS syscall, filesystem path, NFS mount, syscall result, etc. github.com/FrauBSD/viosno…
Red Team, Blue Team, why not both? Why not go purple and use Red Team's tactics against them? Let's go 🧵
Achtung: if you have problems with anxiety, turn back now
Huh, that's strange, you usually get mail on Tuesdays and you're almost certain you saw the mail truck stop by your place in the last hour, yet the mailbox is empty. There's a non-zero chance (however remote) you just got hit by a data mule
While Gmail is clamping down on MX relay by requiring SPF (at a minimum) and DKIM, now seems to be a good time to drop a bomb 🧵 💣 why there has been no better time to run your own mail server
E-Mail providers for decades have been built on the premise that account holders want to be able to send mail to people anywhere. As if e-mail at-large must be a proxy for physical mail, but I am about to turn that on its head for fun and profit
What if you had a publicly accessible mail server for sending/receiving e-mail but account holders can only e-mail other account holders? That one simple change has far-reaching implications that take some getting used to; and let me tell you, it is magnanimous!
ssh has secrets. Too many to share in one tweet. One of which is how it acts as a serial-line processor for secret keyboard functionality you probably never knew about. For example, why, when you press ENTER and then ~ immediately after, does the ~ not appear right away? Thread…
ssh secret No 1 is that you can tell how many ssh-layers deep you are (how many hosts you've hopped-through to get where you are) simply by pressing ENTER and then counting how many times you have to press ~ before it shows up on your prompt (important for tip No 2)
For example, if you ssh to host 1, then from host 1 you ssh to host 2, while on host 2 you can press ENTER and then ~ (nothing happens), ~ again (nothing happens), ~ again (a "~" appears), this tells you that you are 3-hosts deep (counting the one you originated from)
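If you want to convince yourself why the count works, here is a toy model of the behaviour described above. It is a sketch under assumptions, not OpenSSH code: each ssh client in the chain is modelled as a filter that swallows a "~" typed right after a newline and forwards a single "~" when the next key is also "~". Real escape commands such as "~." are not modelled, and the function names are made up for illustration.

```python
def ssh_layer(chars, escape="~"):
    """Model one ssh client: yield the characters it forwards to the remote."""
    at_line_start = True   # a fresh session behaves like the start of a line
    pending = False        # True right after a swallowed escape character
    for ch in chars:
        if pending:
            pending = False
            if ch != escape:
                # Real ssh would interpret commands like "~." here; this
                # sketch simply forwards the escape char plus the key.
                yield escape
            at_line_start = ch in "\r\n"
            yield ch       # "~~" collapses into a single forwarded "~"
        elif at_line_start and ch == escape:
            pending = True  # swallow it and wait for the next key
        else:
            at_line_start = ch in "\r\n"
            yield ch


def presses_until_visible(ssh_layers, max_presses=50):
    """Press ENTER, then count "~" presses until one reaches the last shell."""
    for presses in range(1, max_presses + 1):
        stream = iter("\r" + "~" * presses)
        for _ in range(ssh_layers):
            stream = ssh_layer(stream)  # chain one filter per ssh hop
        if "~" in "".join(stream):
            return presses
    return None


if __name__ == "__main__":
    # Two ssh hops (origin -> host 1 -> host 2), as in the example above.
    print(presses_until_visible(2))
```

In this model the number of presses equals the number of ssh hops plus one, which is the "hosts deep, counting the one you originated from" figure.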
As happens quite often, I was pulled-in to help solve a mystery at work. In migrating to Enterprise Linux 9, some RPMs were failing to install due to build-id errors. A few hours later and I uncover a dirty change over 20 years in the making
Some individuals at RedHat who shall remain nameless decided to turn on signing by default for all packages, traditionally only opt-in for debuginfo packages
What’s the big deal? At a high level we are talking about every ELF object (executable, library, module) getting signed by rpmbuild. When you use “file” on a signed file, see the BuildId string to obtain the 160-bit checksum value (which can be verified with “debugedit -i” on a copy)
@TonyBenoy They did bring me into the review process. And ignored 90% of what I said. Then had their manager approve it anyway. Then had my boss make me deploy it.
Today, I just got authorization to turn off their code en-masse and enable my new code which I wrote to replace their Python
@TonyBenoy "TLDR : A junior dev rewrote your code and the code had a bug?"
Sending 2M e-mails in the course of 4 weeks and causing HR to reach out to systems because several managers cannot receive e-mail.
Wasting ~600k USD in R&D
Wasting ~350k USD in electricity
Don't down-play the effects
@TonyBenoy "I don't understand why the hate for python."
They could have written in any one of the 28 languages I know (including Python) and it would have been just as terrible.