- Logging
- Metrics
- Tracing
🧵👇
Logging may be obvious for many devs, but there's more to it than just doing it.
Choosing a format that can be processed easily should be a priority.
Asking how those logs are collected and where they can be viewed is just as important.
Plain text may be easy to read, but can sometimes be pretty difficult to process automatically.
You should also consider so-called tags: a map of variables, such as request IDs, that lets you follow the execution of a call.
Logging in machine-processable formats like JSON may be a good idea. This also opens up the possibility of processing logs automatically and indexing them in Elastic, for example.
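To make the JSON and tags idea concrete, here is a minimal sketch using only Python's standard `logging` module. The `JsonFormatter` class, the `tags` attribute, the `orders` logger name, and the `req-123` request ID are all made up for illustration; a real service might use a dedicated structured logging library instead.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # "tags" is a hypothetical attribute, passed in via `extra` below.
            "tags": getattr(record, "tags", {}),
        }
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    logger = logging.getLogger("orders")  # hypothetical service name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# An in-memory stream stands in for stdout or a log file.
stream = io.StringIO()
log = build_logger(stream)
log.info("order created", extra={"tags": {"request_id": "req-123"}})

line = stream.getvalue().strip()
print(line)
```

Because every line is a self-contained JSON object, a collector can parse it without guessing at the layout, and the request ID in `tags` can be used to find all entries that belong to one call.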
Logs must be collected. Especially in distributed systems, where instances of services come and go, it's crucial to collect logs from each individual instance, maybe aggregate them, and then collect them at a central place.
After log collection, you still need to give the people who need it access to all that information.
Maybe you have to unify different log formats into one and then put them into Elastic or Splunk, for example, for indexing (whichever you use).
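Unifying formats could look roughly like this. Both input formats here (a JSON format with short field names and a plain-text `<timestamp> <LEVEL> <message>` format) are hypothetical, as is the target schema; the point is only that every line ends up with the same fields before it gets indexed.

```python
import json


def normalize(raw: str) -> dict:
    """Map one raw log line onto a common schema: timestamp, level, message."""
    raw = raw.strip()
    if raw.startswith("{"):
        # Hypothetical JSON format with short field names.
        data = json.loads(raw)
        return {
            "timestamp": data["ts"],
            "level": data["lvl"].upper(),
            "message": data["msg"],
        }
    # Hypothetical plain-text format: "<timestamp> <LEVEL> <message>".
    timestamp, level, message = raw.split(" ", 2)
    return {"timestamp": timestamp, "level": level.upper(), "message": message}


lines = [
    '{"ts": "2024-01-01T10:00:00Z", "lvl": "error", "msg": "payment failed"}',
    "2024-01-01T10:00:01Z INFO payment retried",
]
unified = [normalize(line) for line in lines]
for entry in unified:
    print(entry)
```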
Metrics help a lot with system monitoring.
The number of requests to a certain API, the number of errors, the number of open database connections, and detailed information about available system resources are all crucial information that at least the ops team needs.
You will have to build metric collection into your services.
There are a lot of libraries and frameworks for it.
But no matter the solution you choose, you still have to define the metrics for each individual service and add them to your code.
Like logs, metrics must be collected.
They are either written to files or stdout, or are available at a special HTTP endpoint (Prometheus, e.g.).
No matter the solution chosen, you need to collect them.
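As a rough sketch of what "building metrics into your service" can mean: a tiny hand-rolled counter registry that renders its values in a Prometheus-style text format. This is not the Prometheus client library; the `Metrics` class and the metric names are assumptions for illustration, and in practice you would most likely use an existing library and serve the output from an HTTP endpoint.

```python
from collections import Counter


class Metrics:
    """Tiny in-process registry for labeled counters."""

    def __init__(self) -> None:
        self._counters: Counter = Counter()

    def inc(self, name: str, **labels: str) -> None:
        # Sort the labels so the same label set always maps to the same key.
        key = (name, tuple(sorted(labels.items())))
        self._counters[key] += 1

    def render(self) -> str:
        """Render all counters in a Prometheus-style text format."""
        lines = []
        seen = set()
        for (name, labels), value in sorted(self._counters.items()):
            if name not in seen:
                lines.append(f"# TYPE {name} counter")
                seen.add(name)
            label_text = ",".join(f'{k}="{v}"' for k, v in labels)
            suffix = f"{{{label_text}}}" if label_text else ""
            lines.append(f"{name}{suffix} {value}")
        return "\n".join(lines) + "\n"


metrics = Metrics()
metrics.inc("http_requests_total", path="/orders")
metrics.inc("http_requests_total", path="/orders")
metrics.inc("http_errors_total", path="/orders")

output = metrics.render()
print(output)
```

The calls to `inc` are the part you have to place in your own code, per service; the rendered text is what a collector would then scrape or read.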
All those collected metrics must be processed, maybe unified, and then put somewhere for people to view.
You may also want to create dashboards, set up alerts on certain thresholds, etc.
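A threshold alert boils down to comparing collected values against limits. The sketch below assumes metrics arrive as a simple name-to-value mapping; the function name, metric names, and limits are invented for the example, and real monitoring stacks usually express this as alerting rules rather than application code.

```python
def breached(metrics: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return the names of metrics whose current value exceeds its threshold."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if metrics.get(name, 0.0) > limit
    )


# Hypothetical current values and limits.
current = {"error_rate": 0.07, "open_db_connections": 40.0}
limits = {"error_rate": 0.05, "open_db_connections": 90.0}

alerts = breached(current, limits)
print(alerts)
```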
Distributed systems with many microservices that all talk to other services are one thing above all:
Difficult to debug.
Tracing helps by making the path requests take through a system more transparent.
There are libraries and methodologies that help you get started with tracing.
You will, however, still have to instrument your code before your services produce any traces.
I think you now see where this is going. Traces also need to be collected.
The larger the system, the more different formats or methodologies there may be, for various reasons.
All of those need to be collected and put into a central place.
As with all the other things in this thread, you'll want your traces to be viewable by everyone who needs them.
Maybe you need to clean that data before you can show it to your users.