Efi Merdler-Kravitz (@TServerless), 22 tweets, 5 min read

Finished reading the excellent "Instrumenting distributed systems for operational visibility" by @dyanacek aws.amazon.com/builders-libra… adding my two cents + mini summary

1/ Using instrumentation to understand how a system works is a great idea. Today's tools can even build a real map of the various resources that interact with each other.

2/ You can't use cat, grep, sed, and awk on your #Serverless application; you definitely need another set of tools

3/ Instrumentation frees you from the hassle of logging every statement you add; it automatically records the most important data points, which lets you debug crashes and improve performance.

4/ At the heart of instrumentation lies the trace ID, a unique identifier that is passed between the various services. Although not mentioned, asynchronous and many-to-many execution adds an extra layer of complexity to tracing, e.g. Batch write->Kinesis->Batch read

5/ Old-fashioned logs are still important. Not everything can be instrumented (for example, inner algorithm flow), but it's very important to correlate the logs with that special trace ID. When pulling the instrumentation details, also pull the relevant logs
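
A minimal sketch of correlating plain logs with a trace ID, using only Python's standard library. The names `trace_id_var`, `TraceIdFilter`, `build_logger`, and `handle_request` are made up for illustration; a real tracer would supply the ID itself.

```python
import contextvars
import io
import logging

# Hypothetical context variable holding the trace ID of the current request.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the current trace ID onto every log record."""

    def filter(self, record):
        record.trace_id = trace_id_var.get()
        return True


def build_logger(stream):
    """Return a logger whose lines always carry trace_id=<id>."""
    logger = logging.getLogger("traced")
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
    )
    handler.addFilter(TraceIdFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def handle_request(logger, trace_id):
    # Set the ID once at the entry point; every log line below inherits it.
    token = trace_id_var.set(trace_id)
    try:
        logger.info("scoring started")
        logger.info("scoring finished")
    finally:
        trace_id_var.reset(token)


if __name__ == "__main__":
    buffer = io.StringIO()
    handle_request(build_logger(buffer), "abc-123")
    print(buffer.getvalue(), end="")
```

With the ID on every line, searching the logs for one trace ID returns exactly the lines that belong to that trace.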

6/ Logs are expensive and rather complex to handle. Unless you are a big company, let others handle the logs for you

7/ Creating alarms is important; however, the hardest part is defining their thresholds. Start with a number that makes sense and tune it as time goes by; you'll rarely choose the right one on your first try.

8/ Log units of work, for example an HTTP request or a single cron run. Aggregate the multiple stages of a unit of work into a single concrete log entry; however, do log "progress" when the unit of work is long
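
One way to picture a "unit of work" log, as a rough sketch: collect facts and stage timings while the request runs, then emit a single JSON line at the end. `UnitOfWork` and its methods are hypothetical names, not taken from the article or any library.

```python
import json
import time


class UnitOfWork:
    """Collect facts about one request and emit them as one log line."""

    def __init__(self, name, emit=print):
        self.name = name
        self.emit = emit      # where the final line goes (print by default)
        self.fields = {}      # arbitrary key/value facts about the request
        self.timings = {}     # per-stage latency in milliseconds

    def add(self, key, value):
        self.fields[key] = value

    def stage(self, stage_name, func, *args):
        """Run one stage and remember how long it took."""
        start = time.perf_counter()
        try:
            return func(*args)
        finally:
            elapsed = (time.perf_counter() - start) * 1000
            self.timings[stage_name] = round(elapsed, 3)

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        record = {
            "unit": self.name,
            "ok": exc_type is None,
            "total_ms": round((time.perf_counter() - self._start) * 1000, 3),
            "stages_ms": self.timings,
        }
        record.update(self.fields)
        if exc_type is not None:
            record["error"] = exc_type.__name__
        self.emit(json.dumps(record))
        return False  # never swallow the exception


if __name__ == "__main__":
    with UnitOfWork("http_request") as uow:
        uow.add("path", "/orders")
        uow.stage("parse", len, "abc")
```

The line is emitted even when the request fails, so a crash still leaves one complete record behind.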

9/ Record the input before doing any manipulation on it

10/ Trim big requests: pull the important details and drop the rest. For example, requests arriving from #apigw to a #Lambda proxy might contain uninteresting details, so log only the body. As a rule of thumb, at Lumigo we trim to 1K, although it's configurable
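
A small sketch of the trimming idea, assuming an API Gateway proxy-style event with a `body` key. `loggable_body` and `MAX_LOGGED_BYTES` are hypothetical names, and this is not Lumigo's implementation; it only illustrates a 1K cut-off.

```python
import json

MAX_LOGGED_BYTES = 1024  # the 1K rule of thumb; adjust as needed


def loggable_body(event, limit=MAX_LOGGED_BYTES):
    """Return only the request body, cut to at most `limit` bytes of payload."""
    body = event.get("body") or ""
    if not isinstance(body, str):
        body = json.dumps(body)
    encoded = body.encode("utf-8")
    if len(encoded) <= limit:
        return body
    # Cut on a byte budget, dropping any partial multi-byte character.
    trimmed = encoded[:limit].decode("utf-8", errors="ignore")
    # The marker is appended after the cut, so the result is slightly over `limit`.
    return trimmed + "...[truncated]"


if __name__ == "__main__":
    event = {"headers": {"Accept": "*/*"}, "body": "a" * 5000}
    print(len(loggable_body(event)))
```

Headers and the rest of the event are simply never passed to the logger, which keeps each entry small and predictable.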

11/ Have an easy way to change the log level. In a Lambda, using an environment variable to set the log level is simple and easy
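
As a sketch, reading the level from an environment variable can look like this. `LOG_LEVEL` is an assumed variable name, and `configure_log_level` is a hypothetical helper.

```python
import logging
import os

# Accepted values for the (assumed) LOG_LEVEL environment variable.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def configure_log_level(logger, env=os.environ):
    """Set the logger level from LOG_LEVEL, falling back to INFO."""
    name = env.get("LOG_LEVEL", "INFO").upper()
    level = LEVELS.get(name, logging.INFO)
    logger.setLevel(level)
    return level


if __name__ == "__main__":
    log = logging.getLogger("handler")
    configure_log_level(log)
    print(logging.getLevelName(log.level))
```

Changing the variable in the function's configuration then switches verbosity without touching the code.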

12/ Log the latency of all requests; it will help you pinpoint performance issues. At Lumigo we also log the request body and response
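
A rough sketch of latency logging as a decorator. `timed` and the record layout are illustrative assumptions; `record` is any callable that accepts a dict, such as a logger wrapper or a metrics client.

```python
import functools
import time


def timed(record):
    """Decorator factory: report each call's latency in milliseconds."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            ok = True
            try:
                return func(*args, **kwargs)
            except Exception:
                ok = False
                raise
            finally:
                # Runs on success and on failure, so every call is measured.
                elapsed_ms = (time.perf_counter() - start) * 1000
                record({"call": func.__name__, "ok": ok, "latency_ms": elapsed_ms})

        return wrapper

    return decorator


if __name__ == "__main__":
    @timed(print)
    def fetch_order(order_id):
        return {"id": order_id}

    fetch_order(7)
```

Failed calls are measured too, since slow failures (timeouts, for instance) are often the interesting ones.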

13/ Log queue depth when interacting with a queue: how many items are in it when you pull or push. It will help you pinpoint latency issues or scalability problems
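
To illustrate, here is a sketch that uses Python's in-process `queue.Queue` as a stand-in for a real managed queue. `push_with_depth` and `pull_with_depth` are hypothetical helpers; with a hosted queue the depth would come from that service's own metrics instead of `qsize()`.

```python
import queue


def push_with_depth(q, item, record):
    """Enqueue an item and report how deep the queue was beforehand."""
    depth = q.qsize()  # approximate when other threads use the queue
    q.put(item)
    record({"event": "enqueue", "depth_before": depth})


def pull_with_depth(q, record):
    """Dequeue an item and report how deep the queue was beforehand."""
    depth = q.qsize()
    item = q.get_nowait()
    record({"event": "dequeue", "depth_before": depth})
    return item


if __name__ == "__main__":
    work = queue.Queue()
    for job in ("a", "b", "c"):
        push_with_depth(work, job, print)
    pull_with_depth(work, print)
```

A depth that keeps growing between pulls suggests consumers are not keeping up with producers.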

14/ Group error metrics by type; don't use a single metric to capture all errors. This lets you handle the most prevalent ones first
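
A minimal in-memory sketch of per-type error counting. `ErrorMetrics` is a hypothetical class; in practice each type would likely be emitted as its own metric or metric dimension in whatever monitoring service you use.

```python
from collections import Counter


class ErrorMetrics:
    """Count errors per exception type instead of one global counter."""

    def __init__(self):
        self.counts = Counter()

    def record(self, exc):
        self.counts[type(exc).__name__] += 1

    def most_prevalent(self, n=3):
        """Return the n most frequent error types with their counts."""
        return self.counts.most_common(n)


if __name__ == "__main__":
    metrics = ErrorMetrics()
    for err in (TimeoutError(), KeyError("id"), TimeoutError()):
        metrics.record(err)
    print(metrics.most_prevalent(1))
```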

15/ Protect your logs. Another reason to use external services that support advanced security features like encryption, MFA, access control, auditing etc.

16/ Avoid writing sensitive data to the logs, and in general choose external log services with good security/privacy credentials, such as GDPR compliance and SOC 2 reports.
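
One possible way to keep sensitive fields out of the logs is to redact them by key before logging. This is only a sketch: `SENSITIVE_KEYS` is a made-up starting list and `redact` is a hypothetical helper, so key-based matching will miss secrets embedded in free text.

```python
# Hypothetical list of key names to mask; extend it for your own payloads.
SENSITIVE_KEYS = {"password", "authorization", "token", "secret", "card_number"}


def redact(value):
    """Return a copy of a JSON-like value with sensitive fields masked."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


if __name__ == "__main__":
    payload = {"user": "efi", "password": "hunter2", "items": [{"token": "t"}]}
    print(redact(payload))
```

The original payload is left untouched, so the handler can keep using it after the redacted copy is logged.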

17/ Prune your logs. Over time you'll find log statements that you thought made sense but end up littering your log stream; remove them. Make it a habit to go over the logs on a weekly/monthly basis and remove the unneeded ones.

18/ If you're using #cloudwatch, don't forget to set retention periods; don't keep logs forever on CW. If you need long-term storage, use other services like #S3 Glacier

19/ Use cloudwatch #insights, #Athena, or another SaaS offering to search your logs. At Lumigo we avoid solutions that are not #Serverless, so managing our own ELK is out of the question.

20/ Make sure that you have the same log and metrics infrastructure in your dev/qa/integration environments. You'll want to test that the metrics you emit are correct and that the logs make sense

21/ Last but not least, read aws.amazon.com/builders-libra…
