Building reliable software is complex.

Here are 16 lessons I learned the hard way in the last five years as a DevOps engineer 🧵👇

#DevOps #DevOpsCommunity
1. No in-code fallbacks for configs:

If the service can't load the config on startup for any reason, it should just crash. It's easier to diagnose than the result of one borked instance going down an ancient code path due to misconfiguration.
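A minimal sketch of crash-on-bad-config, assuming a JSON config file. The key names in `REQUIRED_KEYS` and the `ConfigError` type are hypothetical and only there for illustration:

```python
import json
import sys
from pathlib import Path

# Hypothetical required keys; substitute whatever your service needs.
REQUIRED_KEYS = ("database_url", "listen_port")


class ConfigError(Exception):
    """Raised when the config is unreadable, malformed, or incomplete."""


def parse_config(raw):
    """Parse JSON config text; raise ConfigError rather than fill in defaults."""
    try:
        config = json.loads(raw)
    except ValueError as exc:
        raise ConfigError(f"config is not valid JSON: {exc}") from exc
    if not isinstance(config, dict):
        raise ConfigError("config must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ConfigError(f"config is missing required keys: {missing}")
    return config


def load_config(path):
    """Read and parse the config file at path, or raise ConfigError."""
    try:
        raw = Path(path).read_text(encoding="utf-8")
    except (OSError, ValueError) as exc:
        raise ConfigError(f"cannot read config {path}: {exc}") from exc
    return parse_config(raw)


def main(path):
    try:
        config = load_config(path)
    except ConfigError as exc:
        # Exit non-zero with the reason on stderr; no fallback values.
        sys.exit(f"fatal: {exc}")
    print(f"starting on port {config['listen_port']}")
```

Note there is no `config.get(key, default)` anywhere: a missing key stops the process before it serves traffic.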
2. Stringent RPC settings:

Use zero retries and a timeout of roughly 3X the p99 latency. We are striving for predictability here; sprinkling retries and long timeouts as quick fixes leads to a week-long investigation and a headache a year from now.
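As a rough sketch of the arithmetic, here is one way to derive those settings from observed latencies. The function names, the settings dict, and the nearest-rank percentile method are assumptions for illustration, not any particular RPC library's API:

```python
def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # ceil(0.99 * n) computed with integer arithmetic; the rank is 1-based.
    rank = -(-99 * len(ordered) // 100)
    return ordered[rank - 1]


def rpc_settings(latencies_ms, multiplier=3):
    """Timeout of multiplier x p99 and no retries."""
    return {"timeout_ms": multiplier * p99(latencies_ms), "max_retries": 0}
```

For 100 samples of 1 ms through 100 ms, the p99 is 99 ms, so the timeout comes out at 297 ms.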
3. Never give up on local testing:

It keeps dev cycle time much shorter than relying on CI or remote workspaces. Containerise your local test environment; that makes it simpler to simulate the production environment on a local machine.
4. Avoid state like the plague:

Managing a stateful service is 10X trickier than a stateless one. There are plenty of good managed DBs and caches; use one of those.
5. Use Git:

Use it for everything - infrastructure, configuration, code, dashboards, and on-call rotations. Your git repository is your point-in-time recoverable source of truth.
6. Don't waste time on code coverage:

Coverage numbers make excellent charts that have minimal relationship to how much actual value your change provides.
7. Prioritize real-world validation:

It's best to push your code to staging and show that it does what you intended and doesn't break anything. The next best thing is integration tests. Unit tests come last.
8. For infrastructure changes, make plans extremely obvious:

Post the Terraform plan as a comment on the pull request. There are great tools to make sure the change you think you're making is the change you're actually making.
9. Use Docker:

It's the industry standard for a reason - wrangling dependencies in environments with tools like Chef or Ansible loses to these nice self-contained artefacts any day.
10. Deploy everything all the time:

Every day that goes by without you deploying increases the chances that it's secretly been broken, and it's tough to track down what went wrong two weeks after the fact.
11. Validate deployments as they go:

Use canary deployments or good readiness probes for validation. Sometimes bad images get built and deployed to prod without anyone noticing till it is too late.
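One possible shape for a canary gate, sketched under assumptions: the thresholds (`min_requests`, `max_ratio`, `error_floor`) and the three verdict strings are made up for illustration, and a real setup would pull the counts from your metrics system.

```python
def error_rate(errors, requests):
    """Fraction of failed requests; 0.0 when there is no traffic yet."""
    return errors / requests if requests else 0.0


def canary_verdict(baseline_errors, baseline_requests,
                   canary_errors, canary_requests,
                   min_requests=100, max_ratio=2.0, error_floor=0.001):
    """Return "wait", "rollback", or "promote" for a canary.

    All thresholds are hypothetical defaults, not recommendations.
    """
    if canary_requests < min_requests:
        # Not enough canary traffic to judge either way.
        return "wait"
    # Tolerate up to max_ratio times the baseline error rate, with a small
    # floor so a near-perfect baseline does not fail the canary on one error.
    limit = max(error_rate(baseline_errors, baseline_requests) * max_ratio,
                error_floor)
    if error_rate(canary_errors, canary_requests) > limit:
        return "rollback"
    return "promote"
```

The point is that promotion is a decision made from data during the rollout, not something assumed because the deploy command exited cleanly.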
12. Use Kubernetes:

Assuming you have more than one service and more than one instance, you need service discovery, autoscaling, or deployment versioning. K8S gives infra teams scalability superpowers.
13. Use tools like Helm to manage K8S manifests:

You must never use kubectl apply, edit, or delete directly. The resource lifecycle needs to be findable in version control.
14. Avoid operators and CRDs:

K8S has a steep learning curve, and custom operators lead you straight into "WTF is going on" territory. Please keep it simple.
15. Run 3 of everything:

With backups, services, and DB instances, two is one, and one is none. Make sure two of those three instances can handle the full load on their own, or else you don't have the fault tolerance you think you do.
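The capacity check is simple arithmetic. A small sketch, with a hypothetical function name and requests per second as an assumed unit:

```python
def survives_one_failure(instance_capacity_rps, instances, peak_load_rps):
    """True if the remaining instances can carry peak load after losing one."""
    if instances < 2:
        # A single instance has no redundancy at all.
        return False
    return instance_capacity_rps * (instances - 1) >= peak_load_rps
```

Three instances at 400 rps each look fine against a 900 rps peak (1200 total), but lose one and you are at 800: `survives_one_failure(400, 3, 900)` is `False`, while `survives_one_failure(500, 3, 900)` is `True`.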
16. Structured logs are non-negotiable:

This, plus injected trace IDs, gets you 90% of APM functionality, but it is much cheaper and needs less work from developers.
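A minimal sketch of JSON logs with an injected trace ID, using only the Python standard library. The field names, `trace_id_var`, and `handle_request` are assumptions for illustration; in practice the trace ID would usually arrive in a request header:

```python
import contextvars
import json
import logging
import uuid

# Holds the trace ID for the current request; "-" when none is set.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "trace_id": trace_id_var.get(),
        }
        return json.dumps(payload)


def build_logger(name="app"):
    """Create a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger, incoming_trace_id=None):
    """Reuse the caller's trace ID if present, otherwise mint a new one."""
    trace_id_var.set(incoming_trace_id or uuid.uuid4().hex)
    logger.info("request handled")
```

Developers keep calling `logger.info(...)` as usual; the formatter adds the trace ID, so every line from one request can be grouped with a single query.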
That's a wrap!

If you wish to learn more:

1. Follow me @pragyanatvade for more of these
2. RT the tweet below to share this thread with your audience
