Troubleshooting is the art of taking a problem, gathering information about it, analyzing it, and finally solving it.
While some problems are inherently βharderβ than others, the same basic approach can be taken for every problem.
Not just fixing!
While fixing a problem is one of the major parts of troubleshooting, there are other parts that cannot be neglected: documenting the problem (and fix), and performing a root cause analysis (RCA).
Documenting the problem (and the fix) can help in the future when another (or possibly the same) administrator is faced with the same, or a similar, problem.
Performing a root cause analysis can help in preventing similar problems in the future.
Using the scientific method -
A good schema to follow wen troubleshooting is d scientific method:
1.Clearly define d issue-
Take a step & view d larger picture, den clearly define d actual problem. Most of d problems reported r symptoms of another problem, not d actual problem.
For eg, a user might call about a problem signing into a machine. While this is a problem for d user, the actual problem can be a forgotten passwd, an incorrectly configured machine, a nw issue, or something else entirely.
Further investigation is needed to determine d cause.
2. Collect information -
The next step is collecting as much (relevant) as possible. This information can come from a wide variety of sources: reading log files, information displayed on screen or in a GUI, follow-up questions for the original reporter, etc.
3. Form a hypothesis -
After looking at all gathered info, & d symptoms observed/reported, it is time to form a hypothesis abt the cause of d problem.
Sometimes this can be easy; for example, when a user has forgotten his password. Other times, it can be harder; for example, when a single service in a high-availability cluster fails to start on Mondays during months with an "e" in their name.
The key to remember during this step is that the hypothesis is just that, a hypothesis: a best guess as to what can be the cause of the issue. During the following steps, this hypothesis will be tested. If it turns out the hypothesis was wrong, a new one can be formed.
4. Test the hypothesis -
With an initial hypothesis formed, it can be tested for validity. How this testing happens depends on d problem & d hypothesis.
For example, when d hypothesis for a login problem states, βThe nw connection between d workstation & d KDC is being interrupted by a firewall,β the testing will be different from a hypothesis for a spontaneously rebooting server including a faulty UPS.
5. Fixing the problem -
If a hypothesis was not found to be invalid, an attempt can be made to fix the problem. During this stage, it is vital to only change one variable at a time, documenting all changes made, and testing every change individually.
Keeping backups of any changed configuration files, and reverting to those backups if a change was found to be ineffective, is also crucial. Modifying multiple configurations at once typically only leads to further issues, not fixes.
6. Rinse & repeat -
If the proposed fixes did not actually resolve the issue, the process will need to be restarted from the top. This time, any new information discovered during this cycle can be added to the mix to form a new hypothesis.
Hope you like this thread. If yes Retweet it!
More to come.
Follow me for more such content.
β’ β’ β’
Missing some Tweet in this thread? You can try to
force a refresh
Zombie processes in Linux are sometimes also referred to as defunct or dead processes. Theyβre processes that have completed their execution, but their entries are not removed from the process table.
What are different Process States?
Linux maintains a process table of all the processes running, along with their states. Letβs briefly overview the various process states:
What is systemd and why should Linux users care about it?
Everything about "systemd" !!
A Mega Thread π
What is systemd ?
systemd is the glue that holds Linux systems together. systemd is a collection of building blocks, which handle services, processes, logging, network connectivity and even authentication.
systemd handles the boot process for Linux systems. As an init implementation, it has a PID of 1 like other init systems, such as System V, Upstart.
It was designed as a replacement for SystemV and LSB-style startup scrips, which were prevalent since 1980s.
Every Linux Admin or DevOps Engineer should know what happens when a Linux system boots. It's a very popular Interview Question as well.
Every time you power on your Linux PC, it goes through a series of stages before finally displaying a login screen that prompts for your username or password.
There are 3 high level stages of a typical Linux boot process.
Everything you need to know about Virtualization, VMs , Containers, Pods, Clusters ..
A Mega Thread π
What is Virtualization?
Virtualization is the act of dividing shared computational resources: CPU, RAM, Disk, and Networking into isolated resources that are unaware of the original shared scope.
What is a virtual machine?
A VM is a virtual env that functions as a virtual computer system with its own CPU, memory, nw interface, & storage, created on a physical hw system (located off- or on-prem).
It uses sw instead of a physical computer to run programs & deploy apps.
traceroute tracks the route packets take across an IP network on their way to a given host.
It assists you in troubleshooting nw connectivity issues from your Destination to a Remote destination by using echo packets (ICMP) to visually trace the route.
The syntax -
The cmd traceroute <x> (x here being an IP or hostname) is d most basic version & it will begin to send packets to d designated target. This result will allow u to trace d path of d packets sent from ur machine to each of d systems b/n u & ur desired destination.
Cybersecurity is a way of protecting the network, computers, and other electronic gadgets from cybercriminals. The Malicious attackers might delete, modify or leak confidential information posing a huge threat to a business or an individual.