In the past we have talked about how to check error logs and discussed looking for the root cause of an issue, but how exactly do you find the root cause? First, let us look at the process of troubleshooting.
Identify the problem
Most of the time you will start with a symptom of the problem, because that is what users call in with. The symptom often does not reflect the actual issue. Users may exaggerate the issue or leave out details, making it almost impossible to find the root cause based solely on their explanation of what went wrong. Take an email issue as an example. A user calls in and says, “Jim never got the email I sent him.” That description can bring many possible causes to mind, from very simple to very complex. Based on this statement, what do we know about the problem? Not very much. It could be as simple as Outlook being offline, or the internet connection could be down. It could also mean that the Exchange server is experiencing a back pressure issue that has stopped all mail delivery. Since we do not have enough information to determine what is causing the problem, we need to gather more.
Gather information
During this step, we gather as much information as we can about the problem. This can be done by asking questions, checking settings, reviewing error logs, and using a myriad of other diagnostic tools. We should always start with the simplest explanation and work our way toward the more complicated ones. I find the easiest way to do this is to reproduce the problem so I can see it firsthand. Let’s try to send a test message. This will tell us many things, starting with how the user is sending the message, which could point to user error. Are they using Outlook? Has an Outlook profile been created? Is Exchange connected? Is the internet connected? All of these can be answered by simply reproducing the problem. After the problem has been reproduced, we should pull the error logs and check for an error that is not readily apparent.
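The idea of starting with the simplest explanation and working toward the complicated ones can be sketched in a few lines of Python. This is only an illustration: the `Check` class, the `first_failing_check` function, and the observations in the dictionary are all made up for this example, and in real life each check would be a real probe rather than a hard-coded value.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Check:
    name: str
    passed: Callable[[], bool]  # returns True when this layer looks healthy


def first_failing_check(checks: List[Check]) -> Optional[str]:
    """Run checks in the order given and return the name of the first failure."""
    for check in checks:
        if not check.passed():
            return check.name
    return None  # every check passed


# Hypothetical observations from reproducing the problem, listed simplest first.
observations = {
    "Outlook profile exists": True,
    "Internet is connected": True,
    "Outlook is in online mode": False,
    "Exchange is reachable": True,
}

# Bind each value with a default argument so every lambda keeps its own result.
checks = [Check(name, lambda ok=ok: ok) for name, ok in observations.items()]

print(first_failing_check(checks))
```

Keeping the checks in an ordered list makes the "simple before complicated" rule explicit: the cheap client-side checks run before anyone logs into a server.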
Develop hypothesis
Now that we have identified the problem and gathered information about it, it is time to review what we have learned. Let’s list the things we know about this issue. Outlook is installed and a profile has been made. The internet is connected and working. When the user tries to send an email, it stays in the outbox and never gets transmitted to the server. When we look at the error logs, we see application logs showing that Outlook has been set to offline mode. With all the information in front of us, we can hypothesize that the emails were not received because they were never sent. This is not something you will immediately tell the user, as it would not solve their problem and would inevitably cause you more problems. So our next step is to test the hypothesis.
Test hypothesis
We have developed a hypothesis, and now we need to test it. When you are testing your hypothesis, it is very important to have control conditions. Part of that is making sure you change only one thing, test, and then reset that one thing before changing anything else. This ensures that you are testing the same way every time. If the user is trying to send email in Outlook, then use Outlook to test. Testing with Outlook on the web instead may produce a different result and force you to start over. In our case we will change Outlook to online mode. Once this is done, we need to try to reproduce the issue by sending a test email. If the email sends successfully, then we have solved the problem, right? Maybe, but maybe not. Let’s evaluate our results.
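Here is a rough sketch of the "change one thing, test, reset" rule in Python. Everything in it is an assumption for illustration: the setting names, the `send_test_email` stand-in (which just pretends mail only goes out when offline mode is off), and the `try_changes_one_at_a_time` helper are not real Outlook settings or APIs.

```python
from typing import Callable, Dict, List, Tuple

Settings = Dict[str, object]


def try_changes_one_at_a_time(
    baseline: Settings,
    candidate_changes: List[Tuple[str, object]],
    reproduce: Callable[[Settings], bool],
) -> List[Tuple[str, bool]]:
    """Apply each change alone on top of the baseline and record the test result."""
    results = []
    for key, new_value in candidate_changes:
        trial = dict(baseline)   # fresh copy, so the previous change is reset
        trial[key] = new_value   # change exactly one thing
        results.append((key, reproduce(trial)))
    return results


def send_test_email(settings: Settings) -> bool:
    """Hypothetical stand-in for sending a test message; True means it was sent."""
    return not settings["offline_mode"]


# Hypothetical control conditions: same client and same settings for every test.
baseline = {"client": "Outlook desktop", "offline_mode": True, "cached_mode": True}
candidate_changes = [("cached_mode", False), ("offline_mode", False)]

for setting, sent in try_changes_one_at_a_time(baseline, candidate_changes, send_test_email):
    print(f"changed {setting}: email sent = {sent}")
```

Because every trial starts from a copy of the same baseline, a result can be pinned on the single setting that was changed and not on a leftover from an earlier test.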
Evaluate and revise hypothesis
Did the email send? Yes, it did, and Outlook is now working in online mode. Great, that solves one mystery. Did this fix the issue? Well, we identified the issue as the recipient not receiving the email because the email was never sent. If this is the only issue, then yes, that has resolved it, and we can close the ticket and get ready for our next exciting adventure. However, if the recipient still does not receive the email, we should revise our problem statement and start the process over again.
As we go through the troubleshooting process, it is important to remember that there may be multiple problems or a layered problem. Our case could easily have been a multiple-problem situation. If we found that the email was now sending but still not being received, we would then have to look on the server side or wait to see whether an NDR (non-delivery report) came back. The cause could turn out to be a typo in the email address, or we could have a mail flow problem like back pressure on the Exchange server.
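To make the layered-problem idea concrete, here is a toy simulation in Python of going around the loop more than once. It is a sketch under made-up assumptions: the `state` dictionary, the mistyped address, the list of known addresses, and the two fixes are all hypothetical, and a real second layer would need server-side investigation rather than a one-line fix.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]


def troubleshoot(
    state: State,
    checks: List[Tuple[str, Callable[[State], bool]]],
    fixes: Dict[str, Callable[[State], None]],
    max_rounds: int = 5,
) -> List[str]:
    """Fix the first failing check, then re-check everything, up to max_rounds times."""
    applied = []
    for _ in range(max_rounds):
        failing = next((name for name, ok in checks if not ok(state)), None)
        if failing is None:
            break                 # nothing fails any more, so stop
        fixes[failing](state)     # change one thing
        applied.append(failing)   # keep a record for the ticket
    return applied


# Hypothetical situation with two layers: offline mode AND a mistyped address.
known_addresses = {"jim@example.com"}
state = {"offline_mode": True, "recipient": "jim@exmaple.com"}

checks = [
    ("Outlook is online", lambda s: not s["offline_mode"]),
    ("Recipient address is valid", lambda s: s["recipient"] in known_addresses),
]
fixes = {
    "Outlook is online": lambda s: s.update(offline_mode=False),
    "Recipient address is valid": lambda s: s.update(recipient="jim@example.com"),
}

print(troubleshoot(state, checks, fixes))
```

The second problem only becomes visible after the first one is fixed and the test is repeated, which is why the evaluate step matters.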
Depending on your organization and your place in it, this could be the time to escalate the issue to the next tier, or it could be time to log into your Exchange server. I hope this helps shed a little light on an often-overlooked process.
Happy Hunting 🙂