Event management is perhaps the only process in ITIL® that depends on tools and automation for its existence. It deals with triggering events at certain junctures of operations and is one of the most important processes when it comes to being proactive rather than reactive. The importance of a process stems from the challenges it presents, which is the topic of my discussion today.
Brief Introduction to Event Management
Event management senses events and performs follow-up activities that could include triggering incident tickets, sending emails or, in some cases, automatically performing certain corrective actions.
As I mentioned earlier, event management is tool driven. So, is this process only as good as the tool you employ? Not really. The tool sets the stage; the act has to be rehearsed and performed by professionals who know their game. In this piece, I will discuss the challenges that implementers encounter during the design and implementation phases of the process.
Types of Events
Before I go into the challenges, I will touch upon the different types of events that are defined in ITIL®. In reality, you can have as many types as you want. Broadly, all events fall into one of the following three types.
- Informational events – As the name suggests, these register information. Examples could be the delivery of an email or a user logging into a database server.
- Warning events – These events give us a heads-up that something might go wrong. Time to take action before the house is on fire! Free space on a hard disk dropping below 10% could be considered a warning event.
- Exceptional events – These are events that denote an anomaly. Something has gone terribly wrong, and the system is not working (or will not work) the way it should. A server or a network going down is a classic example of an exceptional event.
Challenges in Design and Implementation
Event management's design and implementation can be a success if the implementer has insight into what is really needed and has plenty of wisdom to go with it. A junior implementer might find it easier to conquer Mt. Everest than to design the process, let alone implement it.
From my ITIL® event management implementation experience, here are some challenges:
Identifying the Scope
Organizations in general have hundreds or even thousands of servers, along with a host of switches, routers and storage devices. It is not practical to manage events on each and every system. Remember that the tools you need are a tad too expensive, and the price varies depending on the number of licenses issued. If you are looking at procuring hundreds or thousands of licenses, you are in for a surprise: your management or sponsor will shoot you down the moment you present the proposal. You need to play your cards smart, and pick and choose a few critical components that need real-time monitoring. So, how would you go about identifying critical components?
You need to interface with the configuration management process to obtain the critical component list. I love the interfacing part in ITIL®, how one process feeds into another, and gets fed by some other process. The configuration management process will provide a basis for identifying the components that really matter. I define critical components as those that would directly affect a service if they were to go down. If you have too many critical components, I would advise you to take another look at your technical architecture. Basically, more critical components mean more single points of failure.
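The selection rule above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `ConfigItem` record, its `services` and `has_redundancy` fields, and the sample inventory are all hypothetical stand-ins for whatever your configuration management database actually exposes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConfigItem:
    """Hypothetical configuration item as exported from a CMDB."""
    name: str
    services: List[str]      # services that depend on this item
    has_redundancy: bool     # True if a peer can take over when it fails


def critical_components(items: List[ConfigItem]) -> List[ConfigItem]:
    """Keep items whose failure would directly take a service down."""
    return [ci for ci in items if ci.services and not ci.has_redundancy]


# Hypothetical inventory, for illustration only.
inventory = [
    ConfigItem("db-01", ["billing"], has_redundancy=False),
    ConfigItem("web-01", ["storefront"], has_redundancy=True),
    ConfigItem("web-02", ["storefront"], has_redundancy=True),
    ConfigItem("core-switch", ["billing", "storefront"], has_redundancy=False),
    ConfigItem("test-box", [], has_redundancy=False),
]

for ci in critical_components(inventory):
    print(ci.name)
```

With this sample data, only `db-01` and `core-switch` make the list: each supports at least one service and has nothing to fail over to. A long list coming out of a filter like this is the architecture warning sign mentioned above.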
There are a number of tools in the market that can be leveraged in the event management process. The question to ask here is: what kind of tool are you looking for? You need to list your expectations before you start looking for one. That's how I would go about it. Otherwise, you may end up paying for a number of trashy features that are merely gold plating rather than the real deal.
Here are some questions I would ask:
- What types of components is the monitoring tool compatible with? Examples could be MOM for Microsoft servers and Nimsoft for switches.
- How easily can it interface with my ticketing tool? This is very important. You want this.
- Can I pull reports from my monitoring tools? If yes, to what extent?
- How much does each license cost me?
This is just a sample of the questions to ponder over. I am a consultant, so the information I pour in will generally be in parts as well.
You have now identified a practical number of servers (plus other components) to monitor. The next logical question is: what do you monitor in a component? The wisdom of the process manager comes into play here. Every component is distinct, so the parameters to monitor will vary as well. Generally on servers, I monitor memory, processor and hard disk utilization. The capacity management process feeds the event management process in this design phase.
The next aspect is setting thresholds for triggering events. Recollect the three types of events I introduced earlier in the post. Those events are defined in this phase. At what percentage of processor utilization would you set off a warning, and at what stage would that be an exceptional event? What actions do you want the monitoring tool to perform when an exceptional event is triggered? There are a number of possibilities to mull over.
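To make the threshold question concrete, here is a minimal Python sketch of how a utilization reading could be mapped to one of the three event types and a follow-up action. The threshold percentages, the `THRESHOLDS` and `ACTIONS` tables, and the `classify` function are my own illustrative assumptions, not values prescribed by ITIL® or by any monitoring tool.

```python
from enum import Enum


class EventType(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTIONAL = "exceptional"


# Hypothetical thresholds in percent utilization: (warning_at, exceptional_at).
THRESHOLDS = {
    "cpu": (80.0, 95.0),
    "memory": (75.0, 90.0),
    "disk": (90.0, 98.0),  # 90% used is roughly "about 10% free"
}

# Hypothetical follow-up action per event type.
ACTIONS = {
    EventType.INFORMATIONAL: "log only",
    EventType.WARNING: "email the operations team",
    EventType.EXCEPTIONAL: "raise an incident ticket",
}


def classify(metric: str, utilization: float) -> EventType:
    """Map a utilization reading to an event type using THRESHOLDS."""
    warning_at, exceptional_at = THRESHOLDS[metric]
    if utilization >= exceptional_at:
        return EventType.EXCEPTIONAL
    if utilization >= warning_at:
        return EventType.WARNING
    return EventType.INFORMATIONAL


for metric, reading in [("cpu", 42.0), ("memory", 81.5), ("disk", 99.0)]:
    event = classify(metric, reading)
    print(f"{metric} at {reading}%: {event.value} -> {ACTIONS[event]}")
```

Keeping the thresholds and actions in plain tables, separate from the classification logic, means they can be reviewed with the capacity management team and changed without touching code.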
Rolling out Agents
The monitoring tool generally runs on a central server, and agents are deployed on all the client machines. Rolling out agents onto client machines is an automated task, and a push approach is generally employed.
It sounds simple, but in reality, things don't always work the way they're supposed to. There will be some firewall or other, or a specific setting on a server or a switch, that prevents the agent rollout. Successfully rolling it out on all machines is a huge challenge, and the added rider is that once it's rolled out, it should work the way it should. That is, for the agent to pull information once it resides on a component, certain ports will have to be opened and a number of settings must make way for the agent not only to collect the information, but also to pass it on to the central server.
This activity could be a cakewalk if we are talking about a handful of machines. However, it will be a pain in the wrong place if the numbers are in the hundreds or more, especially when you have an environment with varying specifications, OS images, and versions.
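One way to take some of the pain out of a large rollout is to check, before pushing agents, whether the ports they need are reachable at all. The following is a rough Python sketch of such a pre-flight check using only the standard library. The function names are my own, and the hosts and ports you pass in are whatever your monitoring tool's documentation calls for; nothing here is specific to a particular product.

```python
import socket
from typing import Dict, Iterable, List


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def preflight(hosts: Iterable[str], ports: Iterable[int],
              timeout: float = 2.0) -> Dict[str, List[int]]:
    """Map each host to the required ports that could not be reached."""
    ports = list(ports)
    blocked: Dict[str, List[int]] = {}
    for host in hosts:
        closed = [p for p in ports if not port_is_open(host, p, timeout)]
        if closed:
            blocked[host] = closed
    return blocked
```

Calling `preflight(list_of_hosts, required_ports)` from the central server returns only the hosts with unreachable ports, which gives you a worklist for the firewall and server teams before the rollout starts. Note that this only tests TCP reachability from where the script runs; it says nothing about the direction from agent to central server or about settings on the machine itself.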
I have thrown light on some challenges that arise during the design and implementation stages. However, there are some in the initiation phase and plenty more that come up during the status quo phase.
To implement this process, you would need to present a business case, which would require the value on investment (VOI) and return on investment (ROI) components. For proactive processes, it is difficult to come up with these numbers.
Once implemented, the terrain changes on a regular basis. So, you need people to man it, continually monitor the monitoring parameters (duh!) and ensure that with every change, event management is analyzed as well.
ITIL® is a Registered Trade Mark of the Cabinet Office.