Challenging 24/7/365 - Question the Status Quo
Gideon T. Rasmussen - CISSP, CISA, CISM, CFSO, SCSA

Several readers have responded to a previous article in which I recommended powering down computer rooms to prepare for inevitable emergencies. The respondents stated that they could not power down their systems due to either 24/7/365 or 99.999 percent availability requirements (often referred to as "the five nines").
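To put "the five nines" in perspective, it helps to convert an availability percentage into a yearly downtime budget. The following is a minimal sketch of that arithmetic, assuming a 365-day year; the function name is illustrative only and does not come from any standard or product.

```python
# Illustrative arithmetic only: converts an availability target into the
# amount of downtime it permits per year (assumes a 365-day year).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows {minutes:.1f} minutes of downtime per year")
```

At 99.999 percent the budget works out to a little over five minutes per year, which is why a shutdown lasting hours can only fit if it is scheduled and excluded from the measurement.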

Without full testing, one can never be 99.999 percent certain that systems are prepared for an incident. With this in mind, users may appreciate a scheduled maintenance window (excluded from the uptime meter). Here are a few points to consider:

• If the services provided by a computer room require extremely high availability, then it stands to reason that the computer room itself should not be a single point of failure.
• Note that system maintenance is critical to meeting uptime requirements. Paralysis caused by unnecessary restrictions may ultimately result in downtime.
• High availability solutions are expensive. It is prudent to know which business processes are being supported.

In the course of routine operations it is difficult to truly quantify the needs of the user community without conducting research. To determine actual needs, be sure to document the following:

• A functional description of each application
• Users of the application (e.g., customer service reps, loan officers, customers, etc.)
• Time periods when users access the application
• Business criticality of the application and its respective systems
• The financial impact of two to three hours of downtime per application
• Primary and alternate contacts representing the application's user community
• The redundant infrastructure components supporting the application (e.g., mirrored disk, clustering)
• Whether or not the infrastructure meets the requirements of the application
• The redundant components of the data center (e.g., UPS, generators, separate power grids, communications lines from separate central offices)
• Whether or not the data center fulfills the requirements of the applications

Do not rely solely on verbal responses. Verify utilization through system and application logs. Create a matrix representing users' needs by application. Gaps in usage will begin to emerge as candidate maintenance windows. At this stage, the need for a 24/7/365 computer room may still exist.
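The usage matrix can be sketched in a few lines of code. The example below is only an illustration: the application names, the timestamps, and the timestamp format are hypothetical sample data, and real logs would need their own parsing. It counts accesses per application for each hour of the day and lists the hours in which no application was used.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample data: (application, access timestamp) pairs, as they
# might be extracted from system and application logs.
access_log = [
    ("loan-origination", "2005-03-01 08:15:00"),
    ("loan-origination", "2005-03-01 17:40:00"),
    ("customer-portal", "2005-03-01 23:05:00"),
    ("customer-portal", "2005-03-02 01:30:00"),
    ("customer-portal", "2005-03-02 09:10:00"),
]


def build_usage_matrix(entries):
    """Count accesses per application for each hour of the day (0-23)."""
    matrix = defaultdict(lambda: [0] * 24)
    for app, stamp in entries:
        hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").hour
        matrix[app][hour] += 1
    return dict(matrix)


def idle_hours(matrix):
    """Return the hours of the day in which no application was accessed."""
    return [
        hour
        for hour in range(24)
        if all(counts[hour] == 0 for counts in matrix.values())
    ]


matrix = build_usage_matrix(access_log)
print("Hours with no recorded usage:", idle_hours(matrix))
```

A few days of sample entries prove nothing on their own. The matrix becomes meaningful only when it is built from logs covering a representative period, including month-end and seasonal peaks.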

Throughout this process, an intimate understanding of the user community's needs is gained. One common argument is "We will lose $x million in revenue." Question how that figure was generated. Is it reasonable to assume that users will not return if they are greeted by a maintenance Web page in the middle of the night?
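One way to question such a figure is to compare a naive estimate with one that accounts for when the outage occurs and how many users actually give up. The sketch below is a hypothetical model, not an established formula: the revenue figure, the traffic ratio, and the abandonment rate are all invented inputs that would have to be replaced with measured values.

```python
HOURS_PER_YEAR = 365 * 24


def naive_loss(annual_revenue: float, outage_hours: float) -> float:
    """Assume revenue is earned evenly across every hour of the year."""
    return annual_revenue / HOURS_PER_YEAR * outage_hours


def adjusted_loss(
    annual_revenue: float,
    outage_hours: float,
    traffic_ratio: float,
    abandon_rate: float,
) -> float:
    """Scale the naive figure by two assumed factors.

    traffic_ratio: traffic during the window relative to an average hour
    abandon_rate:  fraction of affected users who never return
    """
    return naive_loss(annual_revenue, outage_hours) * traffic_ratio * abandon_rate


# All inputs below are hypothetical and chosen only to show the comparison.
revenue = 87_600_000.0  # annual revenue attributed to the application
window = 3.0            # planned outage length in hours

print(f"Naive estimate:    ${naive_loss(revenue, window):,.0f}")
print(f"Adjusted estimate: ${adjusted_loss(revenue, window, 0.1, 0.2):,.0f}")
```

The gap between the two estimates is the point: a loss figure is only as credible as the assumptions behind it, so ask which of these factors were measured and which were guessed.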

Loss of revenue starts to build a business case for a redundant or resilient solution. If the business cannot bear two to three hours of downtime once a year, determine the best solution to meet those requirements. If only a few applications must be highly available, it may be possible to replicate them to a hot site instead of keeping the entire computer room running 24/7/365. Hot sites need not be an economic burden if a remote company-owned computer room is used. Hot sites also offer obvious disaster recovery benefits. Advise senior management of your findings so that they can make an informed decision.

Once appropriate redundancy/resiliency is in place, there should be no issues with an annual computer room shutdown. If for some reason the computer room still cannot be powered down, this exercise will have created a better understanding of the user community's needs and made it easier to take down individual systems for maintenance.

Copyright © 2005 CyberGuard Corporation All Rights Reserved.
Reprinted with Permission