Challenging 24/7/365 - Question the Status Quo
Gideon T. Rasmussen - CISSP, CISA, CISM, CFSO, SCSA
Several readers have responded to a previous article
in which I recommended powering down computer rooms to prepare
for inevitable emergencies. The respondents stated that
they could not power down their systems due to either 24/7/365
or 99.999 percent availability requirements (often referred
to as "the five nines").
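To put "the five nines" in concrete terms, the availability percentage can be converted into a yearly downtime budget. The following is a minimal sketch; the function name and the 365-day year are assumptions made for illustration, not part of any formal definition.

```python
# Allowed downtime per year for a given availability percentage.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        print(f"{target}% -> {allowed_downtime_minutes(target):.2f} minutes per year")
```

Under these assumptions, 99.999 percent leaves a little over five minutes of downtime per year, which is why a two to three hour shutdown only fits if it is treated as a scheduled window outside the uptime calculation.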
Without full testing, one can never be 99.999 percent certain that
systems are prepared for an incident. With this in mind,
users may appreciate a scheduled maintenance window (excluded
from the uptime meter). Here are a few points to consider:
° If the services provided by a computer room require extremely
high availability, then it stands to reason that the computer
room itself should not be a single point of failure.
° Note that system maintenance is critical to meeting uptime
requirements. Paralysis caused by unnecessary restrictions
may ultimately result in downtime.
° High availability solutions are expensive. It is prudent
to know which business processes are being supported.
In the course of routine operations it is difficult to truly
quantify the needs of the user community without conducting
research. To determine actual needs, be sure to document
the following:
° A functional description of each application
° Users of the application (e.g., customer service reps, loan
officers, customers, etc.)
° Time periods when users access the application
° Business criticality of the application and its respective
systems
° The financial impact of two to three hours of downtime per
application
° Primary and alternate contacts representing the application's
user community
° The redundant infrastructure components supporting the application
(e.g., mirrored disk, clustering)
° Whether or not the infrastructure meets the requirements
of the application
° The redundant components of the data center (e.g., UPS,
generators, separate power grids, communications lines from
separate central offices)
° Whether or not the data center fulfills the requirements
of the applications
Do not rely solely on verbal responses. Verify utilization
through system and application logs. Create a matrix representing
users' needs by application. Gaps in usage will begin to emerge
as candidate maintenance windows. At this stage, the need for a
24/7/365 computer room may still exist.
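The usage matrix and the gaps it reveals can be sketched in a few lines. This is an illustrative example only: the application names, the sample events, and the two-hour minimum are hypothetical, and real data would come from parsing system and application logs.

```python
from collections import defaultdict

# Hypothetical sample data: (application, hour of day 0-23) pairs taken from logs.
ACCESS_EVENTS = [
    ("loan_origination", 9), ("loan_origination", 14), ("loan_origination", 17),
    ("customer_portal", 0), ("customer_portal", 8),
    ("customer_portal", 13), ("customer_portal", 22),
]


def build_usage_matrix(events):
    """Map each application to a 24-slot list of access counts per hour."""
    matrix = defaultdict(lambda: [0] * 24)
    for app, hour in events:
        matrix[app][hour] += 1
    return dict(matrix)


def combined_counts(matrix):
    """Sum the per-application rows into one room-wide row of 24 hourly counts."""
    return [sum(row[hour] for row in matrix.values()) for hour in range(24)]


def idle_windows(hourly_counts, min_hours=2):
    """Return (start, end) hour runs with zero accesses; end is exclusive.

    Runs that wrap past midnight are reported as two separate runs.
    """
    windows, start = [], None
    for hour, count in enumerate(hourly_counts + [1]):  # sentinel closes a trailing run
        if count == 0 and start is None:
            start = hour
        elif count != 0 and start is not None:
            if hour - start >= min_hours:
                windows.append((start, hour))
            start = None
    return windows


if __name__ == "__main__":
    usage = build_usage_matrix(ACCESS_EVENTS)
    for app, row in usage.items():
        print(app, idle_windows(row))
    print("whole room", idle_windows(combined_counts(usage)))
```

A window that is idle for the whole room is a candidate for a full shutdown; a window that is idle for only one application may still allow that application's systems to be taken down individually.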
Throughout this process, an intimate understanding of the user
community's needs is gained. One common argument is "We will lose
$x million in revenue." Question how that figure was
generated. Is it reasonable to assume that users will not
return if they are greeted by a maintenance Web page in
the middle of the night?
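One way to question such a figure is to separate revenue that is merely delayed from revenue that is permanently lost. The sketch below is a deliberately simple model, not an established formula: the function name, the inputs, and the idea of a single "abandon rate" (the share of users who never return) are all assumptions chosen for illustration.

```python
def estimated_outage_loss(hourly_revenue: float, outage_hours: float,
                          abandon_rate: float) -> float:
    """Estimate permanently lost revenue for a planned outage.

    Only the share of users who never come back (abandon_rate) is counted;
    users who return after the maintenance window are treated as delayed revenue.
    """
    if not 0.0 <= abandon_rate <= 1.0:
        raise ValueError("abandon_rate must be between 0 and 1")
    return hourly_revenue * outage_hours * abandon_rate


if __name__ == "__main__":
    # Hypothetical numbers: overnight revenue of 4,000 per hour, 3-hour window.
    worst_case = estimated_outage_loss(4000, 3, 1.0)   # nobody returns
    tempered = estimated_outage_loss(4000, 3, 0.25)    # a quarter never return
    print(worst_case, tempered)
```

A quoted loss figure that matches the worst case assumes every user is lost for good, and it often uses peak-hour revenue rather than revenue during the proposed window. Both inputs are worth verifying against the logs.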
Loss of revenue starts to build a business case for a redundant
or resilient solution. If the business cannot bear two to
three hours of downtime once a year, determine the best
solution to meet those requirements. If only a few applications
must be highly available, it may be possible to replicate
them to a hot site instead of keeping the entire computer
room running 24/7/365. Hot sites need not be an economic
burden if a remote company-owned computer room is used.
Hot sites also offer obvious disaster recovery benefits.
Advise senior management of your findings so that they can
make an informed decision.
Once appropriate redundancy/resiliency is in place, there should
be no issues with an annual computer room shutdown. If for
some reason the computer room still cannot be powered down,
this exercise will have created a better understanding of
the user community's needs and made it easier to take down
individual systems for maintenance.
Copyright © 2005 CyberGuard Corporation All Rights Reserved.
Reprinted with Permission