
Conference kernel::csguk_systems

Title:CSGUK_SYSTEMS
Notice:No restrictions on keyword creation
Moderator:KERNEL::ADAMS
Created:Wed Mar 01 1989
Last Modified:Thu Nov 28 1996
Last Successful Update:Fri Jun 06 1997
Number of topics:242
Total number of notes:1855

146.0. "PRODUCTION CLUSTER " by KERNEL::ADAMS (An RD54 beats a platinum disc !!) Fri Nov 08 1991 16:54

From:	NAME: Jeff Yates                    
	FUNC: Customer Services               
	TEL:  833                             <YATESJ AT A1_KERNEL @THESUN @UVO>
Date:	07-Nov-1991
Posted-date: 07-Nov-1991
Precedence: 1
Subject: Update on the power problems


                        Update on the power situation



A meeting was held on Wednesday 6th November to review the power problems 
experienced on Friday 1st November 1991.


Attendees
Jeff Yates
Simon Lobar
Kevin Gant
Mike Coggins
Mary Challenor
John Frawley
Erika Smith
Colin Tubb
Ray Stevens

The purpose of the meeting was to review the events that took place, 
understand the outstanding issues, plan how to increase our resilience in 
future, and agree a more suitable approach to managing any similar event 
should one occur.


Current status
The UPS is currently out of circuit.  No faults have been found with it.  
Chloride want to increase the load on the UPS progressively, and we are 
planning how best to arrange this.  A physical integrity check of the mains 
supply is also felt advisable; again, we are planning how to do this with 
minimum disruption.

We can gain access to the Northern NICE providing we have a live ring main, 
but the queue structures are sufficiently different to make this of limited 
use.  We are working to find out how to improve the usability of the two 
systems in the event of either failing.

The phone system held up on battery power during the outage.  Battery backup 
is rated at 1.5 hours.  We are considering whether this needs supplementing 
with more batteries.

Numerous people seemed to have access to the plant room and the production 
computer room, making control of the repair effort difficult.  The access list 
will be reviewed, and any apparent anomalies will be discussed with the 
individuals concerned to establish whether they have genuine access needs.  
Any unnecessary access rights will be disabled.

A phone bell will be installed in the computer room.


Crisis Management
A crisis was defined as the production system (Kernel cluster)
being unavailable.

In this event, where the downtime is known, the IS group
should issue a tannoy message to the building via PM&S, giving
the reason for the outage and the expected duration.

Where the downtime is not known, a Problem Manager should be
appointed by :- 

-    Either of the Service Centre Managers; or
-    Either of the Operational Managers; or
-    The Senior Manager present in either of the two
     Service Centre Management teams

The responsibilities of the Problem Manager are :-

-    To announce his or her presence to the building staff
-    To co-ordinate the repair activities
-    To provide regular updates to the building staff
-    To ensure that non-required people are kept away from the
     repair activities
-    To take any decisions that impact the Service Centre businesses.



Regards

Jeff Yates

