There is nothing like learning from mistakes and experience. Strategy and process design are best grounded in the reality of experience. I would like to share a real story of an incident that I often recall when looking at plans, to remind myself that there is always more than the obvious to learn when things go wrong. On the surface this story has one key learning point, but digging deeper revealed three.
A major trading organisation's data centre lost one of the two chillers that cooled it. Most people, even in IT, are not aware that air conditioning is as fundamental as power in a data centre. Put simply, 10,000 servers produce a lot of heat. In this case the temperature in the cabinets furthest from the working units reached 120 degrees Celsius, despite emergency fans being added to circulate the air and direct the hot air to the cooler end. 120 degrees: oven temperature! Because the incident constituted an emergency, load had to be cut to keep temperatures down: all non-essential systems were shut down, all live BCP running was removed, and as many MIS and other non-core transaction processing systems as possible were switched off.