Three Mile Island
The Three Mile Island accident was a partial meltdown of a nuclear reactor in which over 30,000 gallons of radioactive water escaped. The 1979 incident could have had Chernobyl-like consequences: a complete meltdown of the reactor was narrowly averted.
The litany of events that led to the reactor overheating included:
- Human error during a repair, which let a small amount of water seep into the wrong system;
- That water triggered a safety system that shut down the main pumps;
- The backup pumps didn’t come on because maintenance workers had forgotten to re-open their valves after servicing them;
- An important warning light that would have clued the plant operators in on the problem wasn’t seen because a repair tag was hanging in front of it;
- An engineer sent to investigate looked at the wrong gauges and didn’t uncover the problem.
The Three Mile Island accident resulted from a combination of human error, improper system design, failure of backup safety systems and bad luck. Having all these factors come together at the same time seems highly unlikely, but Normal Accident Theory teaches that in a complex and tightly coupled system like a nuclear power plant, an accident was inevitable.
The Normal Accident Theory
According to the Normal Accident Theory, accidents are inevitable in systems that have two particular characteristics:
- COMPLEXITY: Systems with many components that interact with each other are complex. Nuclear power plants are complex. The financial system is complex. The mail delivery system is complex. Airline logistics is complex.
- TIGHTLY COUPLED: “Tight coupling means that components of a process are critically interdependent; they are linked with little room for error or time for recalibration or adjustment.” Source. A rocket launch is tightly coupled – each step of the process must occur in a particular order at the correct time.
With a complex system, when something goes wrong it often takes time to figure out what went wrong and take action. When a system is tightly coupled, everything moves fast and there is no time to figure out problems. Together, the two factors create serious accidents: problems pile up and cause even more problems, and there is no time to work out what to do.
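This pile-up dynamic can be sketched as a toy Monte Carlo simulation. Everything here – the component count, failure rate, accident threshold, and the propagation probability standing in for coupling – is an illustrative assumption, not a calibrated model of any real system:

```python
import random

def cascade_size(rng, p_propagate):
    """Length of the failure chain started by one faulty component.

    With probability p_propagate the fault spreads to one more component
    before anyone can intervene (tight coupling); otherwise it is contained.
    """
    size = 1
    while rng.random() < p_propagate:
        size += 1
    return size

def accident_rate(n_components=50, p_fail=0.02, p_propagate=0.5,
                  threshold=5, trials=20_000, seed=1):
    """Fraction of runs in which cascading failures reach `threshold` components."""
    rng = random.Random(seed)
    accidents = 0
    for _ in range(trials):
        total = sum(cascade_size(rng, p_propagate)
                    for _ in range(n_components) if rng.random() < p_fail)
        if total >= threshold:
            accidents += 1
    return accidents / trials

# Identical components and identical fault rates; only the coupling differs.
loose = accident_rate(p_propagate=0.1)   # slack: faults are usually contained
tight = accident_rate(p_propagate=0.9)   # tight: faults usually spread
print(f"loose coupling: {loose:.3f}, tight coupling: {tight:.3f}")
```

Running the sketch, the accident rate jumps sharply as the coupling parameter rises, even though no individual component has become any less reliable – the coupling, not the components, drives the accidents.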
Some systems are complex but not tightly coupled. For example, a university is complex – thousands of students, hundreds of professors, many majors, many classes. Getting through four years of college with a degree requires navigating a lot of complexity. However, a university is not tightly coupled. If the economics class you need to take is full, you can take it another semester. Or take it online. Or get a waiver to take a different class to fulfill the requirement. Thus, a university has a lot of slack in its system, which keeps problems from turning into major accidents. Students graduate. Another complex system that is not tightly coupled is the U.S. mail delivery system. If one letter gets mis-routed, it has no effect on the rest of the system.
Some systems are tightly coupled but not complex. Preparing food is usually tightly coupled: in following a recipe, each step must happen in order, right after the one before it. However, it’s not a complex system. If you mess up the steps in making your sauce, it doesn’t mean the other courses are also ruined.
It is the interplay of complexity and tight coupling that leads to normal accidents. Beware of tightly coupled, complex systems – and the financial system is among the most complex and tightly coupled systems there is.
The Key Point About Normal Accidents
In order to reduce the risk of a normal accident, a complex tightly coupled system should either (a) be made less complex or (b) slack should be introduced into the system to reduce the tight coupling.
Many organizations try to avoid accidents by putting additional safety measures in place. Unfortunately, additional safety measures add to the complexity of the system! Thus, attempts to avoid accidents often make the chances or severity of an accident worse!
“The problem with adding safety features to systems that are complex and tightly coupled – the very systems that by nature have the greatest risk of catastrophic failure – is that they can actually increase the likelihood of these failures because they contribute to the source of the problem: interactive complexity. By adding that many more wires, switches, meters, and items for human oversight, safety systems make the operation more opaque. The wires, switches, and sensors also interact with the system; for every step in improving safety or monitoring potential failures, they introduce their own sources of failure. If the safety mechanisms are automatic – as they almost have to be in a tightly coupled system – they serve as one more variable that can add to an unexpected result and a nonlinearity in effect.” Source: Richard Bookstaber, A Demon of Our Own Design: Markets, Hedge Funds, and the Perils of Financial Innovation.
Thus, in your organization, consider how you try to prevent accidents – will your safety steps create more complexity?
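Bookstaber’s point shows up even in a back-of-the-envelope series-reliability calculation. The component counts and the 0.999 per-component reliability below are made-up illustrative numbers, and the sketch deliberately ignores the failures that safety gear prevents – it isolates only the failure modes the added gear brings with it:

```python
def system_reliability(n_components, r=0.999):
    """Probability a series system works: every component must work.

    r = 0.999 per-component reliability is an illustrative assumption,
    not a measured figure for any real plant.
    """
    return r ** n_components

base = system_reliability(100)          # hypothetical plant: 100 critical parts
with_safety = system_reliability(130)   # plus 30 fallible safety interlocks/sensors
print(f"base: {base:.3f}, with extra safety gear: {with_safety:.3f}")
```

Real safety systems do, of course, also prevent failures. The point is that in an opaque, tightly coupled system, every added wire, switch, and sensor is a new way to fail and a new interaction to misread – so the net effect of “more safety” can be negative.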
There are many examples of complex and tightly coupled systems where accidents have occurred:
- The blow-up of the hedge fund Long-Term Capital Management nearly brought down the entire financial system due to the complexity and tight coupling of the markets in which it was operating.
- The financial crisis of 2008-09 was a perfect example of a complex and tightly coupled system. Bear Stearns, AIG, Freddie and Fannie, Lehman Brothers, all the banks, etc. were interdependent and connected. It was like dominoes falling.
- The Deepwater Horizon oil spill had the hallmarks of a normal accident. The operation of the oil rig was highly complex and tightly coupled. Safety systems created additional complexity and then failed.
- When another oil rig – the Piper Alpha – had an accident, its destruction nearly caused a financial crisis due to the complex and tightly coupled insurance and re-insurance relationships within Lloyd’s of London.
Accidents will happen. In creating or overseeing a system, do not assume that errors can be eliminated. They will happen! Be wary of creating additional layers of rules, procedures and safety systems because they can add more complexity and can make matters worse.
To reduce the severity of accidents, simplifying the system and adding slack to it is usually the way to go.