UCS Blog - All Things Nuclear (Nuclear Power Safety)

Nuclear Plant Risk Studies: Then and Now

Nuclear plant risk studies (also called probabilistic risk assessments) examine postulated events like earthquakes, pipe ruptures, power losses, and fires, along with the array of safety components installed to prevent reactor core damage. Results from nuclear plant risk studies are used to prioritize inspection and testing resources: components with greater risk significance get more attention.

Nuclear plant risk studies are veritable forests of event trees and fault trees. Figure 1 illustrates a simple event tree. The initiating event (A) in this case could be something that reduces the amount of reactor cooling water like the rupture of a pipe connected to the reactor vessel. The reactor protection system (B) is designed to detect this situation and immediately shut down the reactor.

Fig. 1. (Source: Nuclear Regulatory Commission)

The event tree branches upward based on the odds of the reactor protection system successfully performing this action and downward for its failure to do so. Two emergency coolant pumps (C and D) can each provide makeup cooling water to the reactor vessel to replenish the lost inventory. Again, the event tree branches upward for the chances of the pumps successfully fulfilling this function and downward for failure.

Finally, post-accident heat removal examines the chances that reactor core cooling can be sustained following the initial response. The column on the right describes the various paths that could be taken for the initiating event. It is assumed that the initiating event happens, so each path starts with A. Paths AE, ACE, and ACD result in reactor core damage. The letters added to the initiating event letter define what additional failure(s) led to reactor core damage. Path AB leads to another event tree – the Anticipated Transient Without Scram (ATWS) event tree because the reactor protection system failed to cause the immediate shut down of the reactor and additional mitigating systems are involved.

The overall risk is determined by the sum of the odds of pathways leading to core damage. The overall risk is typically expressed something like 3.8×10⁻⁵ per reactor-year (3.8E-05 per reactor-year in scientific notation). I tend to take the reciprocal of these risk values. The 3.8E-05 per reactor-year risk, for example, becomes one reactor accident every 26,316 years—the bigger the number, the lower the risk.
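To make the arithmetic concrete, here is a minimal sketch in Python. The three pathway frequencies are hypothetical values chosen only so that they sum to the 3.8E-05 per reactor-year figure used above; they do not come from any actual plant study.

```python
# Minimal sketch (hypothetical pathway frequencies chosen to sum to 3.8E-05
# per reactor-year): combining core damage pathway frequencies into an overall
# risk figure and taking its reciprocal.

path_frequencies = {
    "AE": 2.0e-5,
    "ACE": 1.5e-5,
    "ACD": 0.3e-5,
}

# Overall core damage frequency is the sum of the pathway frequencies
core_damage_frequency = sum(path_frequencies.values())  # 3.8e-5 per reactor-year

# Reciprocal: roughly one core damage event every N reactor-years
years_between_events = 1.0 / core_damage_frequency

print(f"Core damage frequency: {core_damage_frequency:.1e} per reactor-year")
print(f"Roughly one event every {years_between_events:,.0f} reactor-years")
```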

Fault trees examine reasons for components like the emergency coolant pumps failing to function. The reasons might include a faulty control switch, inadequate power supply, failure of a valve in the pump’s suction pipe to open, and so on. The fault trees establish the chances of safety components successfully fulfilling their needed functions. Fault trees enable event trees to determine the likelihoods of paths moving upward for success or downward for failure.
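A companion sketch, again with hypothetical numbers, shows how a fault tree rolls basic-event failure chances up into a single pump failure probability that sets the split between the upward (success) and downward (failure) branches of an event tree. The basic events are assumed to be independent.

```python
# Minimal sketch (hypothetical, assumed-independent basic events): a fault tree
# rolls basic-event failure chances up into one component failure probability,
# which sets the up/down split at the corresponding event tree branch.

basic_events = {
    "faulty control switch": 1.0e-3,
    "inadequate power supply": 5.0e-4,
    "suction valve fails to open": 2.0e-3,
}

# OR gate: the pump fails if any basic event occurs
p_all_basic_events_avoided = 1.0
for p_fail in basic_events.values():
    p_all_basic_events_avoided *= (1.0 - p_fail)
pump_failure_probability = 1.0 - p_all_basic_events_avoided

print(f"Pump failure probability (downward branch): {pump_failure_probability:.2e}")
print(f"Pump success probability (upward branch): {1.0 - pump_failure_probability:.4f}")
```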

Nuclear plant risk studies have been around a long time. For example, the Atomic Energy Commission (forerunner to today’s Nuclear Regulatory Commission and Department of Energy) completed WASH-740 in March 1957 (Fig. 2). I get a kick out of the “Theoretically Possible but Highly Improbable” phrase in its subtitle. Despite major accidents being labeled “Highly Improbable,” the AEC did not release this report publicly until after it was leaked to UCS in 1973, which then made it available. One of the first acts by the newly created Nuclear Regulatory Commission (NRC) in January 1975 was to publicly issue an update to WASH-740. WASH-1400, also called NUREG-75/014 and the Rasmussen Report, was benignly titled “Reactor Safety Study: An Assessment of Accident Risks in U.S. Commercial Nuclear Power Plants.”

Fig. 2. (Source: Atomic Energy Commission)

Nuclear plant risk studies can also be used to evaluate the significance of actual events and conditions. For example, if emergency coolant pump A were discovered to have been broken for six months, analysts can set the chances of this pump successfully fulfilling its safety function to zero and calculate how much the broken component increased the risk of reactor core damage. The risk studies would determine the chances of initiating events occurring during the six months emergency coolant pump A was disabled and the chances that backups or alternates to emergency coolant pump A stepped in to perform that safety function. The NRC uses nuclear plant risk studies to determine when to send a special inspection team to a site following an event or discovery and to characterize the severity level (i.e., green, white, yellow, or red) of violations identified by its inspectors.
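The sketch below illustrates that kind of event assessment with a deliberately simplified model: core damage is assumed to require both makeup pumps to fail after an initiating event, and every number is hypothetical. Setting the broken pump's failure probability to 1.0 and weighting by the six-month exposure period gives the incremental risk attributable to the degraded condition.

```python
# Minimal sketch (hypothetical numbers, deliberately simplified model) of an
# event assessment: core damage is assumed to require both makeup pumps to fail
# after an initiating event; the broken pump's failure probability is set to 1.0.

def core_damage_frequency(initiator_freq, p_pump_c_fails, p_pump_d_fails):
    """Core damage frequency when damage requires both makeup pumps to fail."""
    return initiator_freq * p_pump_c_fails * p_pump_d_fails

initiator_freq = 1.0e-2    # initiating events per reactor-year (hypothetical)
p_c, p_d = 3.0e-3, 3.0e-3  # nominal pump failure probabilities (hypothetical)

baseline = core_damage_frequency(initiator_freq, p_c, p_d)
degraded = core_damage_frequency(initiator_freq, 1.0, p_d)  # pump C assumed broken

exposure_years = 0.5  # pump C was broken for six months
incremental_risk = (degraded - baseline) * exposure_years

print(f"Baseline core damage frequency: {baseline:.1e} per reactor-year")
print(f"Degraded core damage frequency: {degraded:.1e} per reactor-year")
print(f"Incremental core damage probability over six months: {incremental_risk:.1e}")
```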

Nuclear Plant Risk Studies: Then

In June 1982, the NRC released NUREG/CR-2497, “Precursors to Potential Severe Core Damage Accidents: 1969-1979, A Status Report,” that reported on the core damage risk from 52 significant events during that 11-year period. The events included the March 1979 meltdown of Three Mile Island Unit 2 (TMI-2), which had a core damage risk of 100%. The effort screened 19,400 licensee event reports submitted to the AEC/NRC over that period, culled out 529 events for detailed review, identified 169 accident precursors, and found 52 of them to be significant from a risk perspective. The TMI-2 event topped the list, with the March 1975 fire at Browns Ferry placing second.

The nuclear industry independently evaluated the 52 significant events reported in NUREG/CR-2497. The industry’s analyses also found the TMI-2 meltdown to have a 100% chance of core damage, but disagreed with all the other NRC risk calculations. Of the top ten significant events, the industry’s calculated risk averaged only 11.8% of the risk calculated by the NRC. In fact, if the TMI-2 meltdown is excluded, the “closest” match was for the 1974 loss of offsite power event at Haddam Neck (CT). The industry’s calculated risk for this event was less than 7% of the NRC’s calculated risk. It goes without saying (but not without typing) that the industry never, ever calculated a risk to be greater than the NRC’s calculation. The industry calculated the risk from the Browns Ferry fire to be less than 1 percent of the risk determined by the NRC—in other words, the NRC’s risk was “only” about 100 times higher than the industry’s risk for this event.

Fig. 3. Based on figures from June 1982 NRC report. (Source: Union of Concerned Scientists)

Bridging the Risk Gap?

The risk gap from that era can be readily attributed to the immaturity of the risk models and the paucity of data. In the decades since these early risk studies, the risk models have become more sophisticated and the volume of operating experience has grown exponentially.

For example, the NRC issued Generic Letter 88-20, “Individual Plant Examination for Severe Accident Vulnerabilities.” In response, owners developed plant-specific risk studies. The NRC issued documents like NUREG/CR-2815, “Probabilistic Safety Analysis Procedures Guide,” to convey its expectations for risk models. And the NRC issued a suite of guidance documents like Regulatory Guide 1.174, “An Approach for Using Probabilistic Risk Assessment in Risk-Informed Decisions on Plant-Specific Changes to the Licensing Basis.” This is but a tiny sampling of the many documents issued by the NRC about how to conduct nuclear plant risk studies—guidance that simply was not available when the early risk studies were performed.

Complementing the maturation of nuclear plant risk studies is the massive expansion of available data on component performance and human reliability. Event trees begin with initiating events—the NRC has extensively sliced and diced initiating event frequencies. Fault trees focus on performance at the component and system level, so the NRC has collected and published extensive operating experience on component performance and system reliability. And the NRC compiled data on reactor operating times to be able to develop failure rates from the component and system data.

Given the sophistication of current risk models compared to the first generation risk studies and the fuller libraries of operating reactor information, you would probably think that the gap between risks calculated by industry and NRC has narrowed significantly.

Except for being absolutely wrong, you would be entirely right.

Nuclear Plant Risk Studies: Now

Since 2000, the NRC has used nuclear plant risk studies to establish the significance of violations of regulatory requirements, with the results determining whether a green, white, yellow, or red finding gets issued. UCS examined ten of the yellow and red findings determined by the NRC since 2000. The “closest” match between NRC and industry risk assessment was for the 2005 violation at Palo Verde (AZ) where workers routinely emptied water from the suction pipes for emergency core cooling pumps. The industry’s calculated risk for that event was 50% (half) of the NRC’s calculated risk, meaning that the NRC viewed this risk as double that of the industry’s view. And that was the closest that the risk viewpoints came. Of these ten significant violations, the industry’s calculated risk averaged only 12.7% of the risk calculated by the NRC. In other words, the risk gap narrowed only a smidgen over the decades.

Fig. 4. Ratios for events after 2000. (Source: Union of Concerned Scientists)

Risk-Deformed Regulation?

For decades, the NRC has consistently calculated nuclear plant risks to be about 10 times greater than the risks calculated by industry. Nuclear plant risk studies are analytical tools whose results inform safety decision-making. Speedometers, thermometers, and scales are also analytical tools whose results inform safety decision-making. But a speedometer reading one-tenth of the speed recorded by a traffic cop’s radar gun, or a thermometer showing a child to have a temperature one-tenth of her actual temperature, or a scale measuring one-tenth of the actual amount of chemical to be mixed into a prescription pill are unreliable tools that could not continue to be used to make responsible safety decisions.

Yet the NRC and the nuclear industry continue to use risk studies that clearly have significantly different scales.

On May 6, 1975, NRC Technical Advisor Stephen H. Hanauer wrote a memo to Guy A. Arlotto, the NRC’s Assistant Director for Safety and Materials Protection Standards. The second paragraph of this two-paragraph memo expressed Dr. Hanauer’s candid view of nuclear plant risk studies: “You can make probabilistic numbers prove anything, by which I mean that probabilistic numbers ‘prove’ nothing.”

Oddly enough, the chronic risk gap has proven the late Dr. Hanauer totally correct in his assessment of the value of nuclear plant risk studies. When risk models permit users to derive results that don’t reside in the same zip code, let alone the same ballpark, the results prove nothing.

The NRC must close the risk gap, or jettison the process that proves nothing about risks.

Tennessee Valley Authority’s Nuclear Safety Culture Déjà vu

The Nuclear Regulatory Commission (NRC) issued a Confirmatory Order to the Tennessee Valley Authority (TVA) on July 27, 2017. An NRC team inspecting the Watts Bar Nuclear Plant in fall 2016 determined that TVA failed to comply with elements of another Confirmatory Order that NRC had issued to TVA on December 22, 2009. Specifically, the 2009 Confirmatory Order required TVA to implement measures at all its nuclear plant sites (i.e., Watts Bar and Sequoyah in Tennessee and Browns Ferry in Alabama) to ensure that adverse employment actions against workers conformed to the NRC’s employee protection regulations and to evaluate whether those actions could negatively impact the safety conscious work environment. The NRC inspection team determined that TVA was not implementing several of the ordered measures at Watts Bar.

To be fair to TVA, the agency did indeed develop the procedures to ensure adverse employee actions did not violate NRC’s employee protection regulations.

To be fair to NRC, its inspectors found that TVA senior management simply did not use those procedures when taking adverse employee action against several TVA employees and contractors.

To say that TVA has a nuclear safety culture problem is like saying the sun is hot.

After determining that TVA failed to implement measures mandated in its December 2009 Confirmatory Order, the NRC issued another Confirmatory Order to TVA in July 2017.

How many Confirmatory Orders will it take to get TVA to establish and sustain proper nuclear safety cultures at its nuclear power plants?

I don’t know. But at least we are now one Confirmatory Order closer to that magic number. Perhaps before too many more years roll by, workers at Watts Bar, Sequoyah, and Browns Ferry will actually be protected the way they are supposed to be by NRC’s regulations.

Broken Valve in Emergency System at LaSalle Nuclear Plant

An NRC Special Inspection Team (SIT) conducted an inspection at the LaSalle Nuclear Plant this spring to investigate the cause of a valve’s failure and assess the effectiveness of the corrective actions taken.

The two units at Exelon Generation Company’s LaSalle County nuclear plant about 11 miles southeast of Ottawa, Illinois are boiling water reactors (BWRs) that began operating in the early 1980s. While most of the BWRs operating in the U.S. are BWR/4’s with Mark I containment designs, the “newer” LaSalle Units feature BWR/5’s with Mark II containment designs. The key distinction for this commentary is that while BWR/4’s employ steam-driven high pressure coolant injection (HPCI) systems to provide makeup cooling water to the reactor core in the event that a small pipe connected to the reactor vessel breaks, the BWR/5’s use a motor-driven high pressure core spray (HPCS) system for this safety role.

The Event

Workers attempted to refill the Unit 2 high pressure core spray (HPCS) system with water on February 11, 2017, following maintenance and testing of the system. The Unit 2 reactor was shut down for a refueling outage at the time and this downtime was used to inspect emergency systems, like the HPCS system.

The HPCS system is normally in standby mode during reactor operation. The system features one motor-driven pump that supplies a design makeup flow rate of 7,000 gallons per minute to the reactor vessel. The HPCS pump draws water from the suppression pool inside containment. In the event that a small-diameter pipe connected to the reactor vessel breaks, cooling water would leak out but the pressure inside the reactor vessel would remain too high for the array of low-pressure emergency systems (i.e., the residual heat removal and low pressure core spray pumps) to function. Water pouring from the broken pipe ends drains to the suppression pool for re-use. The motor-driven HPCS pump can be powered from the offsite electrical grid when it is available or from an onsite emergency diesel generator when the grid is unavailable.

Fig. 1(Source: Nuclear Regulatory Commission)

Workers were unable to fill the piping between the HPCS injection valve (2E22-F004) and the reactor vessel. They discovered that the disc had separated from the stem of this double disc gate valve manufactured by Anchor Darling and blocked the flow path for filling the piping. The HPCS injection valve is a normally closed motor-operated valve that opens when the HPCS system is actuated to provide a pathway for makeup water to reach the reactor vessel. The motor applies torque that rotates a screw-like stem to raise (open) or lower (close) the disc in the valve. When fully lowered, the disc blocks flow through the valve. When the disc is fully raised, flow through the valve is unobstructed. Because the disc became separated from the stem in the fully lowered position, the motor might rotate the stem as if to raise the disc, but the disc would not budge.

Fig. 2 (click to enlarge) (Source: Nuclear Regulatory Commission)

Workers took a picture of the separated double disc after the valve’s bonnet (casing) was removed (Fig. 3). The bottom edge of the stem appears at the top center of the picture. The two discs and the guides they travel along (when connected to the stem) can be seen.

Fig. 3 (Source: Nuclear Regulatory Commission)

Workers replaced the internals of the HPCS injection valve with parts redesigned by the vendor and restarted Unit 2.

Background

The Tennessee Valley Authority submitted a report under 10 CFR Part 21 to the NRC in January 2013 about a defect in an Anchor Darling double disc gate valve in the high pressure coolant injection system at their Browns Ferry nuclear plant. The following month, the valve’s vendor submitted a 10 CFR Part 21 report to the NRC about a design issue with Anchor Darling double disc gate valves that could result in the stem separating from the discs.

In April 2013, the Boiling Water Reactor Owners’ Group issued a report to its members about the Part 21 reports and recommended methods for monitoring the affected valves for operability. The recommendations included diagnostic testing and monitoring the rotation of the stems. Workers performed the recommended diagnostic testing of HPCS injection valve 2E22-F004 at LaSalle during 2015 without identifying any performance issues. Workers performed maintenance and testing of HPCS injection valve 2E22-F004 on February 8, 2017, using the stem rotation monitoring guidance.

In April 2016, the Boiling Water Reactor Owners’ Group revised their report based on information received from one plant owner. Workers had disassembled 26 potentially susceptible Anchor Darling double disc gate valves and found problems with 24 of them.

In April 2017, Exelon notified the NRC about the failure of HPCS injection valve 2E22-F004 due to separation of the stem from the discs. Within two weeks, a Special Inspection Team (SIT) chartered by the NRC arrived at LaSalle to investigate the cause of the valve’s failure and assess the effectiveness of the corrective actions taken.

SIT Findings and Observations

The SIT reviewed Exelon’s evaluation of the failure mode for the Unit 2 HPCS injection valve. The SIT agreed that a part within the valve had broken due to excessive force. The broken part allowed the stem-to-disc connection to become steadily more misaligned until eventually the discs separated from the stem. The vendor redesigned the valve’s internals to correct the problem.

Exelon notified the NRC on June 2, 2017, of its plan to correct, during the next refueling outages of the two LaSalle units, 16 other safety-related and important-to-safety Anchor Darling double disc gate valves that may be susceptible to this failure mechanism.

The SIT reviewed Exelon’s justifications for waiting to fix these 16 valves. The SIT found the justifications to be reasonable with one exception—the HPCS injection valve on Unit 1. Exelon had estimated the number of times that the Unit 1 and the Unit 2 HPCS injection valves had been cycled. The Unit 2 valve was original equipment installed in the early 1980s while the Unit 1 valve had been replaced in 1987 following damage due to another cause. Exelon contended that the greater number of strokes by the Unit 2 valve explained its failure and justified waiting until the next refueling outage to address the Unit 1 valve.

Citing factors like unknown pre-operational testing differences between the units, slight design differences of unknown consequence, uncertain material strength properties, and uncertain differences in stem-to-wedge thread wear, the SIT concluded “that it was a matter of ‘when’ and not ‘if’ the 1E22-F004 valve would fail in the future if it had not already failed.” In other words, the SIT did not buy the delayed look at the Unit 1 valve.

Exelon shut down LaSalle Unit 1 on June 22, 2017, to replace the internals of HPCS injection valve 1E22-F004.

NRC Sanctions

The SIT identified a violation of Criterion III, Design Control, of Appendix B to 10 CFR Part 50 associated with the torque values developed by Exelon for the motors of HPCS injection valves 1E22-F004 and 2E22-F004. Exelon assumed the valves’ stems to be the weak link and established motor torque values that would not over-stress the stems. But the weak link turned out to be another internal part. The motor torque values applied by Exelon over-stressed this part, causing it to break and the discs to separate from the stem.

The NRC determined the violation to be a Severity Level III Violation (out of a four-level system with Level I being most serious) based on the failure of the valves preventing the HPCS system from performing its safety function.

But the NRC exercised enforcement discretion per its Enforcement Policy and did not issue the violation. The NRC determined that the valve design defect was too subtle for Exelon to have reasonably foreseen and corrected before the Unit 2 valve’s failure.

UCS Perspective

Exelon looked pretty good in this event. The NRC’s SIT documented that Exelon was aware of the Part 21 reports made by the Tennessee Valley Authority and the valve’s vendor in 2013. That they were unable to use this awareness to identify and correct the problems with the Unit 2 HPCS injection valve is really not a poor reflection on their performance. After all, they performed the measures recommended by the Boiling Water Reactor Owners’ Group for the two Part 21 reports. The shortcoming was in that guidance, not in Exelon’s application of it.

The only blemish on Exelon’s handling of the matter was its weak justification for operating Unit 1 until its next scheduled refueling outage before checking whether its HPCS injection valve was damaged or broken. But the NRC’s SIT helped Exelon decide to hasten that plan with the result that Unit 1 was shut down in June 2017 to replace the susceptible Unit 1 valve.

The NRC looked really good in this event. Not only did the NRC steer Exelon to a safer place regarding LaSalle Unit 1, but the NRC also prodded the entire industry to get this matter resolved without undue delay. The NRC issued Information Notice 2017-03 to plant owners on June 15, 2017, about the Anchor Darling double disc gate valve design defects and the limitations in the guidance for monitoring valve performance. The NRC conducted a series of public meetings with industry and valve vendor representatives regarding the problem and its solution. Among the outcomes from these interactions is a resolution plan by the industry enumerating a number of steps with target deadlines no later than December 31, 2017, and a survey of where Anchor Darling double disc gate valves are used in U.S. nuclear power plants. The survey revealed about 700 Anchor Darling double disc gate valves (AD DDGVs) used in U.S. nuclear power plants, but only 9 valves characterized as High/Medium risk, multi-stroke valves. (Many valves are single stroke in that their safety function is to close, if open, or open, if closed. Multi-stroke valves may be called upon to open and close, perhaps several times, in fulfilling their safety function.)

Fig. 4 (Source: Nuclear Energy Institute)

There’s still time for the industry to snatch defeat from the jaws of victory, but the NRC seems poised to see this matter to a timely and effective outcome.

Florida’s Nuclear Plants and Hurricane Irma

Will Florida’s two nuclear plants, Turkey Point and St. Lucie, be able to withstand Hurricane Irma?

Florida governor Rick Scott, the utility Florida Power & Light (FP&L), and the US Nuclear Regulatory Commission (NRC) have all provided assurances that they will. But we are about to witness a giant experiment in the effectiveness of the NRC’s strategy for protecting nuclear plants from natural disasters.

A review of the plans that the two plants have developed to protect against extreme natural disasters leaves plenty of room for concern. These plans were developed in response to new requirements that the NRC imposed in the years following the March 2011 Fukushima nuclear plant disaster in Japan. A prolonged loss of all electrical power—caused by an earthquake and subsequent tsunami that flooded the Fukushima site—resulted in three nuclear reactor meltdowns and a large release of radioactivity to the environment. (Even when reactors are shut down, they normally rely on electrical power to provide cooling water to the fuel in the cores and the spent fuel in storage pools, which remain hot.)

Fukushima made it clear that nuclear plants around the world were not sufficiently protected against natural disasters. Subsequently, the NRC imposed new requirements on US nuclear plants to develop strategies to cope with prolonged electric blackouts.

However, these new requirements were heavily influenced by pressure from a cost-conscious nuclear industry. As a result, they were limited in scope.

Moreover, these requirements are based on numerous assumptions that may not prove valid in the face of massive and powerful storms. In effect, the NRC is betting that no nuclear plant will experience conditions that don’t conform to these assumptions. Soon, the nation will find out whether the NRC wins or loses the next round with Mother Nature: Hurricane Irma.

The Plan for Turkey Point

Turkey Point Nuclear Plant (Source: NARA)

FP&L’s plan for Turkey Point, 25 miles south of Miami, contains many questionable assumptions.

To give just one example, its strategy to keep the two reactors cool if there is a total loss of electrical power (both offsite and on-site back-up power) includes initially drawing water from two water supply tanks (so-called condensate storage tanks), running the water through the reactors’ steam generators, and dumping the steam that is produced by the heat of the nuclear fuel in the reactor cores into the atmosphere (when the plant is operating, the steam is used to generate electricity).

But here’s the rub: These tanks were not designed to withstand objects thrown about by the high winds occurring during tornadoes or hurricanes.

Nevertheless, FP&L assumed—and the NRC accepted—that at least one of the two tanks on site would withstand any hurricane. They argued that this was a reasonable assumption because the two tanks are separated by a few hundred feet and there are structures between them. There seems to be a degree of wishful thinking at work here. If both tanks were damaged, the challenges in keeping the cores cool would be far greater.

Also, to deal with prolonged station blackouts—when both offsite and onsite back-up power is lost—the Turkey Point plan assumes that offsite assistance would be available after five days. The nuclear industry has set up two “National SAFER Response Centers,” one in Memphis, Tennessee and the other in Phoenix, Arizona. Each one contains additional emergency equipment and supplies to supplement those that each reactor owner is required to have on site. The NRC requires that every plant in the country have an agreement with one of the SAFER centers to provide equipment and assistance should it be needed.

But the functioning of this system depends on the ability of the SAFER centers to deliver the equipment in a timely manner, which might not be possible if there were a widespread and prolonged natural disaster.

Turkey Point’s plan requires that deliveries from the Memphis SAFER center be shipped to Miami International Airport and then hauled (if the roads are clear) to the site or to the Homestead Air Reserve Base and taken to the site via helicopter. But it doesn’t take too great a stretch of the imagination, given the potential impact of a massive storm like Irma, to see where this plan could go badly wrong. And looking at the current track of the storm, the Memphis SAFER center itself could well be in its path, causing problems at the shipping end as well as the receiving end.

Even if the Turkey Point plan were effective, it is not clear how much of it has been put into place on the ground yet. At the end of June, the plant reported to the NRC that it needed to make ten modifications to address the risk of storm surges that could exceed the flood level that the plant was originally designed to withstand.

But it isn’t clear how many of those modifications have been completed yet. And the NRC’s first inspection of the post-Fukushima measures at Turkey Point is not even scheduled until March 2018. So at this time all the public has to rely on is an assumption that FP&L has implemented the plan completely and correctly.

With one assumption piled upon another, it is very hard for observers to assess how prepared Turkey Point really is to deal with superstorms. Hopefully, the plant will pass the Irma test, but the NRC will need to reevaluate whether its new requirements can adequately address the potential for more severe storms in the future.

NRC’s Decision Making: 18 Reasons Why You Are Right, but Wrong

As described in a prior blog post, the Unit 3 reactor at the Palo Verde Generating Station had one of two emergency diesel generators (EDGs) explode during a test run. The license issued by the Nuclear Regulatory Commission (NRC) allowed the reactor to remain running for up to 10 days with one EDG unavailable. Fixing the heavily damaged EDG would require far longer than 10 days, so the plant’s owner submitted requests to the NRC for its permission to run the reactor for up to 21 days and then up to 62 days with only one EDG available.

As described in a followup blog post, NRC staffer(s) filed formal opposition to the agency’s approval of the owner’s requests by initiating Differing Professional Opinions (DPOs). Under the NRC’s DPO process, a DPO panel is formed to review the issue and to document its findings and conclusions in a report to the NRC senior manager who makes the final decisions. In this matter, that individual was the Director of the Office of Nuclear Reactor Regulation (NRR). The DPO originator(s) can nominate one individual to serve on the DPO panel. (The DPO process requires a minimum of three persons on the DPO panel, ensuring that the panel won’t have a majority of members sympathetic to the originator(s)’s concerns.)

The DPO panel issued its report on June 5, 2017, and the NRR Director issued his decision on June 28, 2017.  The NRC made the DPOs, the DPO panel report, and the NRR Director’s decision publicly available on July 21, 2017.

The DPO Originator

Troy Pruett originated both of the DPOs in the Palo Verde EDG case. Mr. Pruett is the Director of the Division of Reactor Projects in NRC Region IV. Among other things, Mr. Pruett oversees the NRC’s resident inspectors at all of the nuclear power plants operating in Region IV, including Palo Verde. Mr. Pruett has worked for the NRC for nearly a quarter century—long enough to know the agency’s regulations and procedures intended to protect nuclear plant workers and the American public inside and out (Fig. 1).

Fig. 1 (Source: Nuclear Regulatory Commission)

The DPO Originator’s Position

In his DPOs, Mr. Pruett contended that the owner’s requests to operate Palo Verde Unit 3 for up to 21 and later up to 62 days with one emergency diesel generator unavailable should not have been approved because they departed from the agency’s regulations, procedures, and practices.

The DPO Panel’s Conclusion

Quoting from their report: “The DPO Panel was not unanimous in concluding that Palo Verde License Amendments 199 and 200 should have been approved by the staff.”

One of Mr. Pruett’s candidates was appointed to the DPO Panel. NRC management selected three other members, assuring they’d have a majority. And sure enough, a majority of the NRC management-appointed panel sided with NRC management.

The DPO Panel’s 18 Observations

The DPO Panel’s report contained 18 Observations about the processes used (and not used) en route to the Palo Verde EDG approvals:

(1) The owner submitted two licensing requests to the NRC: one to operate for up to 21 days and the second to operate for up to 62 days with one EDG unavailable. The overwhelming majority of the hundreds of licensing requests submitted to the NRC each year are not bifurcated in this way and the agency’s procedures for reviewing licensing requests do not address such “split” requests. The DPO panel recommended that additional guidance be provided in LIC-101, the NRC’s procedure for handling such reviews, if the practice becomes more frequent.

(2) The DPO Panel noted that the staff’s reasoning for the two-step approach could have been made clearer in the first approval in the interest of transparently providing a complete record of the staff’s decision basis to the public.

(3) The DPO Panel observed that there may be opportunities to more effectively communicate with the public, including the use of less formal communications tools, during emergency requests, and suggested that guidance and training be considered in this area.

(4) The second approval issued by the NRC staff contained an explicit requirement to shut down Unit 3 if workers found the cause of the EDG’s failure could also disable the surviving EDG. The DPO Panel noted that no formal regulatory commitment existed in the first approval. During interviews, the NRC staff was not able to provide a sufficient basis as to why a similar condition was not included.

(5) The DPO Panel observed that the Safety Evaluation issued by the NRC staff in support of the first approval lacked sufficient documentation to objectively identify the staff’s decision basis in several key areas, including how the potential for common cause failure of the surviving EDG was evaluated and the basis for a 21 day EDG outage time. In other words, the NRC staff failed to ask and answer all the relevant safety questions.

(6) The DPO Panel concluded that the use of a zero test and maintenance assumption (i.e., no other safety equipment would fail or be unavailable while the EDG was broken) in the owner’s probabilistic risk assessment (PRA) model was not consistent with Regulatory Guide 1.177 guidance and the regulatory commitment put in place for the second approval for the conduct of routine maintenance and surveillance was not consistent with PRA assumptions.

In other words, when the EDG was unbroken, the owner’s risk assessment assumed that there was a small, but non-zero, chance that the highly reliable emergency equipment would not perform needed safety functions during an accident. But when evaluating the risk during the 62 days the reactor would operate with a broken EDG, the owner’s risk assessment assumed that all emergency equipment would function perfectly. The DPO Panel found this assumption unrealistic, non-conservative, and contrary to longstanding NRC expectations.

(7) Additional guidance should be evaluated with respect to defense-in-depth, the adequacy of long duration equipment outage periods, and whether there should be a backstop (i.e., maximum outage period).

(8) The DPO Panel concluded that the Branch Technical Position 8-8 guidance was not strictly adhered to for the two approvals. The DPO Panel recommended that deviations from established guidance should be documented and justified. The Branch Technical Position explicitly stated that the NRC staff should not even review a request to operate for longer than 14 days with one EDG unavailable; in this case, the NRC staff not only reviewed such a request, they approved it without explaining why they dismissed the 14-day maximum duration.

(9) The DPO Panel confirmed that the NRC’s safety evaluation supporting the first approval did not include an independent verification of the owner’s risk evaluations.

(10) The DPO Panel identified that the second approval used the three risk-informed tiered review approach outlined in Regulatory Guide 1.177. The DPO Panel pointed out this approach was inconsistent with Standard Review Plan 16.1 guidance, which states that Regulatory Guide 1.177 only applied to permanent (as opposed to temporary or “one-time”) changes. In other words, the NRC staff used an approach not allowed by the agency’s procedures.

(11) The DPO Panel found no discernible differences between the DC Cook request for a 65-day EDG outage in June 2015 and the Palo Verde request. However, the staff appears to have arrived at entirely different conclusions, based upon different interpretations of the deterministic guidance of Branch Technical Position 8-8. Cook’s owner sought the NRC’s permission to operate Unit 1 for up to 65 days with one of two EDGs unavailable. The NRC said no to Cook’s owner and yes to Palo Verde’s owner, citing Branch Technical Position 8-8 for each of the entirely opposite decisions.

(12) In the DPO Panel’s opinion, the Palo Verde risk evaluation warranted closer scrutiny. However, interviews of the NRC staff identified that there is no guidance for when to use the agency’s SPAR models for independent verification and it appears to be at the discretion of the reviewer(s).

(13) Section 4.2 of the NRC’s procedure for reviewing licensing requests, LIC-101, states that, “Decisions to not apply specific precedents, especially precedents cited by a licensee, should be clearly explained in the SE [NRC’s Safety Evaluation] (to avoid the appearance of being arbitrary and/or inconsistent).” The DPO Panel observed that neither of the Safety Evaluations prepared by the NRC staff for the two approvals addressed the licensee’s referenced precedents. In other words, the NRC staff did not follow the procedure they purportedly used to make the approvals.

(14) The DPO Panel found that both of the Safety Evaluations prepared by the NRC staff for its approvals included Branch Technical Position 8.8 in the list of regulatory guidance documents reviewed. The Safety Evaluations stated that Branch Technical Position 8.8 required more defense-in-depth for station blackout scenarios than for loss of coolant accident scenarios because of a higher likelihood of occurrence. But the DPO Panel found no such statement or implication about design basis accident likelihoods in the Branch Technical Position. In other words, the NRC staff departed from the regulatory guidance document it purportedly used to justify the approvals.

(15) The DPO Panel determined that there is no established guidance for how NRC staff should judge the adequacy of risk evaluations provided by plant owners. The good news is that the NRC staff cannot depart from non-existent guidance; the bad news is that the NRC staff can, and has, wandered all over the map since it lacks proper directions.

(16) The DPO Panel identified a lack of clarity in the existing review guidance and related inconsistencies in the understanding between the NRC departments regarding who is responsible for reviewing what in licensing requests.

(17) The DPO Panel recommended additional guidance be developed for the NRC staff when reviewing requests for extended periods of safety equipment unavailability.

(18) The DPO Panel recommended that a lessons learned review be conducted after significant or first of a kind licensing actions to determine if the action should be used as future precedent and/or whether there should be specific attributes identified that future staff should evaluate before using the precedent.

Grading on a (Mobius) Curve

If you read these observations, and the more voluminous supporting text in the report, before reading the conclusion, you’d likely think that the entire panel agreed with Mr. Pruett.

After all, Mr. Pruett contended that the requests departed from regulations and the DPO Panel’s Observations 5, 6, and 14 confirm that contention and several others support it.

Mr. Pruett contended that the requests departed from the agency’s procedures. The DPO Panel’s Observations 1, 8, and 13 confirm that contention and several others support it.

Mr. Pruett contended that the requests departed from the agency’s practices. The DPO Panel’s Observations 9, 10, and 11 confirm that contention and several others support it.

But nooooo. The DPO Panel disagreed with Mr. Pruett.

You might ask why the DPO Panel could possibly have disagreed with Mr. Pruett.

If you do ask and someone gives you a straight answer, please forward it to me. I’ve monitored the NRC for nearly two decades and I cannot fathom how the DPO Panel could assemble so many reasons why Mr. Pruett was right, and yet conclude he was wrong.

It’s like a 19 chapter mystery novel with the first 18 chapters describing how the upstairs maid committed crime after crime only to have the butler—mentioned for the first time—arrested for the crimes.

Perhaps that explains it: the DPO Panel report is an intriguing work of fiction. Or maybe it only needed a non-fictional final chapter.

Marijuana and Nuclear Power Plants

The Nuclear Regulatory Commission (NRC) adopted regulations in the mid-1980s seeking to ensure that nuclear power plant workers are fit for duty. The NRC’s regulations contained provisions seeking to verify that workers were trustworthy and reliable as well as measures intended to prevent workers from being impaired on duty. The former measures included background checks before workers could gain access to the plant while the latter components included drug and alcohol testing.

The regulations require that nuclear plant owners test workers for marijuana and alcohol use at the time of hiring, randomly thereafter, and for cause when circumstances warrant it. In 2014, marijuana use was the #1 reason for positive drug and alcohol tests by contractors and vendors and the #2 reason for positive tests by nuclear plant employees. Alcohol was the #1 reason for positive tests by employees and the #2 reason for positive tests by contractors and vendors. A positive test may not be a career killer, but it is often a career crimper.

Fig. 1 (Source: Nuclear Regulatory Commission)

Alcohol can be legally purchased and consumed in all 50 states. So, mere detection of having used alcohol will not result in a positive test. But detection of a blood alcohol concentration of 0.04 percent or higher yields a positive test. People have different metabolisms and alcoholic beverages come in different sizes, but that threshold is often equated to having consumed one alcoholic beverage within five hours of the test. Similar to the reason that states require motorists to not drive under the influence of alcohol (i.e., don’t drink and drive), the NRC’s regulations seek to control alcohol consumption by workers (i.e., don’t drink and operate nuclear plants).

Unlike the reason for the alcohol controls, the NRC’s ban on marijuana use is not because it might make workers more likely to make mistakes or otherwise impair their performance, thus reducing nuclear safety levels. The NRC banned marijuana use because at the time marijuana was an illegal substance in all 50 states and its criminal use meant that workers fell short of the trustworthiness and reliability standards in the fitness for duty regulation. Since the NRC adopted its regulation, 8 states have legalized recreational use of marijuana and another 12 states have decriminalized its use.

Fig. 2 (Source: NORML)

The NRC recognized that marijuana’s legalization creates potential problems with its fitness for duty regulation. If an individual uses marijuana in a state that has legalized or decriminalized its use but tests positive at a nuclear plant in a state where its use is not legal, is the individual sufficiently trustworthy and reliable? In the eyes of the NRC, the answer remains no.

Fig. 3 (Source: Nuclear Regulatory Commission)

The NRC conceded that no scientific basis comparable to the one underlying the alcohol limits links marijuana use to performance impairment. But the NRC continues to consider marijuana use as indicating one lacks the trustworthiness needed to work in a nuclear power plant.

The NRC is in a hard spot on this one. Revising its regulations to eliminate marijuana as a disqualifier for working in a nuclear power plant would likely spawn news reports about the agency permitting Reefer Madness at nuclear plants. But the country’s evolving mores are undermining the basis for the NRC’s regulation.

Nuclear Plant Cyber Security

There has been considerable media coverage recently about alleged hacking into computer systems at or for U.S. nuclear power plants. The good news is that the Nuclear Regulatory Commission (NRC) and the nuclear industry are not merely reacting to this news and playing catch-up to the cyber threat. The NRC included cyber security protective measures among the regulatory requirements it imposed on the nuclear industry in the wake of 9/11. The hacking reported to date seems to have involved non-critical systems at nuclear plants as explained below.

The bad news is that there are bad people out there trying to do bad things to good people. We are better protected against cyber attacks than we were 15 years ago, but are not invulnerable to them.

Nuclear Plant Cyber Security History

The NRC has long had regulations in place requiring that nuclear plant owners take steps to protect their facilities from sabotage by a small group of intruders and/or an insider. After 9/11, the NRC issued a series of orders mandating upgrades to the security requirements. An order issued in February 2002 included measures intended to address cyber security vulnerabilities. An order issued in April 2003 established cyber attack characteristics that the NRC required owners to protect against.

The orders imposed regulatory requirements for cyber security on nuclear plant owners. To help the owners better understand the agency’s expectations for what it took to comply with the requirements, the NRC issued NUREG/CR-6847, “Cyber Security Self-Assessment Method for U.S. Nuclear Power Plants,” in October 2004; Regulatory Guide 5.71, “Cyber Security Programs for Nuclear Facilities,” in January 2010; NUREG/CR-7117, “Secure Network Design,” in June 2012; and NUREG/CR-7141, “The U.S. Nuclear Regulatory Commission’s Cyber Security Regulatory Framework for Nuclear Power Reactors,” in November 2014. In parallel, the Nuclear Energy Institute developed NEI-08-09, “Cyber Security Plan for Nuclear Power Reactors,” in April 2010 that the NRC formally endorsed as an acceptable means for conforming to the cyber security regulatory requirements.

First Step: NANA

Anyone who has read more than one report about the U.S. nuclear power industry will appreciate that NANA was a key step in the road to cyber security regulations—Need A New Acronym. The nuclear industry and its regulator need to be able to talk in public without any chance of the public following the conversation, so acronyms are essential elements of the nukespeak. Many FTEs (full-time equivalents, or NRC person-hours) went into the search for the new acronym, but the effort yielded CDA—Critical Digital Assets. It was a perfect choice. Even if one decoded the acronym, the words don’t give away much about what the heck it means.

Finding CDA Among the NCDA, CAA, and NCAA

Armed with the perfect acronym, the next step involved distinguishing CDA from non-critical digital assets (NCDA), critical analog assets (CAA), and non-critical analog assets (NCAA, sorry college sports enthusiasts). Doing so is an easy three-step process.

Step 1: Inventory the Plant’s Digital Assets

The NRC bins the digital assets at a nuclear power plant into the six categories shown in Figure 1. Security systems include the computers that control access to vital areas within the plant, sensors that detect unauthorized entries, and cameras that monitor restricted areas. Business systems include the computers that enable workers to access PDFs of procedures, manuals, and engineering reports. Emergency preparedness systems include the digital equipment used to notify offsite officials of conditions at the plant. Data acquisition systems include sensors monitoring plant parameters and the equipment relaying that information to gauges and indicators in the control room as well as to the plant process computer. Safety systems could include the equipment detecting high temperatures or smoke and automatically initiating fire suppression systems. Control systems include process controllers that govern the operation of the main turbine or regulate the rate of feedwater flow to the steam generators (pressurized water reactors) or reactor pressure vessels (boiling water reactors). The first step has owners inventorying the digital assets at their nuclear power plants.

Fig.1 (Source: Nuclear Regulatory Commission)

Step 2: Screen Out the Non-Critical Systems, Screen in the Critical Systems

Figure 2 illustrates the evaluations performed for the inventory of digital assets assembled in Step 1 to determine which systems are critical. The first decision involves whether the digital asset performs a safety, security, or emergency preparedness (SSEP) function. If not, the evaluation then determines whether the digital asset affects, supports, or protects a critical system. If the answer to any question is yes, the digital asset is a critical system. If all the answers are no, the digital asset is a non-critical system.

Fig. 2 (Source: Nuclear Regulatory Commission)

Step 3: Screen Out the NCDA, Screen in the CDA

Figure 3 illustrates the evaluations performed for the inventory of critical systems identified in Step 2 to determine which are critical digital assets. The first decision involves whether the critical system performs a safety, security, or emergency preparedness (SSEP) function. If not, the evaluation determines whether the critical system affects, supports, or protects a critical asset. If the answer to any question is yes, the critical system is a critical digital asset. If all the answers are no, the critical system is a non-critical digital asset.

Fig. 3 (Source: Nuclear Regulatory Commission)
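The sketch below walks a single hypothetical digital asset through the two screening tiers just described. The record format and attribute names are mine, not the NRC's; the point is only to show how the yes/no questions cascade.

```python
# Minimal sketch of the two screening tiers. The asset record and attribute
# names are hypothetical stand-ins for the criteria in the NRC guidance.

def is_critical_system(asset):
    """Step 2: critical if it performs an SSEP (safety, security, or emergency
    preparedness) function, or affects, supports, or protects a critical system."""
    return (asset["performs_ssep_function"]
            or asset["affects_supports_or_protects_critical_system"])

def is_critical_digital_asset(critical_system):
    """Step 3: a critical system is a CDA if it performs an SSEP function,
    or affects, supports, or protects a critical asset."""
    return (critical_system["performs_ssep_function"]
            or critical_system["affects_supports_or_protects_critical_asset"])

# Hypothetical inventory entry used only for illustration
turbine_controller = {
    "name": "main turbine process controller",
    "performs_ssep_function": False,
    "affects_supports_or_protects_critical_system": True,
    "affects_supports_or_protects_critical_asset": False,
}

if is_critical_system(turbine_controller):
    print(turbine_controller["name"], "screens in as a critical system")
    if is_critical_digital_asset(turbine_controller):
        print("...and is a critical digital asset (CDA)")
    else:
        print("...but screens out as a non-critical digital asset (NCDA)")
else:
    print(turbine_controller["name"], "is a non-critical system")
```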

Remaining Steps

Once the CDAs are identified, the NRC requires that owners use defense-in-depth strategies to protect workers and the public from harm caused by a cyber-based attack. The defense-in-depth protective layers are:

  • Prompt detection and response to a cyber-based attack
  • Mitigating the adverse consequences of a cyber-based attack
  • Restoring CDAs affected by a cyber-based attack
  • Correcting vulnerabilities exploited by a cyber-based attack

The Power of One (Bad Person)

The NRC instituted cyber security regulatory requirements many years ago. The NRC’s inspectors have assessed how effectively measures undertaken by plant owners conform to these requirements. Thus, the U.S. nuclear industry does not have to quickly develop protections against cyber attacks in response to recent reports of hacking and attacking. The job instead is to ensure required protections remain in place as effectively as possible.

Unfortunately, digital technology can also broaden the potential harm caused by an insider. The NRC’s security regulations have long recognized that an insider might attempt sabotage alone or in conjunction with unauthorized intruders. In what the military terms a “force multiplier,” digital technology could enable the insider to attack multiple CDAs. The insider could also supply passwords to the outside bad guys, saving them the trouble of hacking and the risk of detection.

The hacking of computer systems by outsiders made news. The mis-use of CDAs by an insider can make for grim headlines.

Cooper: Nuclear Plant Operated 89 Days with Key Safety System Impaired

The Nebraska Public Power District’s Cooper Nuclear Station about 23 miles south of Nebraska City has one boiling water reactor that began operating in the mid-1970s to add about 800 megawatts of electricity to the power grid. Workers shut down the reactor on September 24, 2016, to enter a scheduled refueling outage. That outage eventually led to an NRC special inspection.

Following the outage, workers reconnected the plant to the electrical grid on November 8, 2016, to begin its 30th operating cycle. During the outage, workers closed two valves that are normally open while the reactor operates. Later during the outage, workers were directed to re-open the valves and they completed paperwork indicating the valves had been opened. But a quarterly check on February 5, 2017, revealed that both of the valves remained closed. The closed valves impaired a key safety system for 89 days until the mis-positioned valves were discovered and opened. The NRC dispatched a special inspection team to the site on March 1, 2017, to look into the causes and consequences of the improperly closed valves.

The Event

Workers shut down the reactor on September 24, 2016. The drywell head and reactor vessel head were removed to allow access to the fuel in the reactor core. By September 28, the water level had been increased to more than 21 feet above the flange where the reactor vessel head is bolted to the lower portion of the vessel. Flooding this volume—called the reactor cavity or refueling well—permits spent fuel bundles to be removed while still underwater, protecting workers from the radiation.

With the reactor shut down and so much water inventory available, the emergency core cooling systems required to be in service were reduced from the full array needed during reactor operation to a minimal set. The reduction of systems required to remain in service facilitates maintenance and testing of out-of-service components.

In the late afternoon of September 29, workers removed Loop A of the Residual Heat Removal (RHR) system from service for maintenance. The RHR system is like a nuclear Swiss Army knife—it can supply cooling water for the reactor core, containment building, and suppression pool and it can provide makeup water to the reactor vessel and suppression pool. Cross-connections enable the RHR system to perform so many diverse functions. Workers open and close valves to transition from one RHR mode of operation to another.

As indicated in Figure 1, the RHR system at Cooper consisted of two subsystems called Loop A and Loop B. The two subsystems provide redundancy—only one loop need function for the necessary cooling or makeup job to be accomplished successfully.

Fig. 1 (Source: Nebraska Public Power District, Individual Plant Examination (1993))

RHR Loop A features two motor-driven pumps (labeled P-A and P-C in the figure) that can draw water from the Condensate Storage Tank (CST), suppression chamber, or reactor vessel. The pump(s) send the water through, or around, a heat exchanger (labeled HX-A). When passing through the heat exchanger, heat is conducted through the metal tube walls to be carried away by the Service Water (SW) system. The water can be sent to the reactor vessel, sprayed inside the containment building, or sent to the suppression chamber. RHR Loop B is essentially identical.

Work packages for maintenance activities include steps when applicable to open electrical breakers to de-energize components and protect workers from electrical shocks and close valves to allow isolated sections of piping to be drained of water so valves or pumps can be removed or replaced. The instructions for the RHR Loop A maintenance begun on September 29 included closing valves V-58 and V-60. These are valves that can only be opened and closed manually using handwheels. Valve V-58 is in the minimum flow line for RHR Pump A while V-60 is in the minimum flow line for RHR Pump C. These two minimum flow lines connect downstream of these manual valves and then this common line connects to a larger pipe going to the suppression chamber.

Motor-operated valve MOV-M016A in the common line automatically opens when either RHR Pump A or C is running and the pump’s flow rate is less than 2,731 gallons per minute. The large RHR pumps generate considerable heat when they are running. The minimum flow line arrangement ensures that there’s sufficient water flow through the pumps to prevent them from being damaged by overheating. MOV-M016A automatically closes when pump flow rises above 2,731 gallons per minute to prevent cooling flow or makeup flow from being diverted.
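For readers who like to see the logic spelled out, here is a minimal sketch of that interlock. The function and variable names are illustrative; only the 2,731 gallon-per-minute setpoint comes from the description above.

```python
# Minimal sketch of the minimum-flow interlock described above. The function
# and variable names are illustrative; only the 2,731 gallon-per-minute
# setpoint comes from the text.

MIN_FLOW_SETPOINT_GPM = 2731

def min_flow_valve_should_open(pump_a_running, pump_c_running, pump_flow_gpm):
    """MOV-M016A opens when a pump is running below the setpoint and closes
    once flow exceeds it, so cooling flow always passes through a running pump."""
    any_pump_running = pump_a_running or pump_c_running
    return any_pump_running and pump_flow_gpm < MIN_FLOW_SETPOINT_GPM

# Just after a pump start, flow is low, so the valve opens...
print(min_flow_valve_should_open(True, False, 500))   # True
# ...and it closes once injection flow exceeds the setpoint.
print(min_flow_valve_should_open(True, False, 5000))  # False
```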

The maintenance on RHR Loop A was completed by October 7. The work instructions directed operators to reopen valves V-58 and V-60 and then seal the valves in the opened position. For these valves, sealing involved installing a chain and padlock around the handwheel so the valve could not be repositioned. The valves were sealed, but mistakenly in the closed rather than opened position. Another operator independently verified that this step in the work instruction had been completed, but failed to notice that the valves were sealed in the wrong position.

At that time during the refueling outage, RHR Loop A was not required to be operable. All of the fuel had been offloaded from the reactor core into the spent fuel pool. On October 19, workers began transferring fuel bundles back into the reactor core.

On October 20, operators declared RHR Loop A operable. Due to the closed valves in the minimum flow lines, RHR Loop A was actually inoperable, but that misalignment was not known at the time.

The plant was connected to the electrical grid on November 8 to end the refueling outage and begin the next operating cycle.

Between November 23 and 29, workers audited all sealed valves in the plant per a procedure required to be performed every quarter. Workers confirmed that valves V-58 and V-60 were sealed, but failed to notice that the valves were sealed closed instead of opened.

On February 5, 2017, workers were once again performing the quarterly audit of all sealed valves. This time, they noticed that valves V-58 and V-60 were not opened as required. They corrected the error and notified the NRC of the discovery.

The Consequences

Valves V-58 and V-60 had been improperly closed for 89 days, 12 hours, and 49 minutes. During that period, the pumps in RHR Loop A had been operated 15 times for various tests. The longest time that any pump was operated without its minimum flow line available was determined to be 2 minutes and 18 seconds. Collectively, the pumps in RHR Loop A operated for a total of 21 minutes and 28 seconds with flow less than 2,731 gallons per minute.

Running the pumps at less than “minimum” flow created the potential for damage from overheating. Workers undertook several steps to determine whether damage had occurred. Considerable data are collected during periodic testing of the RHR pumps (which is how it was known that the longest a pump ran without its minimum flow line available was 2 minutes and 18 seconds). Workers reviewed data such as differential pressures and vibration levels from tests over the prior two years and found that current pump performance was unchanged from performance before the fall 2016 refueling outage.

Workers also calculated how long it would take a RHR pump to operate before becoming damaged. They estimated that time to be 32 minutes. To double-check their work, a consulting firm was hired to independently answer the same question. The consultant concluded that it would take an hour for an RHR pump to become damaged. (The 28 minute difference between the two calculations was likely due to the workers onsite making conservative assumptions that the more detailed analysis was able to reduce. But it’s a difference without distinction—both calculations yield ample margin to the total time the RHR pumps ran.)
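For a rough sense of that margin, here is a back-of-the-envelope comparison using the run times and damage estimates quoted above (a sketch only; the variable names are mine, and this is not the licensee’s or the consultant’s calculation):

    # Compare actual pump run times (without the minimum flow path) against the
    # two estimated times-to-damage reported above.
    from datetime import timedelta

    longest_single_run = timedelta(minutes=2, seconds=18)   # longest individual run
    total_run_time     = timedelta(minutes=21, seconds=28)  # all 15 runs combined
    damage_estimates = {
        "onsite calculation":     timedelta(minutes=32),
        "consultant calculation": timedelta(hours=1),
    }

    for label, limit in damage_estimates.items():
        print(f"{label}: longest run = {longest_single_run / limit:.0%} of the damage time, "
              f"all runs combined = {total_run_time / limit:.0%}")

Even against the more conservative 32-minute estimate, the combined run time comes to roughly two-thirds of the time-to-damage, and the longest single run to well under a tenth of it.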

The testing and analysis clearly indicate that the RHR pumps were not damaged by operating during the 89-plus days their minimum flow lines were unavailable.

The Potential Consequences  

The RHR system can perform a variety of safety functions. If the largest pipe connected to the reactor vessel were to rupture, the two pumps in either RHR loop are designed to provide more than sufficient makeup flow to refill the reactor vessel before the reactor core overheats.

The RHR system has high capacity, low head pumps. This means the pumps supply a lot of water (many thousands of gallons each minute) but at a low pressure. The RHR pumps deliver water at roughly one-third of the normal operating pressure inside the reactor vessel. When small or medium-sized pipes rupture, cooling water drains out but the reactor vessel pressure takes longer to drop below the point where the RHR pumps can supply makeup flow. During such an accident, the RHR pumps will automatically start but will send water through the minimum flow lines until reactor vessel pressure drops low enough. The closure of valves V-58 and V-60 could have resulted in RHR Pumps A and C being disabled by overheating about an hour into an accident.

Had RHR Pumps B and D remained available, the loss of Pumps A and C would have been inconsequential. Had RHR Pumps B and D been unavailable (such as due to failure of the emergency diesel generator that supplies them electricity), the headline could have been far worse.

NRC Sanctions

The NRC’s special inspection team identified the following two apparent violations of regulatory requirements, both classified as Green in the agency’s Green, White, Yellow and Red classification system:

  • Exceeding the allowed outage time in the operating license for RHR Loop A being inoperable. The operating license permitted Cooper to run for up to 7 days with one RHR loop unavailable, but the reactor operated far longer than that period with the mis-positioned valves.
  • Failure to implement an adequate procedure to control equipment. Workers used a procedure every quarter to check sealed valves. But the guidance in that procedure was not clear enough to ensure workers verified both that a valve was sealed and that it was in the correct position.

UCS Perspective

This near-miss illustrates the virtues, and limitations, of the defense-in-depth approach to nuclear safety.

The maintenance procedure directed operators to re-open valves V-58 and V-60 when the work on RHR Loop A was completed.

While quite explicit, that procedure step alone was not deemed reliable enough. So, the maintenance procedure required a second operator to independently verify that the valves had been re-opened.

While the backup measure was also explicit, it was not considered an absolute check. So, another procedure required each sealed valve to be verified every quarter.

It would have been good had the first quarterly check identified the mis-positioned valves.

It would have been better had the independent verifier found the mis-positioned valves.

It would have been best had the operator re-opened the valves as instructed.

But because no single barrier is 100% reliable, multiple barriers are employed. In this case, the third barrier detected and corrected a problem before it could contribute to a really bad day at the nuclear plant.

Defense-in-depth also accounts for the NRC’s levying two Green findings instead of imposing harsher sanctions. The RHR system performs many safety roles in mitigating accidents. The mis-positioned valves impaired, but did not incapacitate, one of two RHR loops. That impairment could have prevented one RHR loop from successfully performing its necessary safety function during some, but not all, credible accident scenarios. Even had the impairment taken RHR Loop A out of the game, other players on the Emergency Core Cooling System team at Cooper could have stepped in.

Had the mis-positioned valves left Cooper with a shorter list of “what ifs” that needed to line up to cause disaster or with significantly fewer options available to mitigate an accident, the NRC’s sanctions would have been more severe. The Green findings are sufficient in this case to remind Cooper’s owner, and other nuclear plant owners, of the importance of complying with safety regulations.

Accidents certainly reveal lessons that can be learned to lessen the chances of another accident. Near-misses like this one also reveal lessons of equal value, but at a cheaper price.

Turkey Point: Fire and Explosion at the Nuclear Plant

The Florida Power & Light Company’s Turkey Point Nuclear Generating Station about 20 miles south of Miami has two Westinghouse pressurized water reactors that began operating in the early 1970s. Built next to two fossil-fired generating units, Units 3 and 4 each add about 875 megawatts of nuclear-generated electricity to the power grid.

Both reactors hummed along at full power on the morning of Saturday, March 18, 2017, when problems arose.

The Event

At 11:07 am, a high energy arc flash (HEAF) in Cubicle 3AA06 of safety-related Bus 3A ignited a fire and caused an explosion. The explosion inside the small concrete-wall room (called Switchgear Room 3A) injured a worker and blew open Fire Door D070-3 into the adjacent room housing the safety-related Bus 3B (called Switchgear Room 3B.)

A second later, the Unit 3 reactor automatically tripped when Reactor Coolant Pump 3A stopped running. This motor-driven pump received its electrical power from Bus 3A. The HEAF event damaged Bus 3A, causing the reactor coolant pump to trip on under-voltage (i.e., less than the desired voltage of 4,160 volts.) The pump’s trip triggered the insertion of all control rods into the reactor core, terminating the nuclear chain reaction.

One second after that, Reactor Coolant Pumps 3B and 3C also stopped running. These motor-driven pumps received electricity from Bus 3B. The HEAF event should have been isolated to Switchgear Room 3A, but the force of the explosion blew open the connecting fire door, allowing Bus 3B to also be affected. Reactor Coolant Pumps 3B and 3C tripped on under-frequency (i.e., alternating current at a frequency well below the desired 60 cycles per second). Each Turkey Point unit has three Reactor Coolant Pumps that force the flow of water through the reactor core, out the reactor vessel to the steam generators where heat gets transferred to a secondary loop of water, and then back to the reactor vessel. With all three pumps turned off, the reactor core would be cooled by natural circulation. Natural circulation can remove small amounts of heat, but not larger amounts; hence, the reactor automatically shuts down when even one of its three Reactor Coolant Pumps is not running.
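The protective logic at work in those first two seconds can be summarized in a short sketch. The setpoints below are illustrative assumptions only, not Turkey Point’s actual trip setpoints:

    # Illustrative sketch of the trip logic described above.
    # Setpoint values are assumptions for demonstration, not plant values.

    UNDER_VOLTAGE_SETPOINT_V    = 0.75 * 4_160   # hypothetical degraded-voltage trip point
    UNDER_FREQUENCY_SETPOINT_HZ = 57.0           # hypothetical degraded-frequency trip point

    def rcp_trips(bus_volts: float, bus_hz: float) -> bool:
        """A reactor coolant pump trips when its bus is degraded in voltage or frequency."""
        return bus_volts < UNDER_VOLTAGE_SETPOINT_V or bus_hz < UNDER_FREQUENCY_SETPOINT_HZ

    def reactor_trips(rcps_running: list[bool]) -> bool:
        """At power, losing even one of the three reactor coolant pumps trips the reactor."""
        return not all(rcps_running)

    # Bus 3A collapses (under-voltage) -> RCP 3A trips -> reactor trips.
    # A second later, Bus 3B degrades (under-frequency) -> RCPs 3B and 3C trip too.
    print(rcp_trips(bus_volts=1_000, bus_hz=60.0))   # True (under-voltage)
    print(reactor_trips([False, True, True]))        # True (one pump stopped)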

Shortly before 11:09 am, the operators in the control room received word about a fire in Switchgear Room 3A and the injured worker. The operators dispatched the plant’s fire brigade to the area. At 11:19 am, the operators declared an emergency due to a “Fire or Explosion Affecting the Operability of Plant Systems Required to Establish or Maintain Safe Shutdown.”

At 11:30 am, the fire brigade reported to the control room operators that there was no fire in either Switchgear Room 3A or 3B.

Complication #1

The Switchgear Building is shown in Figure 1 at the right end of the Unit 3 turbine building. Switchgear Rooms 3A and 3B are located adjacent to each other within the Switchgear Building. The safety-related buses inside these rooms take 4,160 volt electricity from the main generator, the offsite power grid, or an emergency diesel generator (EDG) and supply it to safety equipment needed to protect workers and the public from transients and accidents. Buses 3A and 3B are fully redundant; either can power enough safety equipment to mitigate accidents.

Fig. 1 (Source: Nuclear Regulatory Commission)

To guard against a single fire disabling both Bus 3A and Bus 3B despite their proximity, each switchgear room is designed as a 3-hour fire barrier. The floor, walls, and ceiling of the room are made from reinforced concrete. The opening between the rooms has a normally closed door with a 3-hour fire resistance rating.

Current regulatory requirements do not require the room to have blast resistant fire doors, unless the doors are within 3 feet of a potential explosive hazard. (I could give you three guesses why all the values are 3’s, but a correct guess would divulge one-third of nuclear power’s secrets.) Cubicle 3AA06 that experienced the HEAF event was 14.5 feet from the door.

Fire Door D070-3, presumably unaware that it was well outside the 3-foot danger zone, was blown open by the HEAF event. The opened door created the potential for one fire to disable Buses 3A and 3B, plunging the site into a station blackout. Fukushima reminded the world why it is best to stay out of the station blackout pool.

Complication #2

The HEAF event activated all eleven fire detectors in Switchgear Room 3A and activated both of the very early warning fire detectors in Switchgear Room 3B. Activation of these detectors sounded alarms at Fire Alarm Control Panel 3C286, which the operators acknowledged. These detectors comprise part of the plant’s fire detection and suppression systems intended to extinguish fires before they cause enough damage to undermine nuclear safety margins.

But workers did not reset the detectors and restore them to service until 62 hours later. Bus 3B provided the only source of electricity to safety equipment after Bus 3A was damaged by the HEAF event. The plant’s fire protection program required that Switchgear Room 3B be protected by the full array of fire detectors or by a continuous fire watch (i.e., workers assigned to the area to immediately report signs of smoke or fire to the control room). The fire detectors were out-of-service for 62 hours after the HEAF event and the continuous fire watches were put in place late.

Workers were in Switchgear Room 3B for nearly four hours after the HEAF event performing tasks like smoke removal. But a continuous fire watch was not posted after they left the area until 1:15 pm on March 19, the day following the HEAF event. And even then, the fire watch was placed in Switchgear Room 3A, not in Switchgear Room 3B housing the bus that needed to be protected.

Had a fire started in Switchgear Room 3B, neither the installed fire detectors nor the human fire detectors would have alerted control room operators. The lights going out on Broadway, or whatever they call the main avenue at Turkey Point, might have been their first indication.

Complication #3

At 12:30 pm on March 18, workers informed the control room operators that the HEAF event damaged Bus 3A such that it could not be re-energized until repairs were completed. Bus 3A provided power to Reactor Coolant Pump 3A and to other safety equipment like the ventilation fan for the room containing Emergency Diesel Generator (EDG) 3A. Due to the loss of power to the room’s ventilation fan, the operators immediately declared EDG 3A inoperable.

EDGs 3A and 3B are the onsite backup sources of electrical power for safety equipment. When the reactor is operating, the equipment is powered by electricity produced by the main generator as shown by the green line in Figure 2. When the reactor is not operating, electricity from the offsite power grid flows in through transformers and Bus 3A to the equipment as indicated by the blue line in Figure 2. When under-voltage or under-frequency is detected on their respective bus, EDG 3A and 3B will automatically start and connect to the bus to supply electricity for the equipment as shown by the red line in Figure 2.

Fig. 2 (Source: Nuclear Regulatory Commission with colors added by UCS)

Very shortly after the HEAF event, EDG 3A automatically started due to under-voltage on Bus 3A. But protective relays detected a fault on Bus 3A and prevented electrical breakers from closing to connect EDG 3A to Bus 3A. EDG 3A was operating, but disconnected from Bus 3A, when the operators declared it inoperable at 12:30 pm due to loss of the ventilation fan for its room.

But the operators allowed “inoperable” EDG 3A to continue operating until 1:32 pm. Given that (a) its ventilation fan was not functioning, and (b) it was not even connected to Bus 3A, they should not have permitted this inoperable EDG to keep operating for over an hour.

Complication #4

A few hours before the HEAF event on Unit 3, workers removed High Head Safety Injection (HHSI) pumps 4A and 4B from service for maintenance. The HHSI pumps are designed to transfer makeup water from the Refueling Water Storage Tank (RWST) to the reactor vessel during accidents that drain cooling water from the vessel. Each unit has two HHSI pumps; only one HHSI pump needs to function in order to provide adequate reactor cooling until the pressure inside the reactor vessel drops low enough to permit the Low Head Safety Injection pumps to take over.

The day before, workers had found a small leak from a test line downstream of the common pipe for the recirculation lines of HHSI Pumps 4A and 4B (circled in orange in Figure 3). The repair work was estimated to take 18 hours. Both pumps had to be isolated in order for workers to repair the leaking section.

Pipes cross-connect the HHSI systems for Units 3 and 4 such that HHSI Pumps 3A and 3B (circled in purple in Figure 3) could supply makeup cooling water to the Unit 4 reactor vessel when HHSI Pumps 4A and 4B were removed from service. The operating license allowed Unit 4 to continue running for up to 72 hours in this configuration.

Fig. 3 (Source: Nuclear Regulatory Commission with colors added by UCS)

Before removing HHSI Pumps 4A and 4B from service, operators took steps to protect HHSI Pumps 3A and 3B by further restricting access to the rooms housing them and posting caution signs at the electrical breakers supplying electricity to these motor-driven pumps.

But operators did not protect Buses 3A and 3B that provide power to HHSI Pumps 3A and 3B respectively. Instead, they authorized work to be performed in Switchgear Room 3A that caused the HEAF event.

The owner uses a computer program to characterize risk of actual and proposed plant operating configurations. Workers can enter components that are broken and/or out of service for maintenance and the program bins the associated risk into one of three color bands: green, yellow, and red in order of increasing risk. With only HHSI Pumps 4A and 4B out of service, the program determined the risk for Units 3 and 4 to be in the green range. After the HEAF event disabled HHSI Pump 3A, the program determined that the risk for Unit 4 increased to nearly the green/yellow threshold while the risk for Unit 3 moved solidly into the red band.
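The following is a minimal sketch of how such a configuration risk monitor bins plant states. The thresholds and numbers are hypothetical placeholders to show the mechanics, not the plant’s actual probabilistic risk assessment results:

    # Bin a configuration-specific core damage frequency (CDF) into a color band.
    # Threshold values below are illustrative assumptions only.

    GREEN_YELLOW_THRESHOLD = 1e-4   # per reactor-year, hypothetical
    YELLOW_RED_THRESHOLD   = 1e-3   # per reactor-year, hypothetical

    def risk_band(configuration_cdf_per_year: float) -> str:
        """Return the color band for the instantaneous, configuration-specific CDF."""
        if configuration_cdf_per_year < GREEN_YELLOW_THRESHOLD:
            return "green"
        if configuration_cdf_per_year < YELLOW_RED_THRESHOLD:
            return "yellow"
        return "red"

    # e.g., a hypothetical baseline stays green; removing one redundant pump pair nudges
    # the solved CDF upward; losing a safety bus that feeds several mitigating systems
    # can push it into red.
    print(risk_band(2e-5), risk_band(9e-5), risk_band(3e-3))

In a real risk monitor, the configuration-specific CDF comes from re-solving the plant’s fault tree and event tree models with the out-of-service components set to failed; only the binning step is as simple as shown.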

The Cause(s)

On the morning of Saturday, March 18, 2017, workers were wrapping a fire-retardant material called Thermo-Lag around electrical cabling in the room housing Bus 3A. Meshing made from carbon fibers was installed to connect sections of Thermo-Lag around the cabling for a tight fit. To minimize the amount of debris created in the room, workers cut the Thermo-Lag material to the desired lengths at a location outside the room about 15 feet away. But they cut and trimmed the carbon fiber mesh to size inside the room.

Bus 3A is essentially the nuclear-sized equivalent of a home’s breaker panel. Open the panel and one can open a breaker to stop the flow of electricity through that electrical circuit within the house. Bus 3A is a large metal cabinet. The cabinet is made up of many cubicles housing the electrical breakers controlling the supply of electricity to the bus and the flow of electricity to components powered by the bus. Because energized electrical cables and components emit heat, the metal doors of the cubicles often have louvers to let hot air escape.

The louvers also allow dust and small airborne debris (like pieces of carbon fiber) to enter the cubicles. The violence of the HEAF event (a.k.a. the explosion) destroyed some of the evidence at the scene, but carbon fiber pieces were found inside the cubicle where the HEAF occurred. The carbon fiber was conductive, meaning that it could transport electrical current. Carbon fiber pieces inside the cubicle, according to the NRC, “may have played a significant factor in the resulting bus failure.”

Further evidence inside the cubicle revealed that the bolts for the connection of the “C” phase to the bottom of the panel had been installed backwards. These backwards bolts were the spot where high-energy electrical current flashed over, or arced, to the metal cabinet.

As odd as it seems, installing fire retardant materials intended to lessen the chances that a single fire compromises both electrical safety systems started a fire that compromised both electrical safety systems.

The Precursor Events (and LEAF)

On February 2, 2017, three electrical breakers unexpectedly tripped open while workers were cleaning up after removing and replacing thermal insulation in the new electrical equipment room.

On February 8, 2017, “A loud bang and possible flash were reported to have occurred” in the new electrical equipment room as workers were cutting and installing Thermo-Lag. Two electrical breakers unexpectedly tripped open. The equipment involved used 480 volts or less, making this a low energy arc fault (LEAF) event.

NRC Sanctions

The NRC dispatched a special inspection team to investigate the causes and corrective actions of this HEAF event. The NRC team identified the following apparent violations of regulatory requirements that the agency is processing to determine the associated severity levels of any applicable sanctions:

  • Failure to establish proper fire detection capability in the area following the HEAF event.
  • Failure to properly manage risk by allowing HHSI Pumps 4A and 4B to be removed from service and then allowing work inside the room housing Bus 3A.
  • Failure to implement effective Foreign Material Exclusion measures inside the room housing Bus 3A that enabled conductive particles to enter energized cubicles.
  • Failure to provide adequate design control in that equipment installed inside Cubicle 3AA06 did not conform to vendor drawings or engineering calculations.

UCS Perspective

This event illustrates both the lessons learned and the lessons unlearned from the fire at the Browns Ferry Nuclear Plant in Alabama that happened almost exactly 42 years earlier. The lesson learned was that a single fire could disable primary safety systems and their backups.

The NRC adopted regulations in 1980 intended to lessen the chances that one fire could wreak so much damage. The NRC found in the late 1990s that most of the nation’s nuclear power reactors, including those at Browns Ferry, did not comply with these fire protection regulations. The NRC amended its regulations in 2004 giving plant owners an alternative means for managing the fire hazard risk. Workers were installing fire protection devices at Turkey Point in March 2017 seeking to achieve compliance with the 2004 regulations because the plant never complied with the 1980 regulations.

The unlearned lesson involved sheer and utter failures to take steps after small miscues to prevent a bigger miscue from happening. The fire at Browns Ferry was started by a worker using a lit candle to check for air leaking around sealed wall penetrations. The candle’s flame ignited the highly flammable sealant material. The fire ultimately damaged cables for all the emergency core cooling systems on Unit 1 and most of those systems on Unit 2. Candles had routinely been used at Browns Ferry and other nuclear power plants to check for air leaks. Small fires had been started, but had always been extinguished before causing much damage. So, the unsafe and unsound practice was continued until it very nearly caused two reactors to melt down. Then and only then did the nuclear industry change to a method that did not stick open flames next to highly flammable materials to see if air flow caused the flames to flicker.

Workers at Turkey Point were installing fire retardant materials around cabling. They cut some material in the vicinity of its application. On two occasions in February 2017, small debris caused electrical breakers to trip open unexpectedly. But they continued the unsafe and unsound practice until it caused a fire and explosion the following month that injured a worker and risked putting the reactor into a station blackout event. Then and only then did the plant owner find a better way to cut and install the material. That must have been one of the easiest searches in nuclear history.

The NRC – Ahead of this HEAF Curveball

The NRC and its international regulatory counterparts have been concerned about HEAF events in recent years. During the past two annual Regulatory Information Conferences (RICs), the NRC conducted sessions about fire protection research that covered HEAF. For example, the 2016 RIC included presentations from the Japanese and American regulators about HEAF. These presentations included videos of HEAF events conducted under lab conditions. The 2017 RIC included presentations about HEAF by the German and American regulators. Ironically, the HEAF event at Turkey Point occurred just a few days after the 2017 RIC session.

HEAF events were not fully appreciated when regulations were developed and plants were designed and built. The cooperative international research efforts are defining HEAF events faster than could be accomplished by any country alone. The research is defining factors that affect the chances and consequences of HEAF events. For example, the research indicates that aluminum, such as in cable trays holding energized electrical cables, can be ignited during a HEAF event, significantly adding to the magnitude and duration of the event.

As HEAF research has defined risk factors, the NRC has been working with nuclear industry representatives to better understand the role these factors may play across the US fleet of reactors. For example, the NRC recently obtained a list of where aluminum is used around high voltage electrical equipment.

The NRC needs to understand HEAF factors as fully as practical before it can determine if additional measures are needed to manage the risk. The NRC is also collecting information about potential HEAF vulnerabilities. Collectively, these efforts should enable the NRC to identify any nuclear safety problems posed by HEAF events and to implement a triaged plan that resolves the biggest vulnerabilities sooner rather than later.

Nuclear Regulatory Commission: Contradictory Decisions Undermine Nuclear Safety

As described in a recent All Things Nuclear commentary, one of the two emergency diesel generators (EDGs) for the Unit 3 reactor at the Palo Verde Nuclear Generating Station in Arizona was severely damaged during a test run on December 15, 2016. The operating license issued by the Nuclear Regulatory Commission (NRC) allowed the reactor to continue running for up to 10 days with one EDG out of service. Because the extensive damage required far longer than the 10 days provided in the operating license to repair, the owner asked the NRC for permission to continue operating Unit 3 for up to 62 days with only one EDG available. The NRC approved that request on January 4, 2017.

The NRC’s approval contradicted four other agency decisions on virtually the same issue.

Two of the four decisions also involved the Palo Verde reactors, so it’s not a case of the underlying requirements varying. And one of the four decisions was made afterwards, so it’s not a case of the underlying requirements changing over time. UCS requested that Hubert Bell, the NRC’s Inspector General, have his office investigate these five NRC decisions to determine whether they are consistent with regulations, policies, and practices and, if not, identify gaps that the NRC staff needs to close in order to make better decisions more often in the future.

Emergency Diesel Generator Safety Role

NRC’s safety regulations, specifically General Design Criteria 34 and 35 in Appendix A to 10 CFR Part 50, require that nuclear power reactors be designed to protect the public from postulated accidents such as the rupture of the largest diameter pipe connected to the reactor vessel, which causes cooling water to rapidly drain away and impedes the flow of makeup cooling water. For reliability, an array of redundant emergency pumps—most powered by electricity but a few steam-driven—is installed. Reliability also requires redundant sources of electricity for these emergency pumps. At least two transmission lines must connect the reactor to its offsite electrical power grid and at least two onsite sources of backup electrical power must be provided. Emergency diesel generators are the onsite backup power sources at every U.S. nuclear power plant except one (Oconee in South Carolina, which relies on backup power from generators at a nearby hydroelectric dam).

Because, as the March 2011 earthquake in Japan demonstrated at Fukushima, all of the multiple connections to the offsite power grid could be disabled for the same reason, the NRC’s safety regulations require that postulated accidents be mitigated relying solely on emergency equipment powered from the onsite backup power sources. If electricity from the offsite power grid is available, workers are encouraged to use it. But the reactor must be designed to cope with accidents assuming that offsite power is not available.

The NRC’s safety regulations further require that reactors cope with postulated accidents assuming offsite power is not available and that one additional safety system malfunction or single operator mistake impairs the response. This single failure provision is the reason that Palo Verde and other U.S. nuclear power reactors have two or more EDGs per reactor.

Should a pipe connected to the reactor vessel break when offsite power is unavailable and a single failure disables one EDG, the remaining EDG(s) are designed to automatically start up and connect to the in-plant electrical circuits within seconds. The motor-driven emergency pumps are then designed to automatically start and begin supplying makeup cooling water to the reactor vessel within a few more seconds. Computer studies are run to confirm that sufficient makeup flow is provided in time to prevent the reactor core from getting overheated and damaged.
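The single failure provision lends itself to a toy illustration. The sketch below simply enumerates each postulated single failure and checks that an onsite power train remains; it uses hypothetical names and is a statement of the design principle, not an actual safety analysis:

    # Toy check of the single-failure criterion for onsite emergency power.
    # Train names are hypothetical; offsite power is assumed lost (LOOP).

    onsite_power_trains = ["EDG-1", "EDG-2"]   # two or more EDGs per reactor

    for single_failure in onsite_power_trains:
        surviving = [t for t in onsite_power_trains if t != single_failure]
        assert surviving, f"no emergency power remains after losing {single_failure}"
        print(f"single failure of {single_failure}: {surviving[0]} still powers the emergency pumps")

With only one EDG available, as Palo Verde Unit 3 was during the requested 62 days, the same check fails: the postulated single failure leaves nothing to power the emergency pumps.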

Palo Verde: 62-Day EDG Outage Time Basis

In the safety evaluation issued with the January 4, 2017, amendment, the NRC staff wrote “Offsite power sources, and one train of onsite power source would continue to be available for the scenario of a loss-of-coolant-accident.” That statement contradicted statements the NRC previously made about Palo Verde and DC Cook and subsequently made about the regulations themselves. Furthermore, this statement pretended that the regulations in General Design Criteria 34 and 35 simply do not exist.

Palo Verde: 2006 Precedent

On December 5, 2006, the NRC issued an amendment to the operating licenses for Palo Verde Units 1, 2, and 3 extending the EDG allowed outage time to 10 days from its original 72 hour limit. In the safety evaluation issued for this 2006 amendment, the NRC staff explicitly linked the reactor’s response to a loss of coolant accident with concurrent loss of offsite power:

During plant operation with both EDGs operable, if a LOOP [loss of offsite power] occurs, the ESF [engineered safeguards or emergency system] electrical loads are automatically and sequentially loaded to the EDGs in sufficient time to provide for safe reactor shutdown or to mitigate the consequences of a design-basis accident (DBA) such as a loss-of-coolant accident (LOCA).

Palo Verde: 2007 Precedent

On February 21, 2007, the NRC issued a White inspection finding for one of the EDGs on Palo Verde Unit 3 being non-functional for 18 days while the reactor operated (exceeding the 10 day allowed outage time provided by the December 2006 amendment). The NRC determined the EDG impairment actually existed for a total of 58 days. The affected EDG was successfully tested 40 days into that period. Workers discovered a faulty part in the EDG 18 days later. The NRC assumed the EDG was non-functional between its last successful test run and replacement of the faulty part. Originally, the NRC staff estimated that the affected EDG had a 75 percent chance of successfully starting during the initial 40 days and a 0 percent chance of successfully starting during the final 18 days. Based on those assumptions, the NRC determined the risk to approach the White/Yellow inspection finding threshold. The owner contested the NRC’s preliminary assessment. The NRC’s final assessment and associated White inspection finding only considered the EDG’s unavailability during the final 18 days.

Fig. 1 (Source: NRC)

Somehow, the same NRC that estimated a risk rising to the White level for an EDG being unavailable for 18 days, and a risk rising to the White/Yellow level when the additional 40 days of the EDG being impaired by 25 percent were included, concluded that an EDG being unavailable for 62 days now had a risk of Green or less. The inconsistency makes no sense. And it makes little safety.
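One way to see the inconsistency is to note that incremental risk scales with how long, and how completely, the EDG is unavailable. The sketch below uses an entirely hypothetical conditional risk increase to show that scaling; it is not the NRC’s or the licensee’s actual risk model:

    # Rough scaling of incremental core damage probability (ICDP) with exposure time.
    # The conditional CDF increase below is a made-up placeholder value.

    DELTA_CDF_PER_YEAR = 2e-5   # hypothetical CDF increase with one EDG unavailable

    def icdp(days_exposed: float, edg_unavailability: float) -> float:
        """ICDP ~ (conditional CDF increase) x (fraction unavailable) x (exposure time)."""
        return DELTA_CDF_PER_YEAR * edg_unavailability * (days_exposed / 365.0)

    print(icdp(18, 1.00))   # fully failed for 18 days (the White finding)
    print(icdp(40, 0.25))   # 25 percent impaired for the preceding 40 days
    print(icdp(62, 1.00))   # fully unavailable for 62 days: larger than either

Whatever the actual numbers, 62 days of full unavailability produces a larger exposure than the 18 days the NRC judged to be White, which is why the Green-or-less conclusion is so hard to square.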

DC Cook: 2015 Precedent

One of the two EDGs for the Unit 1 reactor at the DC Cook nuclear plant in Michigan was severely damaged during a test run on May 21, 2015. The owner applied to the NRC for a one-time amendment to the operating license to allow the reactor to continue running for up to 65 days while the EDG was repaired and restored to service.

The NRC asked the owner how the reactor would respond to a loss of coolant accident with a concurrent loss of offsite power and the single failure of the remaining EDG. In other words, the NRC asked how the reactor would comply with federal safety regulations.

The owner shut down the Unit 1 reactor and restarted it on July 29, 2015, after repairing its broken EDG.

Rulemaking: 2017 Subsequent

On January 26, 2017, the NRC staff asked their Chairman and Commissioners for permission to terminate a rulemaking effort initiated in 2008 seeking to revise federal regulations to decouple LOOP from LOCA. The NRC staff explained that their work to date had identified numerous safety issues about decoupling LOOP from LOCA. Rather than put words in the NRC’s mouth, I’ll quote from the NRC staff’s paper: “The NRC staff determined that these issues would need to be adequately addressed in order to complete a regulatory basis that could support a proposed LOOP/LOCA rulemaking. To complete a fully developed regulatory basis for the LOOP/LOCA rulemaking, the NRC staff would need to ensure that these areas of uncertainty are adequately addressed as part of the rulemaking activity.”

It’s baffling how the numerous issues that had to be resolved before the NRC staff could complete a regulatory basis for the LOOP/LOCA rulemaking would not also have to be resolved before the NRC would approve running a reactor for months assuming that a LOOP/LOCA could not occur.

4 out of 5 Ain’t Safe Enough

In deciding whether a loss of offsite power event could be unlinked from a postulated loss of coolant accident, the NRC answered “no” four out of five times.

Fig. 2 (Source: UCS)

Four out of five may be enough when it comes to dentists who recommend sugarless gum, but it’s not nearly safe enough when the lives of millions of Americans are at stake.

We are hopeful that the Inspector General will help the NRC do better in the future.