By 1943, during the Second World War, U.S. bomber losses over Europe had become a serious operational concern. Military planners sought to improve survivability by adding armor to protect both the aircraft and their crews. Full armoring, however, was not feasible: the added weight would significantly reduce range, maneuverability, and overall operational effectiveness. The challenge, therefore, was to determine where aircraft should be reinforced to maximize survivability without compromising performance.

Observed Damage Patterns

To address the problem, analysts examined bombers that had returned from combat missions. Damage patterns showed bullet holes concentrated on the wings and tail, while the engines and cockpit were largely undamaged.

The Initial Conclusion

On the surface, it seemed logical to reinforce the areas most frequently hit. Additional armor was therefore recommended for the wings and tail.

This conclusion, however, relied only on planes that survived and returned. Aircraft that were shot down were absent from the data, leaving the dataset incomplete and potentially misleading.

Wald’s Intervention

At this stage, the Statistical Research Group at Columbia University, of which the mathematician Abraham Wald was a member, was assisting the war effort through applied statistical analysis. Wald approached the problem by questioning the structure of the dataset rather than the measurements themselves. His concern was not where the bullet holes were recorded, but which aircraft were absent.

Vulnerability and Selection Bias

Wald’s reasoning was methodical. If returning aircraft showed numerous bullet holes in certain areas, this did not necessarily mean those areas were the most vulnerable; rather, it suggested that aircraft could sustain damage there and still remain airworthy. The relative absence of damage around the engines and cockpit was therefore not evidence of safety, but an indication that hits in those areas were more likely to be fatal: aircraft struck there rarely made it back to be counted. Wald therefore recommended reinforcing precisely the areas where the returning aircraft showed the least damage.
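
A small simulation can make this selection effect concrete. The sketch below (in Python, using made-up per-hit loss probabilities rather than historical figures) assigns hits uniformly across aircraft sections, removes the aircraft that are lost, and then compares the damage distribution seen on survivors against the true distribution across all aircraft.

```python
import random
from collections import Counter

# Hypothetical per-hit loss probabilities by aircraft section. These values
# are illustrative assumptions, not historical data: hits near the engine or
# cockpit are assumed far more likely to bring the aircraft down.
LOSS_PROB = {
    "engine": 0.60,
    "cockpit": 0.50,
    "fuselage": 0.20,
    "wings": 0.05,
    "tail": 0.05,
}

def fly_mission(rng, n_hits=4):
    """Simulate one sortie: hits land uniformly at random across sections."""
    hits = [rng.choice(list(LOSS_PROB)) for _ in range(n_hits)]
    survived = all(rng.random() > LOSS_PROB[section] for section in hits)
    return hits, survived

def simulate(n_aircraft=100_000, seed=42):
    rng = random.Random(seed)
    all_hits, survivor_hits = Counter(), Counter()
    for _ in range(n_aircraft):
        hits, survived = fly_mission(rng)
        all_hits.update(hits)           # ground truth: every hit on every plane
        if survived:
            survivor_hits.update(hits)  # what the analysts could actually observe
    return all_hits, survivor_hits

if __name__ == "__main__":
    all_hits, survivor_hits = simulate()
    n_all, n_surv = sum(all_hits.values()), sum(survivor_hits.values())
    print(f"{'section':<10}{'all hits':>10}{'survivors only':>16}")
    for section in LOSS_PROB:
        print(f"{section:<10}{all_hits[section] / n_all:>10.1%}"
              f"{survivor_hits[section] / n_surv:>16.1%}")
    # Hits are uniform across sections, yet the survivor sample shows far
    # fewer engine and cockpit hits: exactly the gap Wald reasoned about.
```

With these assumed probabilities, engine and cockpit hits are heavily under-represented in the survivor sample even though every section is hit equally often, which is the pattern the analysts originally mistook for evidence of safety.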

Survivorship Bias

This analysis is now a classic example of survivorship bias: drawing conclusions from a non-representative sample of successful outcomes. In this case, visible evidence risked directing resources to areas that could already withstand damage, while missing the critical points where failure occurred.

Wald’s contribution reminds us that data must be interpreted in context: missing cases are not noise; they often point to the most important vulnerabilities.

The principle also applies in security. The most serious threats are often invisible: attacks that go undetected or breaches that are never reported. If a security team studies only the survivable or observed incidents, it risks basing strategy on incomplete evidence, much like analyzing only the “returning aircraft.”

To mitigate survivorship bias, operational security relies on practices such as the following:

  • Red Teams: Red teaming is a structured way to reduce analytical blind spots. Instead of relying only on past incidents, organizations simulate attacks that could realistically bypass current defenses. This helps identify weaknesses that may never appear in incident reports. Without this approach, organizations tend to focus on systems that have already withstood attacks, rather than examining where a successful breach could actually occur.
  • Post-Incident Reviews: Post-incident reviews often concentrate on events that are documented and recoverable, such as incidents with logs, witnesses, or formal reports. However, many security failures are never fully reported for legal, reputational, or operational reasons. As a result, the available data may exclude the most serious or sensitive failures, leading to an incomplete understanding of risk.
  • Vulnerability Mapping and Risk Prioritization: Security frameworks frequently prioritize measures based on commonly observed threats. While practical, this can lead to the neglect of less frequent but potentially more damaging vulnerabilities. In effect, visible and repeated incidents receive the most attention, even though they may not represent the most critical points of failure. This mirrors the bomber analysis: the most dangerous weaknesses may be the ones that appear least often in the observable data.

In this sense, the areas “with no bullet holes” are not indicators of safety but potential indicators of catastrophic weakness: points at which failure is so complete that it leaves little trace for analysis.