Monday, February 2, 2026

On Neumann's Paper "Toward Total-System Trustworthiness": We're Building Houses of Cards

As one of my New Year's goals, I have committed to writing a few blog posts a month, and this is a good opportunity to use classwork to express my rants and ramblings while getting some learning done in the process.

Peter G. Neumann is a legend in computer security for good reason. His article "Toward Total-System Trustworthiness" names something most of us in technology leadership sense but rarely articulate: we're playing a losing game. Every patch, every wrapper, every clever workaround adds another card to a structure that was never designed to bear the weight we're placing on it.

The Southwest Airlines meltdown brought this into sharp relief. A classmate in the discussion thread I replied to earlier today pointed out that their catastrophic failure during the winter storm wasn't a technology problem; it was an archaeology problem. Southwest had essentially wrapped a 1990s-era scheduling system called SkySolver in newer interfaces, hoping the wrapper would compensate for foundations that were never updated for modern scale. When the storm hit, the sheer volume of data overwhelmed the underlying logic, and no amount of clever interfacing could save it.

Neumann calls this the "patch-on-patch" approach. I call it technical debt, a term I know well from my days in software engineering and leading development and product teams, with its haunting reminders of bug fixing and facing the music from unhappy customers, all coming due with compound interest.

Why Total-System Trustworthiness Remains Elusive

After twenty-plus years leading technology operations across global organizations, I've come to believe there are four fundamental reasons why achieving true system trustworthiness remains aspirational at best—especially when you're simultaneously responsible for keeping the lights on.

The "Less Untrustworthy" Objective: People tend to evaluate systems in simple binary categories, as if they exist in only two states: secure or insecure, trustworthy or broken. Neumann reframes trustworthiness as a gradient, and that reframing changes everything. Since we know humans will never achieve complete trust in one another, the priority should be to minimize untrustworthiness. Medicine works the same way: it screens for drug interactions rather than promising absolute treatment success. In systems, it means assuming your components will fail and engineering the resilience to absorb it.
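That last point, engineering the resilience to absorb component failure, can be sketched in a few lines. This is a hypothetical illustration, not anything from Neumann's paper: the `fetch_recommendations` service and its fallback are invented for the example.

```python
def fetch_recommendations(user_id: int) -> list[str]:
    # Hypothetical primary component. Here it always fails, standing in
    # for a dependency we must assume is unreliable.
    raise TimeoutError("recommendation service timed out")

def recommendations_with_fallback(user_id: int) -> list[str]:
    # Absorb the failure: degrade to a generic cached answer instead of
    # propagating the error to the caller. Less useful, still functional.
    try:
        return fetch_recommendations(user_id)
    except TimeoutError:
        return ["popular-item-1", "popular-item-2"]

print(recommendations_with_fallback(42))  # a degraded result, not an outage
```

The point is not the fallback list itself but the stance: the failure path is designed, not discovered in production.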

The Legacy Trap: Neumann argues for a clean-slate approach, which becomes necessary when organizations keep bolting security features onto existing, outdated systems. It is like constructing a dormitory: once the foundation is poured, it sets the physical boundaries that determine the building's shape. The foundations of our computer systems, the x86 architecture and the C programming language, were laid when cyber warfare as we know it today did not exist. A sinking foundation can never be perfected through retroactive correction. We can only shore it up and resolve to build the next one differently.

Anticipating the "Space Aliens": (I am reminded of a computer game I used to play.) Security professionals joke about defending against "space aliens," a humorous way of naming a basic threat-modeling principle that sounds ridiculous at first. Neumann explains that we cannot predict every environmental hazard, from floods and earthquakes to zero-day exploits, but we can design systems that keep functioning as they degrade. A dependable system produces clear, limited failures rather than complete breakdowns.

Designing with Humility: The most crucial element runs counter to the industry's conventional values, which prize speed and self-assurance, because it asks organizations to slow down. Accepting that complex systems can never be fully understood makes us more likely to invest in observability that detects failures early, in compartmentalization so that a crack in one floor slab cannot bring down the entire roof, and in formal methods to verify the system components we can actually control.
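The compartmentalization idea maps onto what practitioners often call the "bulkhead" pattern: give each subsystem its own bounded resource pool so a failure exhausts only its own compartment. A toy sketch, with invented compartment names:

```python
from concurrent.futures import ThreadPoolExecutor

class Bulkhead:
    """Sketch of compartmentalization: each subsystem gets its own small
    thread pool, so one misbehaving dependency can tie up only its own
    workers, not the whole service. The cracked slab stays one slab."""

    def __init__(self, compartments):
        # compartments: dict mapping name -> max concurrent workers
        self.pools = {name: ThreadPoolExecutor(max_workers=n)
                      for name, n in compartments.items()}

    def submit(self, compartment, fn, *args):
        # Work is only ever scheduled inside its own compartment's pool.
        return self.pools[compartment].submit(fn, *args)

bh = Bulkhead({"billing": 2, "reporting": 2})
future = bh.submit("billing", lambda x: x * 2, 21)
print(future.result())  # → 42
```

If "reporting" hangs on a bad query, "billing" still has its two workers; the blast radius is designed in advance rather than measured after the outage.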

The Leadership Paradox

What actually keeps me up at night is that technology leaders inherit systems; we rarely get to design them. That leaves us with dual responsibilities: keeping existing systems running and making them trustworthy. The problem is organizational and philosophical as much as it is technical. Neumann shows that we need to stop reaching for short-term security fixes and instead talk openly about fundamental system vulnerabilities. Protecting our organizations from future failures will require humility and transparency about how our systems actually operate.

The question is not whether our systems will fail; they will. The question is whether we can build systems that fail in ways we can survive.