What can we learn from the UK Post Office Horizon system failure?

7 March 2024

Looking at the devastating effects of the Horizon system failure at the UK Post Office, we see an opportunity to examine the situation from a Quality Engineering (QE) viewpoint and to consider what can be learnt from it when undertaking software initiatives. This paper outlines our perspective on some of the key quality-related issues that occurred during the development and management of the Horizon system, and sets out Planit’s key recommendations and actions to help ensure your organisation’s software development vendors deliver high-quality software.

Summary of the UK Post Office Horizon system failure and the ensuing effects

When Horizon, a new electronic point-of-sale system commissioned by the government to automate Post Office services, went live in 1999, it marked the beginning of what would become a major scandal. Computer Weekly began investigating the system in 2009, revealing that sub-postmasters, who run Post Office branches, were being blamed for unexplained financial losses. Although the sub-postmasters maintained that these shortfalls originated in the Horizon system, more than 700 of them were prosecuted by the Post Office for theft and false accounting, resulting in court convictions, prison sentences and devastation for the sub-postmasters and their families. This tragic turn of events has received extensive media coverage, and interest has recently been renewed by a TV drama highlighting the situation and its human impact.

We have no insight into this case beyond what has been reported in the media, and the problems likely extend beyond the technology itself. Nevertheless, the case highlights how faulty software can create major business risk, and how important QE is in mitigating those risks.

Some of the Horizon software project failures that could have been mitigated with QE

From a QE perspective, the failures reported in the media in the development and management of the Horizon software included:

- Assuming that the software was perfect.
- No independent oversight of the software and no independent audit of quality practices.
- No segregation of duties controlling which developers could edit the testing and production environments.
- No audit trail and/or enterprise-level visibility of changes to production environments.
- No independent testing.
- The system was released before it was ready, despite developers reporting several defects.
- As alleged by a developer who worked on the Horizon system, there was initially a significant lack of standard software engineering processes, with “no design documents, no test documents, no peer reviews, no code reviews, no coding standards”.

How could QE have helped?

How could some of these project failures have been avoided? Here are key examples of QE best practices that could have helped.

Assuming that the software was perfect. Software can be incredibly powerful and valuable, but it is created by humans, and humans make mistakes. It is crucial to build QE activities in from the start, both to find mistakes and to prevent them, and to monitor the software continuously once it is live to ensure it is working as it should.

No independent oversight of the software and no independent audit of quality practices. Independent oversight of the software and independent audits of quality and development practices are critical for ensuring the quality and overall success of the software development project for numerous reasons, including:
- The unbiased perspective an independent party brings when evaluating the software development processes, free from internal politics or conflicts of interest.
- Assurance that the software development process adheres to established quality standards, industry best practices and regulatory requirements.

No segregation of duties controlling which developers could edit the testing and production environments. It is always best practice to keep testing and production environments separate, and to segregate duties between those who build the software and those who have access to production environments, for many important reasons, including:
- If the same developer has access to both test and production environments, you run the risk of accidental changes or errors occurring in production.
- Broad access to production data increases the risk of data breaches that expose sensitive or secure information.
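
As a minimal sketch of how segregation of duties can be enforced by a rule rather than left to convention, the Python below shows the kind of check a deployment pipeline might apply. The `release_manager` role, the `Change` record and the `can_deploy` function are hypothetical illustrations, not features of any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical role name; each organisation will define its own roles.
PROD_DEPLOY_ROLE = "release_manager"


@dataclass(frozen=True)
class Change:
    change_id: str
    author: str                  # developer who built the change
    approver: str | None = None  # reviewer who approved it for production


def can_deploy(change: Change, deployer: str, roles: dict[str, set[str]], environment: str) -> bool:
    """Return True if `deployer` may deploy `change` to `environment`."""
    if environment != "production":
        # Test environments: developers may deploy their own work.
        return True
    # Production: the deployer must hold the release role, the change must be
    # approved by someone other than its author, and the author may not deploy it.
    return (
        PROD_DEPLOY_ROLE in roles.get(deployer, set())
        and change.approver is not None
        and change.approver != change.author
        and deployer != change.author
    )
```

Under this rule a developer can still deploy freely to test environments, but a change only reaches production when a different person has approved it and a release manager who did not write it carries out the deployment.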

No audit trail and/or enterprise-level visibility of the changes to production environments. A visible audit trail is an essential component of safe change management in the production environment. It provides transparency, accountability and security, and it allows changes to be traced and production issues to be managed and troubleshot effectively. Without it, you have no record of what changes were made, who made them or why, and rolling back to a previous stable state becomes very challenging when production issues occur. In addition, every change to a production environment should be required to pass through a change advisory process, so that there is senior oversight of the production changes routinely occurring within the organisation. Production environment changes should also be communicated to the customers and end-users of each software product, and real end-users should be actively engaged to acceptance test any change to their production system.
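
To show what recording “what, who and why” can look like in practice, here is a minimal Python sketch of an append-only audit trail. Each entry records who made a change, what it was, why it was made and who approved it, and is chained to the previous entry by a SHA-256 hash so that later edits can be detected. The `AuditTrail` class and its fields are hypothetical; in practice this role is typically filled by a change management or logging platform rather than hand-written code.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so identical content always hashes identically.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class AuditTrail:
    """Append-only record of production changes, hash-chained for tamper evidence."""
    entries: List[dict] = field(default_factory=list)

    def record(self, who: str, what: str, why: str, approved_by: str) -> dict:
        body = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "who": who,                  # person who made the change
            "what": what,                # description of the change
            "why": why,                  # reason, e.g. the defect being fixed
            "approved_by": approved_by,  # e.g. the change advisory board
            "prev_hash": self.entries[-1]["hash"] if self.entries else "0" * 64,
        }
        entry = dict(body, hash=_digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Return False if any entry was edited or the hash chain is broken."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True
```

For example, a trail holding a deployment and a later rollback verifies cleanly, but editing the `who` field of the first entry makes `verify()` return `False`. Hash chaining only makes tampering visible; someone with unrestricted access could still rewrite the whole chain, which is one more reason to combine an audit trail with segregation of duties.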

No independent testing. When large systems integrators (SIs) build systems for their customers, they often offer testers from their internal teams to test each system, which can be akin to students marking their own homework. Software contracts with SIs are commonly billed on a “fixed price” model, which means the price the customer agrees to pay is the price the SI must stick to while building and testing the system. There are examples where SIs under-quote to win the work, cut corners to deliver the software within the agreed budget and then, once the system is live in production, request additional funds from the customer to continually fix defects. Independent testing from a third-party specialist testing consultancy ensures that your company’s best interests, and those of your end-users, are at the forefront when software quality is evaluated. An independent testing consultancy helps ensure the whole system is tested against your business requirements, according to your business and end-user priorities, so that it works as intended and delivers a positive experience to end-users. It also makes defects and software quality risks visible from very early in the project, giving an unbiased view of software quality.

The system was released before it was ready, despite developers reporting several defects. Ensuring that a system is ready for release is a key part of QE, and the ability to deploy with confidence is supported by multiple best practices. These include taking risk assessments into account, performing go-live readiness reviews and supporting end-users in verifying production systems, to ensure software quality is high and that deployed systems perform their tasks accurately and correctly.

A lack of standard software engineering processes. Quality is engineered into the software development lifecycle through multiple best practices and processes for good reason: together they help ensure that risks are avoided and that the software performs as it should. The alleged lack of standard processes for design, peer review, coding standards, code review and testing would likely have caused many of the issues that were evident in Horizon from the start.

QE will help safeguard against software quality risks

Reducing business risk is a fundamental aspect of QE, as testing and quality improvement activities reduce risk at enterprise and governance levels. A key part of QE is identifying, very early on, the ways in which software might fail, so that quality and testing practices can be strategically selected to mitigate each risk. This allows quality to be built into each software product from the start, reducing the likelihood of critical bugs in production and improving the chances of success for each software development project. By building quality in with the objectives of the business and stakeholders in mind, you enable proactive risk mitigation that helps ensure the finished software product meets your business requirements and the expectations of your customers and end-users. This allows your organisation to save time and money throughout the software development lifecycle (SDLC), as you reduce the risk of building critical defects into each software system. It enables you to release higher-quality software more successfully, more efficiently and at a lower cost, providing a much more positive user experience.

On the other hand, when you don’t build quality in from the start, your software teams can get stuck doing costly rework, correcting faulty software and filling gaps in it, with each bug significantly more expensive to fix later in the lifecycle. Worse still, excessive numbers of defects have caused, and routinely do cause, entire software development projects to fail due to the extensive costs of rework and poor quality.

It is perilous to assume that any software product will perform perfectly.

Equally, testing and software quality improvement activities should not stop when you reach go-live. Continuing to monitor software once it is live in production helps you both detect and prevent unexpected anomalies. This safeguards your systems so that they remain secure, performant and reliable, and continue to meet customer and end-user expectations.

Key recommendations to reduce risk in your software delivery project

Although the Fujitsu implementation of the Horizon system at the UK Post Office has served as a painful reminder of the importance of QE, there are plenty of other examples where software failures and a lack of quality governance have led to brand damage, loss of credibility, poor customer experience, legal issues, economic losses, government turnover and even danger to human life.

Every day, Planit works with customers to ensure that their software products are robust, secure, and will meet the needs of their business, customers and end-users. We regularly take the role of independent quality partner when the software is delivered by a software development vendor or systems integrator, especially where the testing and quality function is not clearly defined within either organisation. Planit offers delivery optimisation all the way from ideation to release and beyond, where our engineers review quality across the breadth of the lifecycle. We take the role of customer advocate and trusted advisor to ensure quality objectives are met within the time and budget of the project. This can sometimes mean helping our customer ensure their software development vendor contracts are aligned with software quality expectations.

Checklist for ensuring software quality when working with a development vendor

When you work with a vendor that is delivering software for your organisation, it can be difficult to know where to begin to ensure the right level of quality. The following is our recommended checklist of items to start with to help ensure success in your software implementation project.

Have you embedded software quality and testing expectations and requirements in your development vendor’s contract?

A clear division of quality and testing responsibilities between you and your software development vendor is essential. Otherwise, wires could become crossed, and you may encounter problems such as project delays, poor software quality, and excessive software development costs.

To prevent these issues and late-stage delays, these responsibilities should be well defined and communicated. For example, verify that the service includes the testing activities your project needs, including (but not limited to) system integration testing, non-functional testing, security testing and end-to-end testing. You can also ask your vendor whether they offer specific frameworks, accelerators or other assets in their testing process, as these can have a significant impact on time to market as well as on cost and quality.

Is your development vendor giving you evidence of their testing, and their software defects? 

It can be risky to assume that your development vendor is conducting testing, only to find out later that important activities were missed, to the detriment of the system’s quality. We have seen numerous instances where testing and quality were simply not considered by the development vendor, and the customer only discovered the software failing during late-stage acceptance testing by the business. In other cases, the vendor missed key system integration activities, leading to costly defects and extensive rework to correct bugs in the interactions between systems. In many other cases, we have observed development vendors conducting only “happy path” positive testing, without testing the “unhappy paths” with negative inputs that show whether a system is robust. Additionally, we commonly see non-functional testing for performance, security, usability, compatibility and reliability left until very late in the project, and it is rarely within the remit of the software development vendor. If your software development vendor is not providing you with evidence of their testing, chances are that the testing either isn’t taking place or covers only a limited number of happy path scenarios, with no coverage of negative scenarios, negative data or other aspects of quality that matter to end-users.
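
To illustrate the difference, here is a small Python sketch pairing happy path and unhappy path tests for a hypothetical function that converts a cash amount typed in at a counter into pence. The function name, its validation rules and the £1,000,000 limit are assumptions made for illustration only; they are not taken from Horizon.

```python
from decimal import Decimal, InvalidOperation

MAX_AMOUNT = Decimal("1000000")  # hypothetical upper limit (in pounds) for one transaction


def parse_amount_to_pence(text: str) -> int:
    """Convert a user-entered amount such as '12.50' into whole pence."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        raise ValueError(f"not a number: {text!r}") from None
    if not value.is_finite():
        raise ValueError("amount must be a finite number")
    if value < 0:
        raise ValueError("amount must not be negative")
    if value > MAX_AMOUNT:
        raise ValueError("amount exceeds the permitted maximum")
    if value != value.quantize(Decimal("0.01")):
        raise ValueError("amount has more than two decimal places")
    return int(value * 100)


def test_happy_path():
    # Positive tests: valid inputs produce the expected result.
    assert parse_amount_to_pence("12.50") == 1250
    assert parse_amount_to_pence(" 3 ") == 300


def test_unhappy_paths():
    # Negative tests: invalid inputs must be rejected, not silently accepted.
    for bad in ["", "abc", "-5", "1.234", "NaN", "1e30"]:
        try:
            parse_amount_to_pence(bad)
        except ValueError:
            continue
        raise AssertionError(f"accepted invalid input {bad!r}")


if __name__ == "__main__":
    test_happy_path()
    test_unhappy_paths()
    print("all tests passed")
```

A suite containing only `test_happy_path` would pass even if the validation checks were missing entirely; it is the unhappy path cases, such as `NaN`, negative amounts or an extra decimal place, that show whether the function is robust.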

Is there an independent party doing any testing or helping manage user acceptance testing? 

We frequently observe attempts to cut the testing phase short to meet go-live deadlines, putting the quality of the software at risk. An independent party helps ensure the solution fulfils business requirements by making sure a suitable amount of testing is completed: not too much or too little, and always in your priority order. This keeps testing focused on delivering the intended business outcomes.

Has the vendor made product quality risks visible to you? 

As mentioned earlier, risk mitigation is a critical part of QE. It involves identifying software failure risks early and deciding which are acceptable and which require more quality and testing work. If this information is not provided, it could mean that communication is poor, but it could also mean that the risks have not been identified, or that they have been identified and your vendor is choosing not to share them with you. None of these possibilities is good for your project. Ask your provider what risks they believe are associated with the project and how they will address them, as well as what their risk mitigation plan and escalation process are in case of issues.

Consider these additional questions as you progress through your project:

- Are any residual risks (that could go live in production) being discussed with your senior stakeholders, such as steering committees?
- Have you reviewed the full list of open defects to decide whether you want to accept them into production?
- Are there any open defects that your vendor is encouraging you to accept into production at go-live?
- Has your vendor provided evidence that defects are fixed, and are you retesting them with both positive and negative tests to confirm the fixes?
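
As a rough illustration of how the defect-related questions above can be turned into a repeatable go-live check, the Python sketch below flags defects that should block release. The `Defect` record, its statuses and the `go_live_blockers` function are hypothetical, and the rule shown (every open defect needs a recorded acceptance decision, and every reported fix needs retest evidence) is an assumption for illustration, not a standard; your defect tracker will have its own data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Defect:
    defect_id: str
    severity: str                      # e.g. "critical", "high", "medium", "low"
    status: str                        # hypothetical statuses: "open" or "fixed"
    accepted_by: Optional[str] = None  # stakeholder who accepted an open defect into production
    retested: bool = False             # fix confirmed by positive and negative retests


def go_live_blockers(defects: List[Defect]) -> List[str]:
    """Return reasons why go-live should not proceed (empty list: no blockers found)."""
    blockers = []
    for d in defects:
        if d.status == "open" and d.accepted_by is None:
            # Open defects need an explicit acceptance decision by a named stakeholder.
            blockers.append(f"{d.defect_id}: open {d.severity} defect with no acceptance decision")
        elif d.status == "fixed" and not d.retested:
            # A reported fix is not trusted until it has been retested.
            blockers.append(f"{d.defect_id}: reported fixed but not retested")
    return blockers
```

For example, given an unaccepted open critical defect `D1`, an open high-severity defect `D2` accepted by the steering committee, and two fixed defects of which only `D3` has been retested, the function returns `["D1: open critical defect with no acceptance decision", "D4: reported fixed but not retested"]`.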

At Planit, we are committed to helping you deliver the highest standards of testing and quality for your projects. Whether you need advice, guidance, or support, don’t hesitate to contact us and let us know how we can help you achieve your goals.

AUTHOR:

Susanne Matson

AUTHOR:

Tafline Ramos
