Despite the widespread use of electronic devices and control systems, electrical engineering and computer science have rarely been at the center of discussions about ethical dilemmas. However, a handful of relevant cases do exist. Here I discuss the software error in the MIM-104 Patriot surface-to-air missile system that led to the deaths of 28 American soldiers in 1991. I also present a few solutions that I believe would have prevented the incident and address some counterarguments to those solutions.
On February 25, 1991, a Patriot missile defense battery in Dhahran, Saudi Arabia, failed to track and intercept a Scud missile that struck an American barracks. The error stemmed from the way time was represented in the system's computer. A report submitted to the House of Representatives explains that “Time is kept continuously by the system’s internal clock in tenths of seconds but is expressed as an integer or whole number (e.g., 32, 33, 34…). The longer the system has been running, the larger the number representing time.” A computer stores numbers in binary, as fixed-length sequences of zeros and ones. The value 0.1, however, has a non-terminating binary expansion (0.000110011001100…), so it must be chopped to fit in a register, which leaves the stored value slightly too small. Because the system converted its integer count of tenths into seconds by multiplying by this approximation of 0.1, the conversion error grew in proportion to how long the system had been running.
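To make the representation problem concrete, the following Python sketch uses exact rational arithmetic to print the first bits of 1/10 and to chop it to a fixed number of fractional bits. Keeping 23 bits after the binary point is an assumption of this sketch, chosen because it reproduces the stored value 0.00011001100110011001100 quoted in Skeel's note; the function names `binary_digits` and `chop` are hypothetical.

```python
from fractions import Fraction

TENTH = Fraction(1, 10)
# Assumption: 23 bits after the binary point, which reproduces the chopped
# value of 1/10 quoted in Skeel's note (0.00011001100110011001100).
FRACTION_BITS = 23


def binary_digits(x: Fraction, n_bits: int) -> str:
    """Return the first n_bits of the binary expansion of x, for 0 <= x < 1."""
    bits = []
    for _ in range(n_bits):
        x *= 2  # shift the next binary digit into the integer part
        if x >= 1:
            bits.append("1")
            x -= 1
        else:
            bits.append("0")
    return "".join(bits)


def chop(x: Fraction, n_bits: int) -> Fraction:
    """Truncate a non-negative x to n_bits after the binary point (no rounding)."""
    scale = 2 ** n_bits
    return Fraction(x.numerator * scale // x.denominator, scale)


if __name__ == "__main__":
    print("1/10 in binary: 0." + binary_digits(TENTH, 32) + "...")
    stored = chop(TENTH, FRACTION_BITS)
    print("stored value:   0." + binary_digits(stored, FRACTION_BITS))
    error = TENTH - stored  # how much too small the stored 0.1 is
    print(f"per-tick error: {float(error):.3e} s")
    print(f"relative error: {float(error / TENTH):.3e}")
```

Under this assumption the per-tick error is about 0.000000095 s, and the relative error works out to exactly 2^-20 (about 0.0001%).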
The same report states that “Alpha Battery, the [missile system] in question, … had been in operation for over 100 consecutive hours.” Over that period, the clock’s numerical error grew large enough to cause the system to track the incoming missile incorrectly. To be precise, the relative error of the conversion was only about 2^-20, roughly 0.0001%, which is negligible over a short run. Because the absolute error grows with elapsed time, however, 100 hours of continuous operation left the clock off by 0.343 seconds, enough time for a Scud missile to travel more than half a kilometer. The most concerning aspect of this issue was the insidious nature of the software error. It was not immediately apparent that the error existed, because an initial test would not have run long enough for much error to build up. This raises the question: how well should engineers understand the systems they design before calling them finished products, and beyond the fact that the system works, what information should be provided to the end user?
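The accumulation can be checked with a few lines of Python. This is a simplified model, not the Patriot's actual software: it assumes the clock advances once per tenth of a second, that seconds are computed by multiplying the tick count by 1/10 chopped to 23 fractional bits, and that a Scud travels at roughly 1,676 m/s (the approximate figure in Skeel's note). The name `clock_drift_seconds` is hypothetical.

```python
from fractions import Fraction

TENTH = Fraction(1, 10)
# Assumption: 1/10 chopped to 23 fractional bits (838860 / 2**23).
STORED_TENTH = Fraction(2**23 // 10, 2**23)
SCUD_SPEED_M_PER_S = 1676  # approximate speed, treated as an assumption


def clock_drift_seconds(hours: float) -> Fraction:
    """Accumulated timing error after running continuously for `hours`."""
    ticks = round(hours * 3600 * 10)  # one tick per tenth of a second
    true_time = ticks * TENTH
    computed_time = ticks * STORED_TENTH  # what the simplified model computes
    return true_time - computed_time


if __name__ == "__main__":
    drift = clock_drift_seconds(100)
    print(f"drift after 100 h: {float(drift):.4f} s")
    print(f"Scud travel in that time: {float(drift) * SCUD_SPEED_M_PER_S:.0f} m")
```

With these assumptions the drift after 100 hours comes out to about 0.3433 s, consistent with the 0.343 s figure in the report, and the Scud covers roughly 575 m in that time.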
One solution to this issue is for the engineers designing such systems to understand exactly how their system behaves, through extended and rigorous testing, before declaring the product finished. After superficial initial tests, it may have seemed that the Patriot’s control system would operate nominally for extended periods of time. However, had testing been performed over a reasonably long period (at least 8 hours), the timing error would have become apparent and the event could have been avoided. This type of solution falls under a section of the IEEE Code of Ethics that asks engineers to seek and accept criticism of technical work.
One important fact to note, however, is that “The Patriot system was originally designed to operate in Europe against Soviet medium- to high-altitude aircraft and cruise missiles traveling at speeds up to about MACH 2 (1500 mph). To avoid detection, it was designed to be mobile and operate for only a few hours at one location.” The Alpha Battery system was slightly modified to operate in the desert environment of Dhahran, but it was still based on technology intended to run for only a few hours (1-2 hours) at a time. Simply disclosing that the system was designed to operate for a few hours at a time, rather than to run continuously for more than four days, would have been a viable solution that circumvented the accumulation of error. It would have alerted the personnel in Dhahran to the period the system was designed to operate for, and most likely would have prompted someone to restart the system periodically to erase the accumulated clock error.
That said, a few strong counterpoints apply to both solutions. First, testing the missile system for several days is not something that would seem immediately necessary, and an engineer could reasonably infer that if the control system worked initially, it would keep working for any period of time. Second, an engineer who communicates that there are corner cases in which the system will not operate nominally is, in effect, saying that the system does not work, which would reduce the system’s chances of passing through validation.
It is true that, in many cases, a test lasting several days would not be the first verification method that comes to mind for an engineer. However, a table in the report to the House of Representatives indicates that after only 8 hours the clock would have accumulated 0.0275 seconds of error, time in which a Scud missile traveling at Mach 5 covers more than 40 meters. Had a test been run for that length of time, the error would have been detected.
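The argument for longer testing can be phrased as an ordinary soak test. The Python sketch below runs a simplified clock model (1/10 chopped to 23 fractional bits, an assumption rather than the documented implementation) for several test durations and flags any run whose drift exceeds a tolerance. The 0.01 s tolerance and the names `soak_test` and `MAX_ALLOWED_DRIFT_S` are hypothetical, chosen only to illustrate that a one-hour test can pass while an eight-hour test already exposes the problem.

```python
from fractions import Fraction

TENTH = Fraction(1, 10)
# Assumption: the same simplified model, 1/10 chopped to 23 fractional bits.
STORED_TENTH = Fraction(2**23 // 10, 2**23)
MAX_ALLOWED_DRIFT_S = Fraction(1, 100)  # hypothetical tolerance of 0.01 s


def soak_test(hours: int) -> tuple[bool, float]:
    """Simulate `hours` of continuous operation; return (passed, drift in s)."""
    ticks = hours * 3600 * 10  # one tick per tenth of a second
    drift = ticks * TENTH - ticks * STORED_TENTH
    return drift <= MAX_ALLOWED_DRIFT_S, float(drift)


if __name__ == "__main__":
    for hours in (1, 8, 20, 48, 72, 100):
        passed, drift = soak_test(hours)
        status = "PASS" if passed else "FAIL"
        print(f"{hours:>3} h: drift {drift:.4f} s  {status}")
```

Under this model the 8-hour drift is about 0.0275 s, matching the report's table, so even a single overnight run would have failed this hypothetical check.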
As for communicating the corner cases in which a system does not work, disclosing these flaws would be well worth the effort, because doing so enables others to find solutions to the system’s limitations. Hiding small flaws from the end user to give them a sense of security may seem like a good idea, but in situations where lives are immediately at risk, it is far better to expose such vulnerabilities first, even if doing so delays the validation of the system.
In summary, the 1991 failure of the MIM-104 Patriot surface-to-air missile system could have been prevented had its limitations been communicated earlier and had engineers run extended tests to expose any accumulation of error. Project deadlines and milestones undoubtedly must be met in any development environment. But when one’s work centers on a system intended for defense, rigorous testing, the responsibility to communicate flaws, and acceptance of the consequences of doing so should all be considered equally important to truly validating its integrity.
Blair, Michael, Sally Obenski, and Paula Bridickas. Patriot Missile Defense: Software Problem Led to System Failure at Dhahran, Saudi Arabia. Report IMTEC-92-26, U.S. General Accounting Office, 1992. Web. 29 Feb. 2016.
Skeel, Robert. “Roundoff Error and the Patriot Missile.” N.p., n.d. Web. 29 Feb. 2016.