August 16th 04, 07:59 PM
Henry Spencer

In article ,
Peter Stickney wrote:
The first Ariane V flight was an interesting example of foolish
complacency on the part of the entire team, from managers down through
techs. The reason for the loss of guidance was that they re-used the
Ariane IV guidance system. Not a bad idea, but the greater
acceleration of the Ariane V caused it to overflow several data
accumulators...


Close, but not quite right. There were several interlocking mistakes:

1. The problematic routine in the inertial measurement unit software was
used to restart Ariane 4 countdowns after late holds. It had no role
whatsoever on Ariane 5, but it was left in.

2. Said routine also had no reason to be left running after launch, yet it
was (on both Ariane 4 and Ariane 5).

3. Most float-to-integer conversions in the software were protected
against overflow, but to reduce overhead, a few that "couldn't overflow"
weren't. One, in this particular routine, overflowed because of the
higher acceleration of Ariane 5, and this caused an exception trap. As
Les Hatton put it: "Approximately 37 seconds into the flight, the 16-bit
integer overflowed. Now in a sloppier language like C, the program would
have continued happily rumbling away to itself but would not in all
probability have interfered with the flight. However, the Ada language is
made of sterner stuff. Faced with this run-time error, the program threw
an exception..."

4. Any unexpected exception was considered a sign of a hardware failure.
Upper management basically thought they could prevent design errors by
ordering the engineers not to make any. So any problem was a random
hardware failure, in which case it seemed reasonable for that inertial
unit to stop dead and let the other one carry on. But since there was a
common design flaw, they *both* did that in fast succession.
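
To make point 3 concrete, here is a minimal Python sketch (the flight code was Ada, and this is not it) of three ways a float-to-16-bit conversion can behave when the value is out of range. The function names, the exception class, and the sample value 40000.0 are illustrative assumptions, not taken from the Ariane software. The wrapping variant only models the "carry on with a wrong number" behaviour Hatton attributes to a sloppier language.

```python
INT16_MIN, INT16_MAX = -32768, 32767


class ConversionOverflow(Exception):
    """Hypothetical stand-in for the run-time error raised by a checked conversion."""


def to_int16_checked(x: float) -> int:
    """Convert, and raise if the result does not fit (the 'sterner stuff' case)."""
    n = int(x)  # truncates toward zero; assumes x is finite
    if not INT16_MIN <= n <= INT16_MAX:
        raise ConversionOverflow(f"{x} does not fit in a 16-bit integer")
    return n


def to_int16_wrapping(x: float) -> int:
    """Convert with two's-complement wraparound: wrong answer, but no exception."""
    return (int(x) + 32768) % 65536 - 32768


def to_int16_saturating(x: float) -> int:
    """Convert and clamp to the representable range (one form of overflow protection)."""
    return max(INT16_MIN, min(INT16_MAX, int(x)))


if __name__ == "__main__":
    sample = 40000.0  # illustrative out-of-range value, not a flight value
    print(to_int16_wrapping(sample))
    print(to_int16_saturating(sample))
    try:
        to_int16_checked(sample)
    except ConversionOverflow as err:
        print("exception:", err)
```

Which of the three is "right" depends on what the value is used for. The trouble described in point 4 is what the system did with the exception, not the fact that one was raised.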

This caused the guidance logic to drop into a debug mode
where it sat & waited for somebody to look at its core dump.


No, actually it was worse -- the inertial unit started spewing debug
output down the line to the main guidance system, which interpreted it as
guidance updates! This would qualify as mistake #5, except that it made
no difference in the end: with both inertial units in debug mode, the
rocket was doomed even if they'd just gone silent.
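
A rough Python sketch of how debug output can be mistaken for data: if the receiver assumes every frame on the line is a measurement, diagnostic bytes decode to a plausible-looking number. The frame layout, the tag values, and the function names here are entirely hypothetical; the real bus format is not described above.

```python
import struct

KIND_DATA, KIND_DIAG = 0x01, 0x02  # hypothetical frame-type tags


def decode_untagged(frame: bytes) -> int:
    """Receiver assumes every 2-byte frame is a signed 16-bit measurement."""
    return struct.unpack("<h", frame)[0]


def encode_tagged(kind: int, payload: bytes) -> bytes:
    """Prefix the payload with a one-byte type tag."""
    return bytes([kind]) + payload


def decode_tagged(frame: bytes):
    """Return the measurement, or None if the frame is not flight data."""
    kind, payload = frame[0], frame[1:]
    if kind != KIND_DATA:
        return None  # diagnostic output: ignore it rather than steer by it
    return struct.unpack("<h", payload)[0]


if __name__ == "__main__":
    diagnostic = b"ER"  # two bytes of made-up diagnostic text
    print(decode_untagged(diagnostic))                          # read as a number
    print(decode_tagged(encode_tagged(KIND_DIAG, diagnostic)))  # rejected
```

Tagging only stops the garbage from being acted on. With no valid data arriving from either unit, the outcome is the same.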

This showed an incredible lack of attention on two fronts - The buffer
overruns would have been immediately obvious if there had been even
the most minimal amount of realistic testing performed.


There *was* considerable testing... of the new stuff.

And *that* was mistake #5: full guidance simulations were dropped from
the test plans as unnecessary when budgets and schedules started to pinch.
The software for the central guidance computer was tested, but full-system
tests including the inertial units (or at least their software) were
thought less important -- after all, those units and that code had flown
many times on Ariane 4.

...Even with that, it would
have been possible for the guidance system to control the rocket,
albeit not as accurately, if it had kept functioning.


As per above, there would have been no accuracy loss at all -- the problem
was in a routine which wasn't involved in actually guiding the rocket.
Had the inertial-unit software done its best to carry on despite problems,
all would have been well.
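
The difference between the two policies can be sketched in a few lines of Python. This is a toy model under stated assumptions: two units running identical software, a leftover routine whose result guidance does not need, and a made-up input sequence. The class, function, and parameter names are hypothetical.

```python
from dataclasses import dataclass

INT16_MAX = 32767


@dataclass
class InertialUnit:
    name: str
    halt_on_error: bool
    alive: bool = True

    def step(self, leftover_input: float, attitude: float):
        """Return attitude data, or None once the unit has shut itself down."""
        if not self.alive:
            return None
        try:
            # Hypothetical leftover routine: its result is not used for guidance.
            if abs(leftover_input) > INT16_MAX:
                raise OverflowError("value does not fit in 16 bits")
        except OverflowError:
            if self.halt_on_error:
                self.alive = False  # treated as a hardware fault: stop dead
                return None
            # Otherwise drop the non-essential result and carry on.
        return attitude


def fly(halt_on_error: bool, inputs):
    """For each step, report whether guidance still receives attitude data."""
    units = [InertialUnit("primary", halt_on_error),
             InertialUnit("backup", halt_on_error)]
    log = []
    for value in inputs:
        outputs = [u.step(value, attitude=0.0) for u in units]
        log.append(any(o is not None for o in outputs))
    return log


if __name__ == "__main__":
    inputs = [100.0, 40000.0, 50.0]  # illustrative values only
    print(fly(True, inputs))   # both units stop on the same input
    print(fly(False, inputs))  # both units keep delivering data
```

Redundancy covers independent random failures. Two copies of the same software fed the same input fail together, so the backup buys nothing against a design flaw.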

But the attitude that the design would be perfect, and so only random
hardware failures would cause trouble, was everywhere in that program.

...It brings up the question of how,
other than just by saying so, or claiming it was designed that way,
they expected to man-rate Ariane V.


Man-rating is an essentially meaningless process anyway. (In practice,
man-rated launchers do not appear to have higher reliability than
non-man-rated ones.)
--
"Think outside the box -- the box isn't our friend." | Henry Spencer
-- George Herbert |