Conference 7.286::digital

Title: The Digital way of working
Moderator: QUARK::LIONELON
Created: Fri Feb 14 1986
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 5321
Total number of notes: 139771

5111.0. "From the loss of a nail, a battle was lost" by LJSRV1::ENGBROCK () Wed Jan 29 1997 15:41

This was sent to me without the source, but it makes interesting reading.
    
    If they had only used an on-board Alpha system then.....
                                                            
SUBJECT: MINOR SOFTWARE BUG

 It took the European Space Agency 10 years and $7 billion to produce
 Ariane 5, a giant rocket capable of hurling a pair of three-ton satellites
into orbit with each launch and intended to give Europe overwhelming
supremacy in the commercial space business.

 All it took to explode that rocket less than a minute into its maiden
voyage last June, scattering fiery rubble across the mangrove swamps of
French Guiana, was a small computer program trying to stuff a 64-bit number
into a 16-bit space.

One bug, one crash. Of all the careless lines of code recorded in the annals
of computer science, this one may stand as the most devastatingly
 efficient. From interviews with rocketry experts and an analysis prepared
for the space agency, a clear path from an arithmetic error to total
destruction emerges.

 To play the tape backward:
 At 39 seconds after launch, as the rocket reached an altitude of two and a
half miles, a self-destruct mechanism finished off Ariane 5, along with its
payload of four expensive and uninsured scientific satellites.

 Self-destruction was triggered automatically because aerodynamic forces
were ripping the boosters from the rocket. This disintegration had begun
instantaneously when the spacecraft swerved off course under the pressure of
the three powerful nozzles in its boosters and main engine. The rocket was
making an abrupt course correction that was not needed, compensating for a
wrong turn that had not taken place.

 Steering was controlled by the on-board computer, which mistakenly thought
 the rocket needed a course change because of numbers coming from the
inertial guidance system. That device uses gyroscopes and accelerometers to
track motion. The numbers looked like flight data -- bizarre and
impossible flight data -- but were actually a diagnostic error message.

 The guidance system had in fact shut down. This shutdown occurred 36.7
 seconds after launch, when the guidance system's own computer tried to
convert one piece of data -- the sideways velocity of the rocket -- from a
64-bit format to a 16-bit format. The number was too big, and an overflow
 error resulted.

 When the guidance system shut down, it passed control to an identical,
 redundant unit, which was there to provide backup in case of just such a
failure. But the second unit had failed in the identical manner a few
milliseconds before. It was running the same software.

 This bug belongs to a species that has existed since the first computer
programmers realized they could store numbers as sequences of bits, atoms of
data, ones and zeroes: 1001010001101001. . . . A bug like this might crash a
spreadsheet or word processor on a bad day.

 Ordinarily, though, when a program converts data from one form to another,
 the conversions are protected by extra lines of code that watch for errors
 and recover gracefully. Indeed, many of the data conversions in the
guidance system's programming included such protection.

But in this case, the programmers had decided that this particular velocity
figure would never be large enough to cause trouble. After all, it never
had been before. Unluckily, Ariane 5 was a faster rocket than Ariane 4.

One extra absurdity: the calculation containing the bug, which
shut down the guidance system, which confused the on-board computer, which
forced the rocket off course, actually served no purpose once the rocket
was in the air. Its only function was to align the system before launch.

 So it should have been turned off. But engineers chose long ago, in an
earlier version of the Ariane, to leave this function running for the first
40 seconds of flight -- a "special feature" meant to make it easy to
 restart the system in the event of a brief hold in the countdown.

The Europeans hope to launch a new Ariane 5 next spring, this time with a
newly designated "software architect" who will oversee a process of more
intensive and, they hope, realistic ground simulation.

Simulation is the great hope of software debuggers everywhere, though it
can never anticipate every feature of real life. "Very tiny details can
have terrible consequences," says Jacques Durand, head of the project, in
Paris. "That's not surprising, especially in a complex software system such
as this is."

 These days, we have complex software systems everywhere. We have them in
our dishwashers and in our wristwatches, though they're not quite so
mission-critical. We have computers in our cars -- from 15 to 50
microprocessors, depending how you count: in the engine, the transmission,
 the suspensions, the steering, the brakes and every other major subsystem.
 Each runs its own software, thoroughly tested, simulated and debugged, no
doubt.

 Bill Powers, vice president for research at Ford, says that cars' computing
power is increasingly devoted not just to actual control but to diagnostics
and contingency planning -- "Should I abort the mission, and if I abort,
where would I go?" he says. "We also have what's called a  limp-home
strategy." That is, in the worst case, the car is supposed to behave more or
less normally, like a car of the pre-computer era, instead of, say, taking
it upon itself to swerve into the nearest tree.

 The European investigators chose not to single out any particular
contractor or department for blame. "A decision was taken," they wrote.  "It
was not analyzed or fully understood." And "the possible implications of
allowing it to continue to function during flight were not realized."  They
did not attempt to calculate how much time or money was saved by omitting
the standard error-protection code.

 "The board wishes to point out," they added, with the magnificent blandness
of many official accident reports, "that software is an expression of a
highly detailed design and does not fail in the same sense
as a mechanical system." No. It fails in a different sense.  Software built
up over years from millions of lines of code, branching and unfolding and
intertwining, comes to behave more like an organism than a
machine.

 "There is no life today without software," says Frank Lanza, an executive
 vice president of the American rocket maker Lockheed Martin.  "The world
 would probably just collapse." Fortunately, he points out, really important
software has a reliability of 99.9999999 percent. At least, until it
doesn't.

T.R      Title      User      Personal Name      Date      Lines
5111.1. "wait til Jan. 1 2000..." by CSC32::S_WASKEWICZ () Wed Jan 29 1997 17:08 (2 lines)
    
     The year 2000 will REALLY bring em out of the woodwork...
5111.2. "" by BHAJEE::JAERVINEN (Ora, the Old Rural Amateur) Wed Jan 29 1997 17:13 (11 lines)
    Take all your money out of the bank well before 2000, and wait a few
    months before depositing it again...
    
    I haven't really followed the Ariane story - what I initially read here
    (in Munich) said it _was_ a software error. Whether it happened as
    described in .0 I don't know, though I have the impression whoever wrote
    it used his/her journalistic freedom.
    
    Ariane 5 is different from Ariane 4 and apparently they saved some
    money in not recertifying all the software.
    
5111.3. "" by DECWET::LYON (Bob Lyon, DECmessageQ Engineering) Wed Jan 29 1997 17:57 (18 lines)
re: .2

>   I haven't really followed the Ariane story - what I initially read here
>   (in Munich) said it _was_ a software error. Whether it was like
>   described in .0 I don't know though I have the impression whoever wrote
>   it used his/her journalistic freedom.

The full report can be read at:

    http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html

>   Ariane 5 is different from Ariane 4 and apparently they saved some
>   money in not recertifying all the software.

One wonders if they saved more than the several hundred million that was lost
when Ariane 501 went boom ...

Bob
5111.4. "" by DECWET::FARLEE (Insufficient Virtual um...er....) Wed Jan 29 1997 18:00 (23 lines)
You can get the full, original Ariane5 report from ESA at:

http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html

There are many lessons for us in this story, and many of them fall
into two categories:

Arrogance can be dangerous:
	The software philosophy was that systemic software failures were
	not a possibility.  If a component failed, it must be a physical
	failure, thus the only response is to shut it down and switch to
	the alternate.  This is useless in the case of a software bug which
	will faithfully fail on both processors.

Cutting corners in testing can also be dangerous:
	Ariane5 re-used Ariane4 software.  It worked on Ariane4, right?
	Unfortunately, it was deemed to be "too expensive, too much trouble"
	to actually do an end-to-end simulation with anticipated Ariane5
	flight data.  So the software was NEVER tested against an Ariane5
	flight profile...  Until after the accident...  At that time, the
	bug was faithfully demonstrated, with exactly the same results as in
	the real flight.  This testing would probably have been a lot cheaper
	than the rocket and satellites that they lost...
5111.5. "... and did you hear John Wayne died?" by vaxcpu.zko.dec.com::michaud (Jeff Michaud - ObjectBroker) Wed Jan 29 1997 21:16 (3 lines)
	From the date on the story pointed to in the URL this looks like
	old news (the rocket exploded June 4, 1996, and the inquiry board
	report is dated July 19, 1996).
5111.6. "Still a good story" by PERFOM::HENNING () Thu Jan 30 1997 04:21 (6 lines)
    Even if 6-month-old-news, it's still a very good story to rub 
    into the noses of all who could use a dose of humility.
    
    Sign me
    
    	/software_writer
5111.7. "good reference..." by DOODL1::FISCHER () Thu Jan 30 1997 08:24 (20 lines)
    There is a more thorough treatment of this bug and the issues involved
    with the software sharing between Ariane 4 and 5 in Aviation Week
    Magazine (from last summer).  As usual, a problem like this is more
    complicated than this article implies. The issue of the alignment
    software running after liftoff was a real design constraint, not a
    simple "feature": it allowed for flexibility in the countdown process
    (the sequencing of launch events is extremely complicated, especially
    around handling and restarting holds).
    
    From what I recall, analysis showed the real problem was introduced when,
    in analyzing the data sets being sent to this routine for conversion,
    engineers knowingly used Ariane 4 launch profile data instead of
    Ariane 5 (similar, though slightly different) and then underestimated
    the risk of the real flight values being different from these
    "estimated" values.  The incorrect risk assessment then drove the
    decision not to add an error handler for this conversion.
    
    The picture in the Aviation Week article underscores the seriousness
    of such a "minor" oversight.
    
5111.8. "That launch attempt was classified an experimental launch..." by NETCAD::BATTERSBY () Thu Jan 30 1997 12:38 (14 lines)
    One thing has not been mentioned here (unless it is buried in the
    full report), and it is not part of the article posted in the base
    note.
    That Ariane 5 flight was the first flight in the Ariane 5 series.
    As such it was deemed an experimental flight.
    The satellite payloads got a free ride on this launch because
    it was deemed an experimental flight instead of a full operational
    production flight. The payload owners, knowing the launch success
    of the Ariane 4 series, presumed a (relatively speaking) low risk
    of a launch failure. Little did they know, or would they have expected,
    that a software bug and the lack of pre-flight simulation would
    manifest themselves in such a totally destructive failure mode.
    
    Bob
5111.9. "see my personal name" by DECWET::ONO (Software doesn't break-it comes broken) Thu Jan 30 1997 12:47 (0 lines)
5111.10. "since this topic is a soapbox to begin with :-)" by vaxcpu.zko.dec.com::michaud (Jeff Michaud - ObjectBroker) Thu Jan 30 1997 13:54 (25 lines)
> Even if 6-month-old-news, it's still a very good story to rub 
> into the noses of all who could use a dose of humility.

	and what does this story have to do with Digital? :-)

	The author of the base note implied that if they used an Alpha
	that this wouldn't have happened.  But that's a stretch.  While
	Alpha is considered a 64-bit processor, the problem of trying
	to stuff a 64-bit quantity into a 16-bit quantity will also
	fail using an Alpha cpu.  The problem was not that the processor
	used couldn't deal with 64-bit numbers (it was dealing with them
	to begin with), it's that the software assumed (and correctly
	it sounds like for what it was originally designed for, the model
	4 rocket) it could, using C language lingo, cast the 64-bit number
	to a 16-bit number without losing any resolution.  For example, on
	Digital UNIX a long is 64-bits and a short is 16-bits, and they
	tried to do this:

		long l = 123456;
		short s;

		s = l;

	even on an all-powerful Alpha, this will not work (well it
	will work, but the value of s won't be 123456).
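
A minimal sketch of the point made in .10, not the actual flight code. The function names are made up for illustration, and the fixed-width types come from C99's <stdint.h>, which postdates this discussion. The first function narrows the way the snippet above does; the second refuses values that do not fit, which is roughly the kind of protection the article says was left off this particular conversion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Plain narrowing, as in "s = l": for out-of-range values the result is
   implementation-defined in ISO C; common compilers keep the low 16 bits. */
int16_t narrow_unchecked(int64_t value)
{
    return (int16_t)value;
}

/* Checked narrowing (hypothetical helper): stores the result and returns
   true only if the value fits in 16 bits; otherwise *out is left untouched
   and the caller must decide how to recover. */
bool narrow_checked(int64_t value, int16_t *out)
{
    if (value < INT16_MIN || value > INT16_MAX)
        return false;
    *out = (int16_t)value;
    return true;
}
```

With the usual two's-complement behaviour, narrow_unchecked(123456) comes out as -7616 (the low 16 bits of 0x1E240), while narrow_checked reports failure. Detecting the overflow is the easy half; what the caller does next is the design decision that matters.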
5111.11. "" by NETCAD::MORRISON (Bob M. LKG2-A/R5 226-7570) Thu Jan 30 1997 17:32 (9 lines)
>    The satellite payloads got a free ride on this launch because
>    it was deemed an experimental flight instead of a full operational 
>    production flight.

  Interesting. I wonder if the payloads were uninsured because the under-
writers thought an "experimental" flight was too risky?
  I am surprised that the software didn't have range checks for ALL numbers.
Is there an issue that doing so would make the software too large to fit in
memory? Or too slow?
5111.12. "Many 'but it works on ...' bugs are really source code problems..." by DECC::SULLIVAN (Jeff Sullivan) Thu Jan 30 1997 18:23 (9 lines)
In the compiler and Digital UNIX groups, we get a lot of code that happens to
work on other vendors' UNIXes. Many times, the problem can be traced back to
stuffing a 64-bit (pointer) value into a 32-bit variable. On other machines,
pointers are generally not 64-bit, so this is not a problem there. The code is
broken, but just happens to work.

If we only had $7 billion for each one of those we've seen...

-Jeff
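
A sketch of the pattern described in .12, with hypothetical function names. It assumes an LP64 system such as Digital UNIX (32-bit int, 64-bit pointers) and uses uintptr_t from C99's <stdint.h>, which was not available when this thread was written; code of that era would more likely have used long on Digital UNIX.

```c
#include <stdint.h>

/* Broken pattern: where int is 32 bits and pointers are 64 bits, the
   upper half of the address is discarded. It "happens to work" only on
   machines whose pointers already fit in an int. */
unsigned int pointer_to_uint_lossy(const void *p)
{
    return (unsigned int)(uintptr_t)p;
}

/* Portable pattern: uintptr_t is wide enough to hold any object pointer
   and to convert back to the same pointer. */
uintptr_t pointer_to_uintptr(const void *p)
{
    return (uintptr_t)p;
}

const void *uintptr_to_pointer(uintptr_t bits)
{
    return (const void *)bits;
}
```

The lossy version compiles everywhere, which is exactly the problem: nothing fails until the address no longer fits in 32 bits.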
5111.13. "" by DECWET::LYON (Bob Lyon, DECmessageQ Engineering) Thu Jan 30 1997 18:23 (12 lines)
>  Interesting. I wonder if the payloads were uninsured because the under-
>writers thought an "experimental" flight was too risky?

Yes, but that's not all that uncommon even for "routine" flights.  The
underwriting costs are phenomenal.

>  I am surprised that the software didn't have range checks for ALL numbers.
>Is there an issue that doing so would make the software too large to fit in
>memory? Or too slow?

It would have used up more CPU than the design parameters allowed (85% maximum
utilization).
5111.14. "Quality assurance needed seriously" by 33102::JAUNG (Dave Bowers @WHO) Fri Jan 31 1997 09:15 (19 lines)
    ref .0
    
    
    About 30 or more years ago, the US launched a satellite into orbit.
    The mission failed after the satellite completed one orbit. Engineers
    went back to check every part and the simulation code. They found
    that the compiler had not flagged the following FORTRAN statement:
    
    	DO 100  I=1.3
    
    	...
    
    The simulation ran flawlessly but the mission only lasted one orbit.
    If the code had been corrected to:
    
    	DO 100 I=1,3
    
    the design defects would have been found later, when I=3, and the
    mission would not have failed.
5111.15. "" by QUARK::LIONEL (Free advice is worth every cent) Fri Jan 31 1997 12:51 (6 lines)
Re: .14

A popular story, and one also attributed to the failure of a Venus mission,
but it never happened.

				Steve
5111.16. "" by DANGER::ARRIGHI (Life is an else-if construct) Fri Jan 31 1997 14:00 (7 lines)
    re .15
    
    And since we're talking FORTRAN, I'm sure you know the facts. :)
    
    That language is like an old lover that you never quite get over.
    
    Tony (certainly I=1.3 would have produced a compile error)
5111.17. "" by AXEL::FOLEY (http://axel.zko.dec.com) Fri Jan 31 1997 14:17 (7 lines)
RE: .16

	Depends on which compiler.. You can write bad FORTRAN in
	any language. :)

	
							mike
5111.18. "" by DECWET::ONO (Software doesn't break-it comes broken) Fri Jan 31 1997 14:49 (10 lines)
re: .16

Shouldn't produce an error.  Fortran ignores whitespace on a line 
(or at least it used to), so you end up with

	DO100I = 1.3

It looks strange to see DO100I in the variable list.

Wes
5111.19. "" by COOKIE::FROEHLIN (Let's RAID the Internet!) Fri Jan 31 1997 14:52 (32 lines)
.16>Tony (certainly I=1.3 would have produced a compile error)
    
    Nope! DO loop counter can either be real or integer.
    
    A story here:
    
    The technical inspection authority in Germany (TÜV) is responsible for
    checking all aspects of security requirements in nuclear power plants.
    It starts with verifying the calculations for the construction of the
    security core element. TÜV had the program written and executed
    by 3 different companies, in 3 different countries, in 3 different
    programming languages. One day, the company in Germany reported
    differences in their data. It was on a VAX-11/780. It turned out the FP
    processor had started generating random numbers instead of correct FP
    results. This would not have been detected without the comparison
    data. I was brought in to verify that this VAX could do 1+1 correctly.
    
    Or another one (or Next Unseen to skip):
    
    A huge steam turbine in a power plant was started for the first time. The
    FORTRAN program running in the PDP-11 controlling the beast had the
    task of bringing this beast through devilish resonances on its way up to
    test speed. Lots of sensors reported housing and shaft vibrations to
    this one FORTRAN program. Whenever vibrations increased, the program
    opened up the steam intake valve to get past the resonance quickly. Due
    to a sign error in one equation the program kept the turbine steady at
    the first hot spot. Only an emergency steam release triggered by a human
    being averted a catastrophe. I was allowed to visit the cracks in the
    building before it was rebuilt.
    
    Enough stories...back to testing
    Guenther
    Guenther
5111.20. "" by BHAJEE::JAERVINEN (Ora, the Old Rural Amateur) Fri Jan 31 1997 17:13 (33 lines)
    >Nope! DO loop counter can either be real or integer. 
    
    Günther, you should know better!
    
    It's not a question of an integer or real DO loop counter - as someone
    else said, spaces have no meaning in FORTRAN, and a variable beginning
    with D normally wouldn't be an integer, so
    
        	DO 100  I=1.3
    
    is perfectly legal FORTRAN (and doesn't initiate a loop). I haven't
    written any FORTRAN for a looong time - I don't know whether a newer
    compiler might warn saying "this is basically correct, but probably not
    what you wanted to do". I haven't seen a C/C++ compiler either that
    warns about something like
    
    if (xxx = 0)
    {
    	//blow up the nuclear plant
    }
    
    Whether the stories about this bug are urban legend I don't know...
    I've heard them since my FORTRAN II times, and it's fairly long ago.
    
    BTW, for those of you who have tried to write parsers for these languages
    (BASIC is another one where in the classic version whitespace was
    meaningless), it's a pain in the a** - apparently someone thought 20-40
    years ago it would be easier to parse, but it ain't.
    
    And, if FORTRAN had been invented in Germany, it would probably be called
    FORMÜB or something... ;-)
    
    
5111.21. "Old space programming errors" by DECCXX::AMARTIN (Alan H. Martin) Fri Jan 31 1997 18:22 (53 lines)
Re .20:

>I haven't seen a C/C++ compiler either that warns about something like
>    
>    if (xxx = 0)
>    {
>    	//blow up the nuclear plant
>    }

$ TYPE CCW.C
int status;
extern int GetRadarInfo(void);
extern void LaunchMissiles(void);

int main(void)
{
    while (1) {
        status = GetRadarInfo();
        if (status = 1)
            LaunchMissiles();
    }

    return 0;
}
$ CC/DECC CCW/WARN:ENABLE=CHECK
                if (status = 1)
        ........^
%CC-W-CONTROLASSIGN, In this statement, the assignment expression "status=1" is
used as the controlling expression of an if, while or for statement.
                At line number 9 in CCW.C.
...
$ CC/DECC CCW/WARN:ENABLE=CHECK/VERSION
DEC C V5.3-006 on OpenVMS VAX V6.2
$


>    Whether the stories about this bug are urban legend I don't know...
>    I've heard them since my FORTRAN II times, and it's fairly long ago.

The missing superscript bar which caused ground-commanded self-destruct of the
Venus probe "Mariner 1" during early ascent is discussed in
http://catless.ncl.ac.uk/Risks/5.66.html#subj1 .


Re .15

>A popular story, and one also attributed to the failure of a Venus mission,
>but it never happened.

The "DO 10 I=1.10" which caused orbit prediction errors during Project Mercury
is discussed by the NASA employee who found it in
http://www.op.net/docs/Computer-Folklore/mariner_bug .
				/AHM
5111.22. "" by COOKIE::FROEHLIN (Let's RAID the Internet!) Fri Jan 31 1997 18:57 (5 lines)
    Right Ora...this "DO 100 I=1.3" is an assignment and not a DO loop 
    statement. The old FORTRAN trap.
    
    Thanks
    Guenther
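
To make the trap discussed in .18 through .22 concrete, here is a rough sketch in C (not a real FORTRAN parser) of how a fixed-form statement gets classified once blanks are ignored. The function names are invented for illustration, and the rule is deliberately simplified: it does not handle parentheses, so something like DO 10 I = A(1,2) would be misjudged.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst with blanks and tabs removed, which is effectively
   what fixed-form FORTRAN does before interpreting a statement. */
void strip_blanks(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;

    if (dst_size == 0)
        return;
    for (; *src != '\0' && n + 1 < dst_size; src++) {
        if (*src != ' ' && *src != '\t')
            dst[n++] = *src;
    }
    dst[n] = '\0';
}

/* Simplified classifier: "DO", a numeric label, an '=' and then a comma
   make a DO loop. Without the comma the same characters form an
   assignment to a variable such as DO100I. */
bool is_do_loop(const char *statement)
{
    char buf[128];
    const char *p;
    const char *eq;

    strip_blanks(statement, buf, sizeof buf);
    if (strncmp(buf, "DO", 2) != 0)
        return false;
    p = buf + 2;
    if (!isdigit((unsigned char)*p))
        return false;
    while (isdigit((unsigned char)*p))
        p++;
    eq = strchr(p, '=');
    if (eq == NULL)
        return false;
    return strchr(eq + 1, ',') != NULL;
}
```

Fed "DO 100 I=1,3" this reports a loop; fed "DO 100  I=1.3" it reports an assignment, because after stripping the statement is simply DO100I=1.3. A single character decides which statement it is, and both are legal.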
5111.23. "" by QUARK::LIONEL (Free advice is worth every cent) Fri Jan 31 1997 19:06 (4 lines)
In Fortran 90 free-form source, the compiler would reject the malformed 
statement.

				Steve
5111.24. "random rambling" by vaxcpu.zko.dec.com::michaud (Jeff Michaud - ObjectBroker) Fri Jan 31 1997 19:23 (25 lines)
> I haven't seen a C/C++ compiler either that warns about something like
>     if (xxx = 0)
>     {
>     	//blow up the nuclear plant
>     }

	Also FWIW, I knew a group of programmers whose coding standard
	for equality comparison tests was to transpose the arguments.
	Ie.

		if( 0 == xxx ) ....

	this way if they accidentally used = instead of ==, like:

		if( 0 = xxx ) ....

	they'd get a compiler error (never mind a warning).

	I personally however don't follow the coding standard as I find
	it visually unappealing and harder to read when reading code
	in English.  Ex. "if xxx is equal to 0" I don't even need to
	think about, but if the arguments to the comparison operator
	are reversed it reads "if 0 is equal to xxx", which to me at
	least is not natural sounding.  It's like saying "32 years old is
	john" instead of "john is 32 years old".
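
For anyone who wants to try the convention from .24, a small sketch with made-up function names. The buggy variant is included on purpose to show what the typo actually does at run time; the doubled parentheses are only there so that modern compilers accept it quietly.

```c
#include <stdbool.h>

/* Conventional order: reads naturally, but typing "=" for "==" here
   would still compile, often with nothing more than a warning. */
bool is_zero(int xxx)
{
    return xxx == 0;
}

/* Constant-first order: typing "0 = xxx" by mistake is a hard compile
   error, because a constant cannot be assigned to. */
bool is_zero_constant_first(int xxx)
{
    return 0 == xxx;
}

/* The typo itself: assigns 0 to xxx, so the condition is always false
   and the guarded branch never runs, whatever the caller passed in. */
bool is_zero_buggy(int xxx)
{
    if ((xxx = 0))
        return true;
    return false;
}
```

Note that with "xxx = 0" the branch is silently skipped, whereas the "status = 1" example in .21 always takes the branch. Either way the test no longer depends on the data.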
5111.25. "" by BHAJEE::JAERVINEN (Ora, the Old Rural Amateur) Sat Feb 01 1997 07:29 (15 lines)
    re .21: Ok, ok, Alan, I admit I haven't used DEC C for ages.. FWIW,
    even V4.2 of Visual C++ doesn't complain:
    
    void foo(int i)
    {
    	if(i=1)
    	{
    		i=0;
    	}
    }
    
    Compiling...
    foo.cpp
    foo.obj - 0 error(s), 0 warning(s)
    
5111.26. "" by RUSURE::EDP (Always mount a scratch monkey.) Mon Feb 03 1997 09:05 (17 lines)
    Re .11:
    
    >   I am surprised that the software didn't have range checks for ALL
    > numbers.
    
    There were range checks.  The overflowing numbers were dutifully caught
    and reported.  That error message output appeared in place of the
    expected output.  It was interpreted as data, which caused the rocket
    to veer off course.  Since the rocket was off course, safety mechanisms
    destroyed it.
    
    
    				-- edp
    
    
Public key fingerprint:  8e ad 63 61 ba 0c 26 86  32 0a 7d 28 db e7 6f 75
To find PGP, read note 2688.4 in Humane::IBMPC_Shareware.
5111.27. "They knew the theory but didn't have the equipment." by ULYSSE::sbudhcp23.sbu.vbe.dec.com::Mike () Mon Feb 03 1997 10:02 (13 lines)
>>    BTW, those of you who have tried to write parsers for these languages
>>    (BASIC is another one where in the classic version whitespace was
>>    meaningless) it's a pain in the a** - apparently someone thought 20-40
>>    years ago it would be easier to parse, but it ain't.

The reasons for eliminating white space had little to do with the simplicity 
of parsing, real or imagined. The problem was one of cost - in this case the 
cost of storage, particularly temporary storage between passes on a 
multi-pass compiler. Deleting all the white space could give a 20% decrease 
in the space required.

Mike.

5111.28. "" by BHAJEE::JAERVINEN (Ora, the Old Rural Amateur) Mon Feb 03 1997 10:33 (17 lines)
    re .27: 
    
>The problem was one of cost - in this case the 
>cost of storage, particularly temporary storage between passes on a 
>multi-pass compiler. Deleting all the white space could give a 20% decrease 
>in the space required.
    
    I can buy that argument for interpreters (like BASIC usually) - when
    the C64 was popular it was common to see long BASIC programs without a
    single space in them... pretty difficult to read.
    
    However, I don't see why a normal compiler would save any whitespace in
    its intermediate code - whitespace is useful for tokenising the input,
    but then you throw it away anyway. Also, even though spaces have (more
    or less) no meaning in FORTRAN, it was customary to use them anyway, to
    make the code more readable.
    
5111.29. "Token delimiters? Sheer luxury, lad!" by ULYSSE::sbudhcp23.sbu.vbe.dec.com::Mike () Mon Feb 03 1997 10:58 (18 lines)
Re: .-1

One system I used allowed input from cards, paper tape or magnetic 
tape and stripped the white space before writing to drum as an 
intermediate file. The drum was about 100Kb. If the drum space 
overflowed the compiler crashed.

It was possible to use it without drum by feeding the paper tape in 
multiple times or loading it to mag tape first, but this would usually 
exceed your allotted time on the machine: half an hour per day.

I don't think that this was untypical during the late 60s, early 
70s.

Mike.


5111.30. "Gotta use Warning Level 4" by DECCXX::AMARTIN (Alan H. Martin) Mon Feb 03 1997 11:49 (8 lines)
Re .25:

--------------------Configuration: foo - Win32 Debug--------------------
Compiling...
foo.cxx
foo.cxx(3) : warning C4706: assignment within conditional expression
foo.obj - 0 error(s), 1 warning(s)
				/AHM
5111.31. "" by BHAJEE::JAERVINEN (Ora, the Old Rural Amateur) Mon Feb 03 1997 13:55 (2 lines)
    re .29: You had drums? Sheer luxury!  :-)
    
5111.32. "" by BIGUN::nessus.cao.dec.com::Mayne (Wake up, time to die) Tue Feb 04 1997 01:34 (19 lines)
In the good old days when BASIC was interpreted and possibly in ROM, and 
compilers were for mainframes, BASIC was often tokenised on input to the 
interpreter. Thus, the line

	10 FOR I = 1 TO 10

would be stored as

	[binary line number 10] [1 byte token FOR] I = 1 [1 byte token TO] 10

thus not only saving bytes, but running faster, since the interpreter's parsing 
job was a lot simpler. The LIST command would expand the tokens, so you'd never 
know it was happening. (Unless, like me, you wrote BASIC programs that read in 
the tokenised versions and rewrote them.)

(Wow, I had *two* 5¼" floppy disk drives in those days, because I was the one 
writing the database software. 8-)

PJDM
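
A toy version of the tokenising described in .32, as a sketch only. The token values 0x81 and 0x82 and all names are hypothetical (every interpreter had its own table), the line number is left out, and keywords are matched anywhere in the text, so a variable called TOTAL would be mangled; real interpreters also had to skip string literals.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical one-byte token codes for two keywords. */
enum { TOKEN_FOR = 0x81, TOKEN_TO = 0x82 };

struct keyword {
    const char *text;
    unsigned char token;
};

static const struct keyword keywords[] = {
    { "FOR", TOKEN_FOR },
    { "TO",  TOKEN_TO  },
};

/* Crunch one line of BASIC text into out, replacing each keyword with
   its one-byte token and copying everything else unchanged. Returns the
   number of bytes written, or 0 if out is too small. */
size_t tokenise_line(const char *line, unsigned char *out, size_t out_size)
{
    size_t n = 0;

    while (*line != '\0') {
        size_t k;
        size_t matched = 0;
        unsigned char byte = (unsigned char)*line;

        for (k = 0; k < sizeof keywords / sizeof keywords[0]; k++) {
            size_t len = strlen(keywords[k].text);
            if (strncmp(line, keywords[k].text, len) == 0) {
                byte = keywords[k].token;
                matched = len;
                break;
            }
        }
        if (n >= out_size)
            return 0;
        out[n++] = byte;
        line += matched ? matched : 1;
    }
    return n;
}
```

For "FOR I = 1 TO 10" this turns 15 characters into 12 bytes, and the interpreter can dispatch on a single byte instead of re-scanning the keyword each time the line runs. LIST would do the reverse mapping.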
5111.33. "JAVA brings back the good old days" by STAR::jacobi.zko.dec.com::jacobi (Paul A. Jacobi - OpenVMS Systems Group) Tue Feb 04 1997 14:07 (7 lines)
>>> In the good old days when BASIC was interpreted

The "good old days" are back with a new name -- JAVA!


							-Paul

5111.34. "" by BHAJEE::JAERVINEN (Ora, the Old Rural Amateur) Tue Feb 04 1997 18:35 (8 lines)
    re .32:
    
    That was the case e.g. on the first Novas by Data General... but it
    wasn't true for the C-64. (I've had more to do with BASIC than I
    like...).
    
    re .33: There's a huge difference between BASIC and Java...
    
5111.35. "" by SKYLAB::FISHER (Gravity: Not just a good idea. It's the law!) Mon Feb 10 1997 16:13 (34 lines)
re .26:
>    Re .11:
>    
>    >   I am surprised that the software didn't have range checks for ALL
>    > numbers.
>    
>    There were range checks.  The overflowing numbers were dutifully caught
>    and reported.  That error message output appeared in place of the
>    expected output.  It was interpreted as data, which caused the rocket
>    to veer off course.  Since the rocket was off course, safety mechanisms
>    destroyed it.
    
Not quite, according to my understanding of the Aviation Week articles cited
earlier.  It was explicitly decided for many cases not to trap overflows and
underflows because of the analysis which showed that the only way such things
could happen was if there were a hardware failure.  If there is a hardware
failure, there is a redundant processor to take over.  Therefore, the right
thing to do if you get an overflow trap is to HALT the current processor to
ensure that the redundant processor takes over.  The HALT put some data out on
the lines that was essentially an error code showing why the halt had occurred.

That all would have been fine except the problem was not a random h/w failure,
but a systematic s/w error.  Microseconds after the first processor halted, the
second one did too, leaving the error code as the only data available to the
engine computer.

Another thing from the AvWeek article to add:

Not only was the code in question not used after liftoff; it was not used AT
ALL in the Ariane V.  They decided not to remove it because they wanted to
change as few things as possible.  (How many times have we all made decisions
like that?)

Burns
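
The "error code read as flight data" failure described in .35 can be illustrated with a small sketch. Everything here is hypothetical (the message layout, the names, the 16-bit payload) and has nothing to do with the real Ariane data bus; it only shows the general difference between a consumer that trusts every word it receives and one that checks what kind of word it is.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical message from a guidance unit to the flight computer. */
enum message_kind { MSG_FLIGHT_DATA, MSG_DIAGNOSTIC };

struct guidance_message {
    enum message_kind kind;
    int16_t payload;   /* attitude value or error code, depending on kind */
};

/* Untagged consumer: treats every payload as flight data, so an error
   code is steered on as if it were a measurement. */
int16_t read_attitude_untagged(const struct guidance_message *m)
{
    return m->payload;
}

/* Tagged consumer: accepts only genuine flight data. On a diagnostic
   message it returns false and leaves *attitude untouched, so the
   caller can hold the last good value or take some other safe action. */
bool read_attitude_tagged(const struct guidance_message *m, int16_t *attitude)
{
    if (m->kind != MSG_FLIGHT_DATA)
        return false;
    *attitude = m->payload;
    return true;
}
```

Checking the kind does not fix the underlying problem of both units halting on the same input, but it at least stops a diagnostic pattern from being acted on as a measurement.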
5111.36. "" by RUSURE::EDP (Always mount a scratch monkey.) Wed Feb 12 1997 08:48 (14 lines)
    Re .35:
    
    > Not quite, according to my understanding of the Aviation Week
    > articles cited earlier.
    
    Your version is more accurate.  The inquiry board report is at
    http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html.
    
    
    				-- edp
    
    