A Space & astronomy forum. SpaceBanter.com


Rover Spirit's flash memory problem explained



 
 
  #1  
February 25th 04, 06:12 PM
Mike Simmons

Technical description of the problem with Spirit's flash memory system
from EETimes.

The trouble with Rover is revealed
By Ron Wilson, EE Times
February 20, 2004 (6:32 p.m. EST)
URL: http://www.eetimes.com/story/OEG20040220S0046

SAN MATEO, Calif. -- When the Mars rover Spirit went dark on Jan. 21, a Jet
Propulsion Laboratory team that had reprogrammed the craft's computer
found that the new code had made possible an unpredictable sequence of events.

The trouble with the Mars rover Spirit started much earlier in the mission
than the day the craft stopped communicating with ground controllers.

"It was recognized just after [the June 2003] launch that there were some
serious shortcomings in the code that had been put into the launch load of
software," said JPL data management engineer Roger Klemm. "The code was
reworked, and a complete new memory image was uploaded to the spacecraft
and installed on the rover shortly after launch."

That appeared to fix the problems that had been identified with the
initial load. But what no one at JPL could have anticipated was that the
new load also made possible a totally implausible sequence of events that
would, many months later, silence Spirit.

The Spirit rover has a radiation-hardened RAD6000 CPU from Lockheed-Martin
Federal Systems at the heart of the system. The processor accesses 128
Mbytes of RAM and 256 Mbytes of flash. Mounted in a 6U VME chassis, the
processor board also has access to custom cards that interface to systems
on the rover.

The operating system is Wind River Systems' VxWorks version 5.3.1, used
with its flash file system extension. In operation, the real-time OS and
all other executable code are RAM-resident.

The flash memory stores executable images that are loaded into RAM at
system boot. Separately, about 230 Mbytes are used to implement a flash
file system that stores “data products,” or data files that are created by
the rover's subsystems and held for transmission to Earth.

Among the data products are the images created by the rover's cameras.

“Part of my responsibility in the data management team is to keep track of
the data files that are created, transmitted and deleted on the rover
during the mission,” Klemm explained. “We recognized early in the planning
process that the flash file system had a limited capacity for files. It is
not just a limitation in the flash itself but also in the directory
structure."
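Klemm's point that the limit lies in the directory structure as well as in the flash itself can be illustrated with a toy model. This is a minimal Python sketch, not JPL's code: the class name and both limits are hypothetical parameters chosen for illustration. A file can be created only if there is free space and a free directory entry, so many small files can exhaust the directory long before the bytes run out.

```python
from dataclasses import dataclass, field


@dataclass
class FlashFileSystem:
    """Toy file system with two independent limits (hypothetical values).

    capacity_bytes: total space available for file data
    max_entries:    how many files the directory structure can describe
    """
    capacity_bytes: int
    max_entries: int
    files: dict = field(default_factory=dict)  # file name -> size in bytes

    def used_bytes(self) -> int:
        return sum(self.files.values())

    def can_create(self, size: int) -> bool:
        # A new file needs BOTH free space and a free directory entry.
        return (self.used_bytes() + size <= self.capacity_bytes
                and len(self.files) < self.max_entries)

    def create(self, name: str, size: int) -> bool:
        if name in self.files or not self.can_create(size):
            return False
        self.files[name] = size
        return True
```

For example, `FlashFileSystem(capacity_bytes=1000, max_entries=3)` accepts three 1-byte files and then refuses a fourth, even though 997 bytes are still free.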

Klemm explained that as data is collected by Spirit, files are created and
stored in the flash file system until a communications window opens — an
opportunity to transmit the data either directly to Earth or to one of the
two orbiters circling the Red Planet. Then the files are transmitted. They
are still held in the flash system until retrieved and error-corrected on
Earth. If data is missing, requests are sent for retransmission. If the
data is intact, a command is sent to delete the received files.
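The store-and-forward cycle Klemm describes can be sketched as a small state machine. This is an illustrative Python model under simplifying assumptions (one ground report per window, whole-file retransmission); the class and state names are hypothetical. It makes one consequence visible: a file keeps occupying a directory entry until the ground confirms it arrived intact.

```python
from enum import Enum


class FileState(Enum):
    STORED = "stored"          # created, waiting for a communications window
    SENT = "sent"              # transmitted, but still held in flash
    RETRANSMIT = "retransmit"  # ground reported the data missing


class DownlinkQueue:
    """Hypothetical store-and-forward queue for data products."""

    def __init__(self):
        self.files = {}  # file name -> FileState

    def create(self, name):
        self.files[name] = FileState.STORED

    def comm_window(self):
        """Transmit new files and any flagged for retransmission."""
        sent = []
        for name, state in list(self.files.items()):
            if state in (FileState.STORED, FileState.RETRANSMIT):
                self.files[name] = FileState.SENT
                sent.append(name)
        return sent

    def ground_report(self, intact, missing):
        """Delete files confirmed intact; re-queue files reported missing."""
        for name in intact:
            if self.files.get(name) is FileState.SENT:
                del self.files[name]
        for name in missing:
            if name in self.files:
                self.files[name] = FileState.RETRANSMIT
```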

"But there were also directories of files already placed into the file
system in the launch load," Klemm said. "When we uploaded a new image to
the rover, we recognized that those files would have to be deleted,
because they were being replaced by a new set using different
directories."

Accordingly, on Martian day 15 (or “sol 15”) of rover operation, a utility
was uploaded to the rover to find and delete the old directories.

Murphy strikes on Mars

But the transmission that uploaded the utility was a partial failure: only
one of the utility program's two parts was received successfully. The
second part was not received, and so in accordance with the communications
protocol it was scheduled for retransmission on sol 19.
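Why a half-received upload does no harm by itself but leaves the job undone can be shown with a toy multi-part uplink. In this hypothetical Python sketch (not the actual MER uplink protocol), a program is assembled only when every numbered part has arrived; until the missing part is retransmitted, nothing runs, so the old directories stay on flash.

```python
class PartialUpload:
    """Hypothetical program uplinked in numbered parts."""

    def __init__(self, total_parts):
        self.total_parts = total_parts
        self.received = {}  # part index -> payload bytes

    def receive(self, index, payload):
        self.received[index] = payload

    def missing(self):
        """Indices still waiting for (re)transmission."""
        return [i for i in range(self.total_parts) if i not in self.received]

    def assemble(self):
        """Return the complete program, or None if any part is missing."""
        if self.missing():
            return None  # not runnable yet
        return b"".join(self.received[i] for i in range(self.total_parts))
```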

Thus was the fuse lit on a software hand grenade.

The data management team's calculations had not made any provision for
leftover directories from a previous load still sitting in the flash file
system.

As Murphy would have it, before the retransmission scheduled for sol 19
could take place, Spirit attempted to allocate more files than the
RAM-based directory structure could accommodate. That caused an exception,
which caused the task that had attempted the allocation to be suspended.
That in turn led to a reboot, which attempted to mount the flash file
system. But the utility software was unable to allocate enough memory for
the directory structure in RAM, causing it to terminate, which led to
another reboot, and so on.
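The failure chain above is deterministic, which is why the rover kept rebooting instead of recovering on its own. The following hypothetical Python sketch models it with made-up numbers: each boot tries to build a RAM directory table for every entry on flash, and because nothing on flash changes between attempts, an oversized directory fails the same way every time.

```python
def boot_sequence(entries_on_flash, table_capacity, max_reboots=4):
    """Hypothetical model of the reboot loop; numbers are illustrative.

    Returns (events, came_up): the list of logged events and whether
    any boot managed to mount the flash file system.
    """
    events = []
    for boot in range(1, max_reboots + 1):
        events.append(f"boot {boot}: mounting flash file system")
        if entries_on_flash <= table_capacity:
            events.append(f"boot {boot}: mount ok")
            return events, True
        # Allocation failure -> task suspended -> system resets and tries again.
        events.append(f"boot {boot}: directory allocation failed, rebooting")
    return events, False
```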

Spirit fell silent, alone in the emptiness of Mars, trying and trying to
reboot. And its human handlers at JPL seemed at a loss to help, unable to
diagnose a system they could not see.

Luckily, early in the process of proposing failure scenarios, someone
remembered the earlier failure to upload the second piece of the utility.
The scenario was modeled, and it was discovered that a VxWorks flag that
causes a task to be suspended on a memory allocation failure was set in
the existing image.
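The difference a suspend-on-failure setting makes can be modeled with a toy memory pool. In this Python sketch the class and the `suspend_on_failure` parameter are hypothetical stand-ins for the VxWorks option the article mentions, not its real API: with the option set, the requesting task simply stops making progress; without it, the caller gets an error it could handle.

```python
class AllocationError(Exception):
    """Raised when an allocation fails and suspension is not enabled."""


class MemoryPool:
    """Toy memory pool with a configurable failure policy (hypothetical)."""

    def __init__(self, size, suspend_on_failure):
        self.free = size
        self.suspend_on_failure = suspend_on_failure
        self.suspended_tasks = []

    def alloc(self, task, nbytes):
        if nbytes <= self.free:
            self.free -= nbytes
            return nbytes
        if self.suspend_on_failure:
            # The task is parked; nothing in the task itself gets a chance
            # to react to the failure.
            self.suspended_tasks.append(task)
            return None
        raise AllocationError(f"{task}: cannot allocate {nbytes} bytes")
```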

"The irony of it was that the operating system was doing exactly what we'd
told it to do," Klemm lamented.

Working on the theory that the rover was in
fact listening and rebooting, the team commanded Spirit to reboot without
mounting the flash file system.

The team then uploaded a script of low-level file manipulation commands
that worked directly on the flash memory without mounting the volume or
building the directory table in RAM. Using the low-level commands, about
a thousand files and their directories — the leftovers from the initial
launch load — were removed.
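The order of the recovery steps is the key idea: remove the leftovers without building the RAM directory table, and only then mount. A minimal Python sketch, assuming flash entries can be listed as plain path strings; the paths, prefixes, and capacity used here are hypothetical.

```python
def recover(flash_entries, leftover_prefixes, table_capacity):
    """Hypothetical sketch of the recovery order.

    1. Work on the raw list of flash entries (no mount, so no RAM
       directory table is built).
    2. Drop every entry under an obsolete launch-load directory.
    3. Only then check whether a normal mount would fit in the RAM table.
    Returns (kept_entries, number_removed, mount_would_succeed).
    """
    kept = [e for e in flash_entries
            if not any(e.startswith(p) for p in leftover_prefixes)]
    removed = len(flash_entries) - len(kept)
    mount_ok = len(kept) <= table_capacity
    return kept, removed, mount_ok
```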

"At that point we mounted the flash file system and ran a checkdisk
utility," Klemm said. To everyone's enormous relief, the mount was
successful.

"As we had anticipated, there was some corruption from the event, so that
was corrected," Klemm added. "In the process of going through the contents
of the file system, we discovered a system log in which the problem was
documented, step by step, right up to the allocation request that failed."

Klemm said that with the leftover directories and their files removed, the
system is now functioning well. But just in case, the team is working on
an exception-handler routine that will more gracefully recover from an
allocation failure.
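One possible shape for the more graceful handling Klemm mentions is sketched below. The article does not say how JPL's routine works, so this is a hypothetical Python policy: when the directory table is full, reclaim entries for files the ground has already confirmed, retry once, and otherwise return an error status instead of suspending the task.

```python
def create_with_recovery(directory, capacity, name, acknowledged):
    """Hypothetical graceful-recovery policy for a full directory table.

    directory:    set of file names currently allocated (modified in place)
    capacity:     maximum number of entries the table can hold
    acknowledged: files the ground has confirmed, safe to delete
    Returns a status string instead of suspending the caller.
    """
    if len(directory) < capacity:
        directory.add(name)
        return "created"
    # Table is full: reclaim entries for files Earth already has, then retry.
    directory -= directory & acknowledged
    if len(directory) < capacity:
        directory.add(name)
        return "created after cleanup"
    return "rejected: directory full"  # caller can log this and keep running
```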

As a postscript, Klemm noted that the other day he heard a car commercial
on the radio that made reference to the Mars rover, comparing, for
example, the car's speed over the ground to Spirit's. In the process of
touting the car's extended-warranty program, the ad noted that the Mars
rover came with "interplanetary roadside assistance." "That phrase just
stuck in my mind," Klemm said. "I love it."
  #2  
February 26th 04, 01:31 AM
Jon Isaacs

Technical description of the problem with Spirit's flash memory system
from EETimes.


Thanks for posting that, Mike. Pretty amazing, fixing problems at that level at
that distance.

jon
  #3  
February 26th 04, 08:06 AM
jerry warner

nice post -





 






All times are GMT +1.


Copyright 2004-2024 SpaceBanter.com.
The comments are property of their posters.