#1
Photo received from recovering Spirit rover
BY JUSTIN RAY
SPACEFLIGHT NOW
January 28, 2004

See: http://spaceflightnow.com/mars/mera/040128spirit.html for the photo

Working as space-age surgeons 100 million miles away, ground controllers are trying to precisely pinpoint the software glitch that halted the Mars rover Spirit's mission to explore Gusev Crater last Wednesday. If successful, officials say the robot geologist could be out of recovery and back at work early next week.

In a promising development late today, Mission Control released the first photograph taken by Spirit since the rover's computer problems began. It shows the rover's science arm reaching out to examine a pyramid-shaped rock nicknamed Adirondack.

[Photo caption] Spirit took this image on January 28 and relayed it to Earth, the first picture from the rover since problems began a week earlier. Credit: NASA/JPL

As seen by the rover's front-facing hazard-avoidance camera, the arm remains where it was on the morning of Sol 18 when things began to go awry. The German Mossbauer Spectrometer instrument is seated over the rock in a search for iron-bearing minerals.

After finishing the Mossbauer investigations, the arm was supposed to use the Rock Abrasion Tool to scratch away part of Adirondack's exterior to create a window inside. But that never occurred.

Spirit's computer system, its flash memory bogged down by too many data files, began a continuous series of resets. Contact with Earth was lost for a time.

Now, controllers have managed to get a better handle on their $400 million spacecraft to find the exact source of the problem and delete old files that aren't needed.

"We are attempting to get a trace from the flight software of the problem and compare that to what we believe it to be, what we have seen in the testbed, make sure we are correct and then move forward in deleting some of the files from our flash file system as a result of understanding the problem," mission manager Jennifer Trosper said Wednesday.

"We are extremely careful because we want to make sure that we don't make an error in deleting files. We have done file deletes on the spacecraft before, so we've shown that does work. The file directories have all different names and you can convince yourself that you are actually deleting the right thing."

Controllers are trying to run a computer script in the rover to track down the bug. But as of mid-day Wednesday, Trosper said things had not gone according to plan.

"Over the past two days we have had some difficulty getting the script to run on the vehicle. So we are continuing to work that problem.

"The method we are using right now in running this script -- it's kind of a back door into the flight software -- is a fairly surgical technique to identify the exact problem and deal with that little problem.

"If we are not able to successfully complete our surgical technique, we have larger hammers, we like to say, that we can use in order to solve this problem."

By strategically going after the bug, officials hope to preserve useful data still stored in the flash memory for later playback to Earth.

"The intent of the last few days has been to maintain the state of the flash memory. We actually think that the flash is not corrupt. We would like to keep the data that's in the flash memory. If we can't do that based on the technique we're trying to use then the next step we have is to actually delete the data that is in the flash memory. We've talked to the science team. Almost all of the data is replaceable."

Science information waiting in the flash memory includes the Alpha Particle X-Ray Spectrometer and Mossbauer Spectrometer data collected during studies of Adirondack and earlier collaborative observations between Spirit and the European Mars Express orbiter.

The preview-like thumbnail images of the joint rover/orbiter research have already been received from Spirit, giving scientists some data to use if the rest can't be recovered.

"Most of the science that was desired to be done can be done from the thumbnail images. The science team has agreed that is adequate for the focus of the experiment we had with Mars Express. Clearly, they would like to get the rest of it down. But in order to get all the data down it would take many sols and we have to make a risk trade here and a time trade," Trosper said.

"We will attempt the surgical technique about one more day. If that doesn't work, we will move forward to the less-surgical techniques. And hopefully if we are on the right track we would hope at the earliest be back doing science early next week. If we're not on the right track, it could take longer than that."

A specialized group of engineers was brought together to revive Spirit last week and coax the rover back into action. The control team will be returning to its full size in the coming days, if all goes well.

"The anomaly team right now is probably 15 to 20 people because it is a focused effort on solving this flight software problem. Last night, we went to adding probably another 10 people to move towards doing our nominal timeline. And in a few nights, we will go to the full overnight timeline of staffing with the science and engineering teams in preparation for getting Spirit back on its feet for the science mission."

--
Regards, Terry King ...In The Woods In Vermont
The one who Dies With The Most Parts LOSES!! What do you need?
#2
On Thu, 29 Jan 2004 10:13:03 -0500, Terry King wrote:

> Spirit's computer system, its flash memory bogged down by too many data files, began a continuous series of resets. Contact with Earth was lost for a time.

IOW, they tried to store another file they thought would fit, and it didn't, and the master program didn't do anything more intelligent than reboot.

> Now, controllers have managed to get a better handle on their $400 million spacecraft to find the exact source of the problem and delete old files that aren't needed.
>
> "We are attempting to get a trace from the flight software of the problem and compare that to what we believe it to be, what we have seen in the testbed, make sure we are correct and then move forward in deleting some of the files from our flash file system as a result of understanding the problem," mission manager Jennifer Trosper said Wednesday.
>
> "We are extremely careful because we want to make sure that we don't make an error in deleting files. We have done file deletes on the spacecraft before, so we've shown that does work. The file directories have all different names and you can convince yourself that you are actually deleting the right thing."

I suppose they never heard of internal fragmentation, block size, and stuff. I don't know the details either, but flash memories apparently fragment badly, don't reclaim individual sectors as well as your average Windows disk directories. Eventually you have to reformat the flash like reformatting a disk to get contiguous room. It sounds like they didn't know that. If, indeed, the flash memory is even the root of the problem.

> "We will attempt the surgical technique about one more day. If that doesn't work, we will move forward to the less-surgical techniques. And hopefully if we are on the right track we would hope at the earliest be back doing science early next week. If we're not on the right track, it could take longer than that."

I hope that this flash/files program is the whole problem, but if so, it still sounds like they never did any serious system test, not to mention missing some important design rules from the start. The "less surgical" technique would be "delete all files" as on any digital camera. Very sophisticated.

--
All of the above is speculation based on nothing but the public news.

J.
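The "file they thought would fit" scenario and the internal fragmentation point can be illustrated with a toy sketch. Everything here is an assumption for illustration: the `ToyFlashStore` class, the 4096-byte block size, and the file names are hypothetical and say nothing about how the rover's flight software or its flash file system actually work. The sketch only shows how a byte-count view of free space can disagree with a block-count view, and how a failed write can be reported instead of crashing.

```python
import math

BLOCK_SIZE = 4096  # assumed allocation unit in bytes (illustrative only)


class ToyFlashStore:
    """Hypothetical block-allocated store; NOT the rover's file system."""

    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.files = {}  # file name -> size in bytes

    @staticmethod
    def blocks_for(size):
        # Every file occupies whole blocks, so even a tiny file costs one.
        return max(1, math.ceil(size / BLOCK_SIZE))

    def free_blocks(self):
        used = sum(self.blocks_for(s) for s in self.files.values())
        return self.total_blocks - used

    def naive_free_bytes(self):
        # Capacity minus the sum of file sizes: ignores per-block slack.
        return self.total_blocks * BLOCK_SIZE - sum(self.files.values())

    def usable_free_bytes(self):
        # Only whole free blocks can hold new data.
        return self.free_blocks() * BLOCK_SIZE

    def try_write(self, name, size):
        """Store the file if it fits; report failure instead of crashing."""
        if self.blocks_for(size) > self.free_blocks():
            return False
        self.files[name] = size
        return True


store = ToyFlashStore(total_blocks=10)
for i in range(8):
    store.try_write(f"small_{i}.dat", 100)  # 100 bytes each, one block each

print(store.naive_free_bytes())   # byte-count view of free space
print(store.usable_free_bytes())  # block-count view of free space
print(store.try_write("image.dat", 10_000))  # needs 3 blocks, only 2 are free
```

In this toy, eight 100-byte files leave plenty of room by byte count, yet a 10,000-byte file still cannot be stored because only two whole blocks remain. Returning `False` from `try_write` lets the caller decide what to do next, which is the "something more intelligent than reboot" the post is asking for.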
#5
JXStern writes:

> I hope that this flash/files program is the whole problem, but if so, it still sounds like they never did any serious system test, not to mention missing some important design rules from the start. The "less surgical" technique would be "delete all files" as on any digital camera. Very sophisticated.

Given the success of the two rovers it was indeed a wet blanket to learn about the file problem - a very typical problem on Earth. It's evident that testing at resource limit conditions wasn't done. One explanation for this is that the rover reboots each morning. Many Earth-based applications, from Windows to mainframes, must be rebooted often in order to "hide" gradual buildup of problems from memory leaks or whatnot. In testing they may have started from scratch, operated for a day, and cleared it out when their shift ended.

Now they will have to be extra careful with the rovers so they don't push them up to the resource limit they now know about. Of course there could be other resource limits yet to be discovered. If it takes a week or two to recover from each one, that will be a significant drain.

Clearly JPL needs to have a team of loud, jack-booted, non-academic software guys to take their software up to these frequently encountered resource limits and make sure the software can handle it. Constrain one resource and see what dies, then two resources and see what dies, etc. Much better to do this on Earth!

I really wish the interpretation of what this means was different, because if we are right then it looks like we will have a roller coaster ride of software crashes followed by two-week recovery cycles. It will be hell.

Later

Mark Hittinger
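The "constrain one resource and see what dies, then two resources" idea can be sketched as a small sweep over combinations of constrained resources. This is only a rough outline under stated assumptions: the resource names, the `toy_system` stand-in, and its failure condition are all hypothetical, and nothing here reflects JPL's actual test process.

```python
from itertools import combinations

# Hypothetical resource names; a real system would have its own list.
RESOURCES = ("flash_files", "ram", "cpu_time")


def run_under_limits(system, constrained):
    """Run `system` with the given resources constrained.

    Returns None on success, or a short description of the failure.
    """
    limits = {name: (name in constrained) for name in RESOURCES}
    try:
        system(limits)
        return None
    except Exception as exc:  # a test harness wants to record any failure
        return f"{type(exc).__name__}: {exc}"


def sweep(system, max_together=2):
    """Constrain one resource at a time, then two, and record what dies."""
    failures = {}
    for k in range(1, max_together + 1):
        for combo in combinations(RESOURCES, k):
            error = run_under_limits(system, frozenset(combo))
            if error is not None:
                failures[combo] = error
    return failures


def toy_system(limits):
    # Stand-in for the software under test (an assumption): it survives any
    # single constraint but fails when flash and RAM are both tight.
    if limits["flash_files"] and limits["ram"]:
        raise MemoryError("out of memory while saving a file")


failures = sweep(toy_system)
for combo, error in failures.items():
    print(combo, "->", error)
```

The toy system passes every single-resource run and fails only when two limits are hit together, which is the kind of interaction a one-at-a-time test would miss.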