#11
[fitsbits] Proposed Changes to the FITS Standard
On Fri 2007-08-17T16:38:27 +0200, LC's NoSpam Newsreading account hath writ:
> However it is always possible to add a COMMENT which claims conformance with the latest (e.g. 3.0) standard.

For the sake of the problem that triggered us to overcome the starting friction and actually write the FITS MIME document, I would like to go much farther than a comment. I would like to see the IAUFWG establish a registry (in the sense of the IANA) wherein all the documented FITS conventions have unique names, and I would like to see a series of keywords which can be placed in the PHDU to assert that "this FITS file employs these named conventions".

When we get that far we may have solved Bill Joye's problem with the ds9 viewer, which is to answer the questions: I've just been given a FITS file. What might I do with it? How might I best present its content to the user?

FITS MIME could only go as far as representing to the internet community that we have a file format and a robust process for taking care of it. The rest of this work is only beginning.

--
Steve Allen, UCO/Lick Observatory, University of California, Santa Cruz, CA 95064
Natural Sciences II, Room 165 | Voice: +1 831 459 3046 | http://www.ucolick.org/~sla/
WGS-84 (GPS): Lat +36.99855, Lng -122.06015, Hgt +250 m
#12
Bill said:
> The "once FITS always FITS" philosophy captures the spirit of FITS, but in practice each new version of the FITS Standard has imposed new requirements that in principle could invalidate existing FITS files. For example, version 2.0 of the FITS Standard introduced a new requirement that the value and comment fields in a keyword MUST be separated by a slash character.

It would be interesting to review past such instances. I don't personally recall changes of this mandatory nature. The example regarding comments is pretty tame, since any reasonable implementation would already be ignoring the comments. Do you have another example to quote?

> Of course, if the FITS community thinks a new requirement would cause too much dislocation to existing data or software, then an alternative would be to just "strongly recommend" instead of "require" the new feature.

Indeed. I think that would be best in all three instances quoted below.

> It's also possible to specify that a new requirement will come into effect at some point in the future to allow time for software systems to adapt, as was done with the Y2000 change to the DATE keyword format.

Obviously the scheduling for the Y2K changes was forced, but I agree with Steve that any such new requirements should be announced well in advance of taking effect.

I think, however, that there is a misapprehension about the DATE/DATE-OBS changes. The new ISO date format was very carefully designed to only be required for post-Y2K data (there was also some overlap period, as I recall). The old format remained - and remains - valid to describe 20th century data. In fact, the old dd/mm/yy format was clarified to explicitly denote such dates. No after-the-fact requirements were leveraged onto archival data. This is very different from attempting to place new absolute requirements that would invalidate old data sets. I say "attempting to place" because there is no mechanism for enforcement.

> There are only 3 proposed new absolute requirements in this list:
> 1. Keywords that have a value shall not be repeated in a header.

I have many examples (hundreds of thousands?) of files in which keywords are repeated. Rather than the wording in the current proposal, I would replace the attempt at a requirement with a strong recommendation and a clarification that the final copy of any such repeated keyword should take precedence.

> 2. PCOUNT and GCOUNT must immediately follow the last NAXISn keyword in all conforming extensions (as is already required in IMAGE, TABLE, and BINTABLE extensions).

I guess I'd like to know if there are any such extensions. If not, this is relatively safe. If so, make it a strong recommendation for an explicit list of grandfathered extension types and an absolute requirement for any newly defined extensions.

> 3. Embedded space characters are now forbidden within numeric values in an ASCII Table (e.g. "1 23 4.5" is no longer allowed to represent the decimal value 1234.5)

Again - are there any examples of such usage in the field? I think the general principle, however, should reflect the "letter of the law", not "spirit of the law".

I should end here by repeating my earlier appreciation of the excellent effort that has gone into the revision. If this careful revision has not uncovered any other critical new requirements that must be applied ex post facto, one can opine that there are no lurking dragons that need to be fought. That being the case, it seems to me that the responsibility lies in preserving the great investment in archival data products rather than in attempting to legislate these new requirements on the back of our current holdings and current software investment.

And should new dragons appear that the community deems must be slain, it does indeed appear to this observer that an explicit version keyword (whether a comment or not) should be simultaneously required to trigger new conformance restrictions. The loose wording about pre-existing data is unenforceable, since there is no requirement (whether or not there ought to be) for a DATE keyword to separate old from new. Perhaps the new version tag could itself supply a date; in that case, I'd recommend that any revisions of the standard should contain explicit references to the date(s) that apply for different feature(s).

Rob
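Requirement 3 quoted above is easy to state in code. The Python sketch below is a minimal illustration, not taken from any existing FITS library: the hypothetical `parse_strict` rejects a numeric ASCII-table field with interior blanks, while the hypothetical `parse_legacy` reproduces the old permissive reading by deleting blanks first, so "1 23 4.5" becomes 1234.5. Neither handles Fortran-style `D` exponents, and both treat an all-blank field as an error.

```python
def parse_strict(field: str) -> float:
    """Proposed rule: blanks may pad the field but may not appear inside the number."""
    token = field.strip()
    if " " in token:
        raise ValueError(f"embedded space in numeric field {field!r}")
    return float(token)  # float('') also raises ValueError for an all-blank field


def parse_legacy(field: str) -> float:
    """Old permissive reading: delete every blank, so '1 23 4.5' -> 1234.5."""
    return float(field.replace(" ", ""))


print(parse_legacy("1 23 4.5"))  # 1234.5
print(parse_strict("  1234.5"))  # 1234.5
```

A reader that silently applies the legacy rule and one that applies the strict rule will disagree on exactly the fields the proposal targets, which is the practical argument for settling the question in the standard.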
#13
Steve Allen wrote:
> On Fri 2007-08-17T13:18:40 -0400, William Pence hath writ:
>> 1. Keywords that have a value shall not be repeated in a header.
>
> If this is to be implemented with exactly that wording then on behalf of UCO/Lick/Keck I have to ask for a very clear answer to this question: Starting when? We can do it, but in order to move the organization to get there we're going to need a little warning beforehand.

Whether or not this exact wording is approved, you should probably consider yourself warned that it might happen. :-)

The earliest that the regional FITS committees and the IAUFWG could approve a new version of the FITS Standard would be early 2008. If there were major disagreements, I would guess that it could take up to an additional year to resolve the issues.

Bill Pence
--
Dr. William Pence, NASA/GSFC Code 662, HEASARC, Greenbelt MD 20771
+1-301-286-4599 (voice) | +1-301-286-1684 (fax)
#14
On Aug 17, 2007, at 11:43 AM, Rob Seaman wrote:
>> There are only 3 proposed new absolute requirements in this list:
>> 1. Keywords that have a value shall not be repeated in a header.
>
> I have many examples (hundreds of thousands?) of files in which keywords are repeated. Rather than the wording in the current proposal, I would replace the attempt at a requirement with a strong recommendation and a clarification that the final copy of any such repeated keyword should take precedence.

I strongly support this. The proposed text makes such files invalid FITS, retrospectively. Taking the last instance of a keyword is a much more reasonable interpretation.

But I note that a program that blindly drops all but the last instance may lose the information conveyed by: the number of instances, the list of values and their order, and any associated comments. Are there existing applications where a keyword can occur more than once with different values, in which occurrences other than the last are intended to carry significant information? [I understand that technically COMMENT and HISTORY keywords do not "have a value".]

- Tim Pearson
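Tim's concern can be met without choosing between "last wins" and keeping everything. A minimal Python sketch, assuming the header has already been split into (keyword, value, comment) tuples: the hypothetical `index_header` keeps every occurrence with its position and comment, and the hypothetical `last_wins` derives the "final copy takes precedence" view from that index. The `CCDTEMP` keyword and its values are made up for illustration, and values are kept as raw strings.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

# Commentary keywords carry no value, so they are never collapsed.
COMMENTARY = {"COMMENT", "HISTORY", ""}


class Occurrence(NamedTuple):
    position: int           # 0-based card index within the header
    value: str              # raw value text, not converted
    comment: Optional[str]  # comment text after the slash, if any


def index_header(cards):
    """Group every occurrence of each keyword, preserving header order."""
    index = defaultdict(list)
    for position, (keyword, value, comment) in enumerate(cards):
        index[keyword].append(Occurrence(position, value, comment))
    return index


def last_wins(index):
    """One value per valued keyword: the final occurrence takes precedence."""
    return {kw: occ[-1].value for kw, occ in index.items() if kw not in COMMENTARY}


cards = [
    ("CCDTEMP", "-120.5", "at start of exposure"),
    ("EXPTIME", "300.0", None),
    ("CCDTEMP", "-119.8", "at end of exposure"),
]
index = index_header(cards)
values = last_wins(index)
print(values["CCDTEMP"])                      # -119.8
print([o.position for o in index["CCDTEMP"]])  # [0, 2]
```

The resolved view answers "what is CCDTEMP?" while the index still records how many copies there were, in what order, and with which comments.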
#15
>> 1. Keywords that have a value shall not be repeated in a header.
>
> I have many examples (hundreds of thousands?) of files in which keywords are repeated. Rather than the wording in the current proposal, I would replace the attempt at a requirement with a strong recommendation and a clarification that the final copy of any such repeated keyword should take precedence.
>
>> 2. PCOUNT and GCOUNT must immediately follow the last NAXISn keyword in all conforming extensions (as is already required in IMAGE, TABLE, and BINTABLE extensions).
>
> I guess I'd like to know if there are any such extensions. If not, this is relatively safe. If so, make it a strong recommendation for an explicit list of grandfathered extension types and an absolute requirement for any newly defined extensions.

It got me thinking, so I looked at the FITS parser in iSTB (the current version of save-the-bits, deployed on three mountaintops and handling several terabytes of raw data annually). And no, I don't currently require PCOUNT and GCOUNT to immediately follow NAXISn. I do, however, throw an error if these particular keywords are duplicated :-)

Speaking of which, it is the duplicate keyword requirement that seems most onerous. To implement this efficiently for all keywords, one would have to build a hash table or some such for each header. Then one is left with the question of what to do upon detecting a duplicate. The sense of a requirement is to simply throw an error and exit. How helpful is that?

STB will toss a FITS file if any of the structural keywords (NAXISn, BITPIX, PCOUNT, GCOUNT, etc.) are questionable, precisely because this calls into question the possibility of handling the data appropriately. The daemon needs to know the size of the file because it is reading it on the standard input, perhaps concatenated with other files. The size of each extension must be known to find subsequent extensions. Etc. But am I to discard brand new data simply because some camera temperature keyword appears twice?

I spend a lot of time every week trying to convince a dozen different instrument teams to provide the archive with a reliable DATE-OBS, EXPTIME, FILTER, OBSTYPE, etcetera and so forth. They'll rebel if I start tossing their data due to foibles with minor engineering keywords. I really think enforcing #1 will prove impossible in practice. I'm not going to build a hash table to search for duplicates for every keyword just so I can throw an error that will anger my stakeholders over trivial details.

And on the other hand, for pipeline-reduced science data sets, no requirement is needed, since there already is sufficient impetus for data providers to carefully tailor their data products, eliminating duplicate keywords as a matter of course.

Making it a strong recommendation is my own strong recommendation.

Rob
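The policy described in this post, strict on structural keywords and lenient on everything else, can be sketched in Python. This is a minimal illustration, not iSTB's or any library's code, under these assumptions: cards arrive as (keyword, value) pairs for one extension header; duplicates are found with a Python set in a single pass; and the ordering rule checked is the proposed one, with PCOUNT and GCOUNT immediately after the last NAXISn. The function names are hypothetical.

```python
COMMENTARY = {"COMMENT", "HISTORY", ""}
STRUCTURAL = {"SIMPLE", "XTENSION", "BITPIX", "NAXIS", "PCOUNT", "GCOUNT"}


def is_structural(keyword):
    # NAXIS1 ... NAXISn are structural as well.
    return keyword in STRUCTURAL or (keyword.startswith("NAXIS") and keyword[5:].isdigit())


def check_extension_header(cards):
    """Raise ValueError on structural problems; return warnings for other duplicates."""
    keywords = [kw for kw, _ in cards]
    seen, warnings = set(), []
    for kw in keywords:
        if kw in seen and kw not in COMMENTARY:
            if is_structural(kw):
                raise ValueError(f"duplicated structural keyword {kw}")
            warnings.append(f"duplicated keyword {kw}")
        seen.add(kw)

    if "NAXIS" not in seen:
        raise ValueError("missing NAXIS")
    naxis = int(dict(cards)["NAXIS"])  # safe: NAXIS is known to be unique here
    expected = (["XTENSION", "BITPIX", "NAXIS"]
                + [f"NAXIS{i}" for i in range(1, naxis + 1)]
                + ["PCOUNT", "GCOUNT"])
    if keywords[:len(expected)] != expected:
        raise ValueError(f"mandatory keywords out of order: {keywords[:len(expected)]}")
    return warnings
```

With this split, a repeated engineering keyword (say, a hypothetical CCDTEMP) produces a warning and the data are kept, while a repeated PCOUNT or a reversed PCOUNT/GCOUNT pair is rejected, since those determine where the next extension starts.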
#16
On Fri 2007-08-17T12:13:57 -0700, Tim Pearson hath writ:
> Are there existing applications where a keyword can occur more than once with different values, in which more than just the last occurrence are intended to carry significant information?

In the instruments delivered to Keck by UCO/Lick and Caltech there are repeated occurrences of keywords which not only have values, but for which the data type of the value is different in the different occurrences. The information is significant, but only in the sense that it is a dump of information which might be relevant to someone who is debugging engineering aspects of the image data acquisition system (and that's pretty much the explanation for why this atrocity of poor form in database normalization exists in the first place).

To the extent that we share a common code base I have already implemented the changes necessary to avoid such FITS files, but it will have to be retrofitted and tested on each affected instrument.

--
Steve Allen
#17
Rob Seaman wrote:
> Bill said:
>> The "once FITS always FITS" philosophy captures the spirit of FITS, but in practice each new version of the FITS Standard has imposed new requirements that in principle could invalidate existing FITS files. For example, version 2.0 of the FITS Standard introduced a new requirement that the value and comment fields in a keyword MUST be separated by a slash character.
>
> It would be interesting to review past such instances. I don't personally recall changes of this mandatory nature. The example regarding comments is pretty tame since any reasonable implementation would already be ignoring the comments. Do you have another example to quote?

Some other new requirements were:
- keyword values are restricted to be a single value, not an array
- logical keyword values must consist of a single T or F followed only by a space or a slash character
- integer and float keyword values must not contain embedded spaces
- complex keyword values must be enclosed in parentheses
- no other keywords may intervene between the mandatory keywords in the primary array or extension
- the TFORM keyword values must be upper case (e.g., F5.2, not f5.2)

>> There are only 3 proposed new absolute requirements in this list:
>> 1. Keywords that have a value shall not be repeated in a header.
>
> I have many examples (hundreds of thousands?) of files in which keywords are repeated. Rather than the wording in the current proposal, I would replace the attempt at a requirement with a strong recommendation and a clarification that the final copy of any such repeated keyword should take precedence.

Imposing a new requirement on software systems to read the last instance of the keyword would likely have a lot of negative repercussions. Current software systems produce different results when reading a FITS file with duplicate keywords. CFITSIO cyclically scans the header for the next occurrence of the keyword following the last keyword that was read or written, so the same application may read a different value depending on exactly what processing was done beforehand. I'm sure other commonly used software systems will always return the first instance of the keyword, while other systems will always return the last instance. Requiring all software systems to follow the same behavior is not practical, so the only sure way to prevent users from getting an incorrect result when analyzing the file is to eliminate duplicate keywords in the first place. There is less harm if the duplicated keywords all have the same value, so maybe the wording of this requirement should be modified to take this into account.

>> 2. PCOUNT and GCOUNT must immediately follow the last NAXISn keyword in all conforming extensions (as is already required in IMAGE, TABLE, and BINTABLE extensions).
>
> I guess I'd like to know if there are any such extensions.

At least some of your FOREIGN extensions have the order of these 2 keywords reversed.

>> 3. Embedded space characters are now forbidden within numeric values in an ASCII Table (e.g. "1 23 4.5" is no longer allowed to represent the decimal value 1234.5)
>
> Again - are there any examples of such usage in the field?

No, as far as we know. If there are any, then it is very likely that most current software systems do not support embedded spaces in the value and will silently read an incorrect value, or will exit with an error. Thus, it seems better to me to outlaw this usage rather than just not recommend it or deprecate it.

(...)

> And should new dragons appear that the community deems must be slain, it does indeed appear to this observer that an explicit version keyword (whether a comment or not) should be simultaneously required to trigger new conformance restrictions.

I don't really see any practical benefit to having a version keyword. Either the software will support a new requirement, or it won't; the presence of a version (or DATE) keyword isn't really helpful, except maybe to a human reading the header.

> The loose wording about pre-existing data is unenforceable since there is no requirement (whether or not there ought to be) for a DATE keyword to separate old from new. Perhaps the new version tag could itself supply a date - in that case, I'd recommend that any revisions of the standard should contain explicit references to the date(s) that apply for different feature(s).

The proposed new statement ("Existing FITS files that conformed to the latest version of the standard at the time the files were created are expressly exempt from any new requirements imposed by subsequent versions of the standard.") is, I think, mainly intended as a political statement to reassure institutions that the FITS committees are not imposing new unfunded mandates that require modifications to existing FITS archives. I don't see this statement as having much relevance to the way software is implemented.

Bill Pence
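The point about readers returning different values for the same duplicated keyword can be reproduced in a few lines. The Python sketch below models three lookup policies over one header: first occurrence, last occurrence, and a cyclic "next occurrence after the last card read". The cyclic policy is a simplified model of the CFITSIO behavior described in this post, not CFITSIO's actual code; the class, its methods, and the header values are hypothetical.

```python
class Header:
    """Toy header: an ordered list of (keyword, value) pairs plus a read cursor."""

    def __init__(self, cards):
        self.cards = list(cards)
        self.cursor = 0  # index of the card just after the last one read

    def first(self, keyword):
        for kw, value in self.cards:
            if kw == keyword:
                return value
        raise KeyError(keyword)

    def last(self, keyword):
        for kw, value in reversed(self.cards):
            if kw == keyword:
                return value
        raise KeyError(keyword)

    def cyclic_next(self, keyword):
        # Search from the cursor to the end, then wrap around to the start.
        n = len(self.cards)
        for step in range(n):
            i = (self.cursor + step) % n
            if self.cards[i][0] == keyword:
                self.cursor = i + 1
                return self.cards[i][1]
        raise KeyError(keyword)


cards = [("EXPTIME", 300.0), ("OBJECT", "M31"), ("EXPTIME", 600.0)]
print(Header(cards).first("EXPTIME"))  # 300.0
print(Header(cards).last("EXPTIME"))   # 600.0

h = Header(cards)
print(h.cyclic_next("EXPTIME"))        # 300.0 (fresh cursor)

h = Header(cards)
h.cyclic_next("OBJECT")                # moves the cursor past OBJECT
print(h.cyclic_next("EXPTIME"))        # 600.0 (same header, different answer)
```

Under the cyclic policy the value returned depends on what was read before, which is why a "last instance wins" rule alone would not make existing readers agree.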
#18
Rob Seaman wrote:
> Speaking of which, it is the duplicate keyword requirement that seems most onerous. To implement this efficiently for all keywords, one would have to build a hash table or some such for each header. Then one is left with the question of what to do upon detecting a duplicate. The sense of a requirement is to simply throw an error and exit. How helpful is that?

This is missing the main point of this new requirement. No current software system that I am aware of (except for the FITS verifier code) checks for duplicated keywords, so users have no idea which of the duplicated keywords is being used by a particular program. The software might be using the first, the 'next', or the last instance of the keyword. This could easily cause the user to derive incorrect scientific results. What is the best way to prevent this from happening? It seems to me we should focus on the root of the problem and (formally at least) disallow duplicated keywords in a conforming FITS file. This doesn't mean software should automatically throw out a file that inadvertently has a duplicated keyword.

Stepping back a little, I think the seriousness of this problem depends on what keyword is duplicated. If it is just some observatory-specific keyword that does not directly affect the scientific results, then it does not matter very much, and data providers need not worry about it. But if a critical WCS keyword or exposure time keyword is duplicated in the file with different values, then surely the data providers need to take responsibility and fix the problem.

Bill Pence
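The argument that severity depends on which keyword is duplicated, and on whether the copies agree, suggests classifying duplicates rather than applying a pass/fail test. A minimal Python sketch: the set of critical keywords below is a hypothetical, project-chosen list, and two copies count as identical only if both their values and their Python types match, which also catches a keyword that reappears with a different data type. The function name and sample values are hypothetical.

```python
from collections import defaultdict

COMMENTARY = {"COMMENT", "HISTORY", ""}

# Hypothetical choice of science-critical keywords; each project would set its own.
CRITICAL = {"EXPTIME", "DATE-OBS", "CTYPE1", "CTYPE2",
            "CRVAL1", "CRVAL2", "CRPIX1", "CRPIX2"}


def classify_duplicates(cards, critical=CRITICAL):
    """Map each repeated valued keyword to 'harmless', 'conflict' or 'critical-conflict'."""
    values = defaultdict(list)
    for keyword, value in cards:
        if keyword not in COMMENTARY:
            values[keyword].append(value)

    report = {}
    for keyword, vals in values.items():
        if len(vals) < 2:
            continue
        # Identical only if every copy has the same type and value as the first,
        # so an integer -120 and a string '-120' are reported as a conflict.
        same = all(type(v) is type(vals[0]) and v == vals[0] for v in vals[1:])
        if same:
            report[keyword] = "harmless"
        elif keyword in critical:
            report[keyword] = "critical-conflict"
        else:
            report[keyword] = "conflict"
    return report
```

A pipeline could then reject only "critical-conflict" headers and merely log the rest, matching the view that software need not throw out every file with a duplicated keyword.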
#19
> I think, however, that there is a misapprehension about the DATE/DATE-OBS changes. The new ISO date format was very carefully designed to only be required for post-Y2K data (there was also some overlap period as I recall). The old format remained - and remains - valid to describe 20th century data. In fact, the old dd/mm/yy format was clarified to explicitly denote such dates. No after-the-fact requirements were leveraged onto archival data.

Plus, at that time some change HAD to be made, since the old format was going to wrap around, while here we have a choice.

>> 1. Keywords that have a value shall not be repeated in a header.
>
> I have many examples (hundreds of thousands?) of files in which keywords are repeated. Rather than the wording in the current proposal, I would replace the attempt at a requirement with a strong recommendation and a clarification that the final copy of any such repeated keyword should take precedence.

I similarly cannot see the value of this particular proposed change: FITS readers will need to support repeated keywords forever, given the very large numbers of existing files with them, so it's not even as if this would simplify reading FITS. I am also very much in favour of instead simply clarifying that the last occurrence has precedence.

The other changes look more like matching the letter of the law with its spirit, so are perfectly fine with me.
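The DATE transition discussed above, where old-format values stay valid for 20th-century data, can be illustrated with a reader that accepts both forms. A minimal Python sketch, assuming the value has already been stripped of its FITS quotes: the old `dd/mm/yy` form is mapped into the years 1900-1999, and the ISO forms are passed to the standard `datetime` module. The function name is hypothetical; on Python versions before 3.11, `datetime.fromisoformat` accepts fractional seconds only with 3 or 6 digits.

```python
import re
from datetime import date, datetime

OLD_FORMAT = re.compile(r"(\d{2})/(\d{2})/(\d{2})")


def parse_fits_date(text):
    """Parse a DATE/DATE-OBS value in either the old or the ISO-8601 form."""
    text = text.strip()
    old = OLD_FORMAT.fullmatch(text)
    if old:
        day, month, yy = (int(g) for g in old.groups())
        return date(1900 + yy, month, day)  # old form only denotes 1900-1999
    if "T" in text:
        return datetime.fromisoformat(text)  # yyyy-mm-ddThh:mm:ss[.sss]
    return date.fromisoformat(text)          # yyyy-mm-dd
```

Because the two forms are syntactically distinct, a reader never has to guess which rule applies, which is why the Y2K change did not invalidate archival headers.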
#20
William Pence writes:
> ...
> There are only 3 proposed new absolute requirements in this list:
> 1. Keywords that have a value shall not be repeated in a header.
> 2. PCOUNT and GCOUNT must immediately follow the last NAXISn keyword in all conforming extensions (as is already required in IMAGE, TABLE, and BINTABLE extensions).
> 3. Embedded space characters are now forbidden within numeric values in an ASCII Table (e.g. "1 23 4.5" is no longer allowed to represent the decimal value 1234.5)
>
> The public comment period on these, as well as all the other recommended changes, remains open here on this email list/newsgroup until at least the end of September...

Another proposed change, making the EXTEND keyword optional, will also impose a software-change burden. Software which previously relied on that keyword will now be required to check for the presence of extensions in a different way.

Craig
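One way to check for extensions without relying on EXTEND is to skip over the primary HDU and look at what follows. A minimal Python sketch under stated assumptions: the whole file is in memory; the primary header is read as 80-character cards in 2880-byte blocks up to END; the primary data size is computed from BITPIX and the NAXISn values (PCOUNT defaulting to 0 and GCOUNT to 1) and rounded up to whole 2880-byte blocks; and the file is taken to have an extension if an XTENSION card starts there. Random-groups primaries and quoted string values containing a slash are not handled, and the function names are hypothetical.

```python
BLOCK = 2880  # FITS logical record length in bytes
CARD = 80     # header card length in bytes


def read_header(buf, offset=0):
    """Return ({keyword: raw value text}, offset where the data unit starts)."""
    header = {}
    pos = offset
    while pos + BLOCK <= len(buf):
        block = buf[pos:pos + BLOCK]
        pos += BLOCK
        for i in range(0, BLOCK, CARD):
            card = block[i:i + CARD].decode("ascii")
            keyword = card[:8].strip()
            if keyword == "END":
                return header, pos
            if card[8:10] == "= ":
                # Crude: drop any comment after '/', which breaks strings containing '/'.
                header[keyword] = card[10:].split("/", 1)[0].strip()
    raise ValueError("no END card found")


def data_size(header):
    """Size in bytes of the data unit, before padding to a 2880-byte boundary."""
    naxis = int(header["NAXIS"])
    if naxis == 0:
        return 0
    nelem = 1
    for i in range(1, naxis + 1):
        nelem *= int(header[f"NAXIS{i}"])
    bytes_per_value = abs(int(header["BITPIX"])) // 8
    pcount = int(header.get("PCOUNT", 0))
    gcount = int(header.get("GCOUNT", 1))
    return bytes_per_value * gcount * (pcount + nelem)


def has_extension(buf):
    """True if an XTENSION header follows the primary HDU, whatever EXTEND says."""
    header, data_start = read_header(buf)
    padded = -(-data_size(header) // BLOCK) * BLOCK  # round up to whole blocks
    nxt = data_start + padded
    return buf[nxt:nxt + 10] == b"XTENSION= "
```

This relies only on the structural keywords that every reader must already trust, which is the same information Rob's daemon needs to walk a stream of concatenated HDUs.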