#1
[fitsbits] Proposed Changes to the FITS Standard
Perhaps there is a consensus building to back off from an absolute ban on duplicate keywords. Here's my reply to Bill's latest in any event - kind of a "Tao of FITS". Apologies for the length:

- keyword values are restricted to be a single value, not an array
- logical keyword values must consist of a single T or F followed only by a space or a slash character
- integer and float keyword values must not contain embedded spaces
- complex keyword values must be enclosed in parentheses
- no other keywords may intervene between the mandatory keywords in the primary array or extension
- the TFORM keyword values must be upper case (e.g., F5.2, not f5.2)

These are all (except perhaps the last) rare occurrences. In that case, a newly placed requirement is more like a clarification of the standard than a change. Duplicate keywords, on the other hand, are a frequent occurrence (thus the interest in eliminating them :-)

"Once FITS, always FITS" may never have been in serious question before. Imposing a new requirement on software systems to read the last instance of the keyword would likely have a lot of negative repercussions.

No more so than imposing a new requirement to detect and act on duplicate keywords. The difference is that outlawing duplicates doesn't fix those systems reading indeterminate values. We might ask what an ideal FITS library or application should do on encountering various exceptions. Whether or not duplicate keywords are outlawed, deprecated or ignored, feeding such input to our software will remain a frequent event. We can't legislate moral behavior, rather only the consequences for detected immorality. It seems to me that in this imperfect world it would be better if the major FITS software packages adopted a coherent behavior on encountering duplicate keywords.

A header with duplicate FITS keywords is not a bug. Currently, it is perfectly legal FITS, if questionable practice. This cannot now be rescinded (except with some form of HDU-level versioning, I still assert).
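The value-format restrictions in that list can be made concrete as simple checks on the trimmed value field of an 80-character card. A minimal Python sketch follows; the regular expressions are my own rough approximation for illustration, not the standard's official grammar:

```python
import re

# Approximate checks for the value-format rules discussed above.
# These patterns are an illustration, not the official FITS grammar.
LOGICAL_RE = re.compile(r'^(T|F)$')              # single T or F
INTEGER_RE = re.compile(r'^[+-]?\d+$')           # no embedded spaces
FLOAT_RE   = re.compile(r'^[+-]?\d*\.?\d+([ED][+-]?\d+)?$')
COMPLEX_RE = re.compile(r'^\([^()]+,[^()]+\)$')  # enclosed in parentheses

def classify_value(value: str) -> str:
    """Return a rough type name for a trimmed FITS keyword value field."""
    v = value.strip()
    if LOGICAL_RE.match(v):
        return "logical"
    if INTEGER_RE.match(v):
        return "integer"
    if FLOAT_RE.match(v):
        return "float"
    if COMPLEX_RE.match(v):
        return "complex"
    if v.startswith("'"):
        return "string"
    return "invalid"   # e.g. '12 34', an integer with an embedded space

print(classify_value("T"))          # logical
print(classify_value("12 34"))      # invalid
print(classify_value("(1.0,2.0)"))  # complex
```

A real parser would also need to check what follows the value (space, slash, or end of card), but even a crude classifier like this shows how cheaply the "rare occurrence" violations could be flagged.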
But even if duplicate keywords were illegal FITS, the question remains of what FITS software should do upon encountering them - and how our code should recognize the fact in the first place.

Requiring all software systems to follow the same behavior is not practical, so the only sure way to prevent users from getting an incorrect result when analyzing the file is to eliminate duplicate keywords in the first place.

You cannot avoid the question of what software is required to do by outlawing data. Those data can and will continue to be presented as input to our software. Perhaps there is some notion that we'll require all archives and data providers to scrub their data. This is at least as impractical a requirement as you describe for the software - more to the point, who will data providers turn to for software to perform such scrubbing? One way or the other, if we tackle this issue our software will have to detect duplicate keyword instances and take some action as a result.

There is less harm if the duplicated keywords all have the same value, so maybe the wording of this requirement should be modified to take this into account.

This strikes me as the sort of contingent action that indicates the primary action is ill conceived. As far as the software goes, it is simply another requirement placed on top of the first. Look for duplicate keyword names, then look for duplicate values - would the next step be a test for duplicate comments?

Some of your FOREIGN extensions have the order of these two keywords reversed.

We'll look into the behavior you describe. I would expect most extension types, including FOREIGN, to be conformable to this more strict keyword ordering whether it is required or merely preferred. In addition to clarifying the ordering of PCOUNT/GCOUNT, this may be a good time to state this more clearly for all the mandatory keywords (section 4.4). In particular, the ordering of NAXISn is never explicitly restricted to increasing numerical order.
The only statement for any of the mandatory keywords is presented in table 4.5, which suggests NAXISn be ordered, but never outright says it.

3. Embedded space characters are now forbidden within numeric values in an ASCII Table (e.g. "1 23 4.5" is no longer allowed to represent the decimal value 1234.5)

Again - are there any examples of such usage in the field?

No, as far as we know. If there are any, then it is very likely that most current software systems do not support embedded spaces in the value and will silently read an incorrect value, or will exit with an error. Thus, it seems better to me to outlaw this usage rather than just not recommend it or deprecate it.

Again, the question is whether it is more productive to attempt to outlaw something or to describe what steps software should take upon encountering the usage. If there are no known instances, "outlawing" is equivalent to clarifying the standard. This is likely such a case. If there are many instances, I don't think we can escape from taking a position on what the software should do.

I don't really see any practical benefit to having a version keyword. Either the software will support a new requirement, or it won't; the presence of a version (or DATE) keyword isn't really helpful, except maybe to a human reading the header.

I don't understand. The software would interpret the version to know if the new requirement should be enforced for a particular HDU. In the absence of such versioning (by token or date), the software has to follow some sloppy heuristic to let the nuances of the data guide its behavior. The other two new requirements on the table strike me as clarifications and can go forward without versioning, perhaps with some tweaking of the language. I'm not sure about the EXTEND keyword. I'm not a big fan of introducing versioning myself, but the clear implication of avoiding versioning is that duplicate keywords cannot be gracefully banned after the fact.
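The "1 23 4.5" case above comes down to how a numeric field in an ASCII table column is parsed: leading and trailing blanks are column padding, but embedded blanks are ambiguous. A minimal sketch of a strict parser (my own illustration, not taken from any FITS library) that rejects embedded spaces rather than silently squeezing them out:

```python
def parse_ascii_table_number(field: str) -> float:
    """Parse a numeric field from a FITS ASCII table column.

    Leading/trailing blanks are treated as column padding and
    allowed; embedded blanks (e.g. '1 23 4.5') are rejected
    rather than being silently collapsed into '1234.5'.
    """
    trimmed = field.strip()
    if " " in trimmed:
        raise ValueError(f"embedded space in numeric field: {field!r}")
    return float(trimmed)

print(parse_ascii_table_number("   4.5 "))   # 4.5
# parse_ascii_table_number("1 23 4.5")       # raises ValueError
```

Note that a permissive reader that simply did `float(field.replace(" ", ""))` would be the one that "silently reads an incorrect value" - which is exactly the behavior the proposed restriction is trying to rule out.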
In fact, consider a situation in which the choice had been made to ban them back in the FITS Dreamtime - exactly the same stringent software requirements would pertain to detect instances and take application-dependent action. Our libraries and applications would be more complex now as a result. (Arguably better, but certainly more complex.) Banning duplicates doesn't avoid significant new software requirements, it mandates them.

The proposed new statement ("Existing FITS files that conformed to the latest version of the standard at the time the files were created are expressly exempt from any new requirements imposed by subsequent versions of the standard.") is, I think, mainly intended as a political statement to reassure institutions that the FITS committees are not imposing new unfunded mandates that require modifications to existing FITS archives. I don't see this statement as having much relevance to the way software is implemented.

You can't avoid the unfunded mandate this way. Any software seeking to follow the letter of the standard would still have to detect instances of duplicate keywords and take some action. What statements like this do is to encourage folks to treat the standard as some floppy set of guidelines and conformance to the standard as an optional nicety for polite society. A file either conforms to the FITS standard or it does not.

A ban on duplicate keywords is unenforceable unless it is paired with versioning. The statement above would fail to impress a lawyer since it isn't paired with a way for either humans or computers to determine which files were grandfathered in. Further, there is a sense of legal entrapment in promulgating such a new requirement with no realistic way to encourage instrument teams and others to redesign their systems to avoid duplicates. For instance, the ICE/ccdacq software permits observers to enter their own file of keywords, perhaps including duplicates.
Users can trivially use IRAF hedit to add duplicates, etc. Perhaps there is no way to duplicate a keyword with CFITSIO? Who would enforce the ban? In any event, the FITS standard should be kept free of political statements.

This is missing the main point of this new requirement. No current software system that I am aware of (except for the FITS verifier code) checks for duplicated keywords, so users have no idea which of the duplicated keywords is being used by a particular program. The software might be using the first, the 'next', or the last instance of the keyword.

Well, as I said, iSTB throws an error if duplicate structural keywords are encountered. After 10 million files, I don't think I've ever seen this particular error in BITPIX, NAXISn, PCOUNT, GCOUNT or XTENSION. We did just happen to see duplicate SIMPLE keywords while commissioning a new instrument. The problem was detected, reported and fixed. On the other hand, there are numerous ongoing examples of duplicated user keywords.

It seems to me that applications should only be sensitive to header abnormalities that affect their own functionality. Instituting an absolute ban is meaningless unless all our software systems become aware of all possible duplicates. We can't just dump the responsibility on the users to avoid creating them in the first place unless our own software that they are using to create or update the HDUs aids in that task. This ban is attempting to avoid placing natural requirements on software by placing unnatural ones on the data. Not only is it unenforceable - the software requirements just pop up again elsewhere.

This could easily cause the user to derive incorrect scientific results. What is the best way to prevent this from happening?

This is the heart of the matter. As Dick says, there is no single simple solution. We should encourage data providers (and users) to avoid duplicate keywords. We should understand why such keywords may be created in the first place.
Our major software packages should reach agreement on a common strategy should duplicates be encountered - whether this is that the behavior remain indeterminate, or the first instance or the last instance take precedence. Applications should detect duplicates which affect their functionality, as with any other header peculiarities. Libraries should provide routines and utility programs for validating HDUs against a wide variety of exceptions, including duplicate keywords. A duplicated keyword is just one of a long list of poor header construction techniques that can't be fixed simply by demanding they not occur.

Seems to me we should focus on the root of the problem and (formally at least) disallow duplicated keywords in a conforming FITS file. This doesn't mean software should automatically throw out a file that inadvertently has a duplicated keyword.

"Formal" is the essence of a standard. I guess the notion is that deprecation hasn't proven strong enough so perhaps an absolute ban might do the trick? In the absence of practical consequences, what this really does is call the integrity of the standard into question.

I think the seriousness of this problem depends on what keyword is duplicated. If it is just some observatory-specific keyword that does not directly affect the scientific results, then it does not matter very much, and data providers need not worry about it. But if a critical WCS keyword, or exposure time keyword, is duplicated in the file with different values, then surely the data providers need to take responsibility and fix the problem.

Whether the issue is duplicate keywords or some other keyword misformatting, there is more pressure on the data providers already to fix significant occurrences than this technical change to the FITS standard would apply. On the other hand, for the much more frequent case of unintentionally duplicating some non-critical keyword, this change would be outlawing files for no benefit and a lot of annoyance.
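The validation routines suggested in this exchange could start with something as simple as a duplicate scan over the 80-character cards of a header. A minimal Python sketch (hand-rolled card parsing for illustration, not any particular library's API), treating the commentary keywords COMMENT, HISTORY, and blank as legitimately repeatable:

```python
from collections import Counter

# Commentary keywords may legitimately repeat; everything else may not.
REPEATABLE = {"COMMENT", "HISTORY", ""}

def find_duplicate_keywords(header_block: bytes) -> dict:
    """Scan raw FITS header bytes for duplicated non-commentary keywords.

    Returns {keyword: count} for keywords appearing more than once
    before the END card. Cards are fixed 80-character records with
    the keyword name in columns 1-8.
    """
    counts = Counter()
    for i in range(0, len(header_block), 80):
        card = header_block[i:i + 80].decode("ascii", errors="replace")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if keyword not in REPEATABLE:
            counts[keyword] += 1
    return {kw: n for kw, n in counts.items() if n > 1}

# A toy header with a duplicated EXPTIME card:
cards = ["SIMPLE  =                    T",
         "BITPIX  =                   16",
         "EXPTIME =                300.0",
         "EXPTIME =                600.0",
         "END"]
block = b"".join(c.ljust(80).encode("ascii") for c in cards)
print(find_duplicate_keywords(block))   # {'EXPTIME': 2}
```

A library utility along these lines lets an application decide for itself whether a reported duplicate affects its own functionality - warn, ignore, or refuse - which is exactly the division of labor argued for above.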
In either case, the software faces exactly the same requirements.

Rob
#2
[fitsbits] Proposed Changes to the FITS Standard
On Sun, 19 Aug 2007, Rob Seaman wrote:
[...] change. Duplicate keywords, on the other hand, are a frequent occurrence (thus the interest in eliminating them :-) [...] It seems to me that in this imperfect world it would be better if the major FITS software packages adopted a coherent behavior on encountering duplicate keywords. A header with duplicate FITS keywords is not a bug. Currently, it is perfectly legal FITS, if questionable practice.

Maybe the point is that the nature (or usage) of most keyword types is indeterminate (or unpredictable by whoever wrote the file), or at least oscillates between these two extreme cases:

- a keyword is intended as a named resource to be mainly read by software, maybe into a variable, and then be acted upon (all the mandatory and WCS keywords, those defined by specific conventions, etc.)
- a keyword just records some information associated with a file, which is intended to be read by a human but is hardly relevant to any software (essentially "commentary" keywords).

If all commentary information were written into commentary or "value-less" keywords (4.4.2.4, 4.1.2.2, 4.1.2.3), a generic reader would have no problems.

Talking about readers, we can think of essentially two types:

- specific readers, which read only the keywords they know of beforehand. They read them by name. They know beforehand that each should correspond to a variable of a given type (integer, real, string...). They most likely search for a keyword of a given name (and probably stop at the first occurrence). But if they know of or expect a duplicated keyword, they may knowingly act in some predefined way (does anybody know such a beast?).
- all-purpose readers. I can imagine things like reading the entire header into memory, or generating some data structure by scanning the entire header. I have for instance an IDL procedure which reads a file into a structure with elements a.kwd1, a.kwd2 ... a.kwdn and a.data (the data array).
Actually my procedure does not read FITS files but a format of my own (which can however also be generated from FITS) ... and it relies on the (sound) idea that keywords have unique names, because structure element names are built on the fly from keyword names (so a.bitpix, a.naxis, a.bunit ...). In such a procedure duplicate keywords are a nuisance and trigger an error. In fact my procedure skips COMMENTs (it does not enter them into the structure at all), treats two particular keywords (HISTORY and another non-FITS one) as repeatable (in which case it generates structure elements a.h0001, a.h0002, etc.), and fails with an error in all other cases.

All this seemed to me reasonably sound practice, and it inspired the idea to forbid duplication of (named, non-commentary, non-valueless) keywords in FITS 3.0. Given now that it seems there are more live FITS files which on purpose or by accident (not error) contain duplicated keywords, we could probably demote the change from forbidding to strongly recommending against.

But it is hard to define a preferred way to deal with duplicated keywords - unless we register alternate conventions which explicitly specify what the reader should do about them. E.g.:

- DUPKWDS = 'none' assures that the FITS file was written without any duplicated keywords
- DUPKWDS = 'ignore' (or 'comments') declares that duplicated keywords are of commentary nature, so they can be ignored by s/w or dealt with as HISTORY or COMMENTs
- DUPKWDS = 'take_first' / 'take_last' declare that only the first or last value shall be considered
- DUPKWDS = 'concatenate' declares that (string) values are to be concatenated (also numeric arrays ??)

Any other cases possible? But even with such conventions, we are still left with the problem of what a generic reader should do with (older or not) files not following any convention.
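The DUPKWDS policies proposed above (a hypothetical convention from this post, not a registered FITS keyword) are easy to prototype. A sketch of a reader collapsing duplicates according to the declared policy; the policy names are the post's own and the function is my own illustration:

```python
# Sketch of a reader honoring the hypothetical DUPKWDS convention
# proposed above. DUPKWDS is not a registered FITS keyword; the
# policy names ('take_first', 'take_last', ...) come from the post.

def resolve_duplicates(cards, policy="take_last"):
    """Collapse a list of (keyword, value) pairs per a DUPKWDS policy."""
    resolved = {}
    for keyword, value in cards:
        if keyword not in resolved:
            resolved[keyword] = value
        elif policy == "take_first":
            pass                              # keep the first occurrence
        elif policy == "take_last":
            resolved[keyword] = value         # overwrite with the latest
        elif policy == "concatenate":
            resolved[keyword] = str(resolved[keyword]) + str(value)
        elif policy == "ignore":
            pass                              # treat repeats as commentary
        else:  # 'none' promised no duplicates, so any repeat is an error
            raise ValueError(f"duplicate keyword: {keyword}")
    return resolved

cards = [("EXPTIME", 300.0), ("OBJECT", "M31"), ("EXPTIME", 600.0)]
print(resolve_duplicates(cards, "take_first"))  # {'EXPTIME': 300.0, 'OBJECT': 'M31'}
print(resolve_duplicates(cards, "take_last"))   # {'EXPTIME': 600.0, 'OBJECT': 'M31'}
```

The sketch also illustrates the sticking point raised next: a file carrying no DUPKWDS card gives the reader no policy at all, so the behavior is back to being indeterminate.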
Lucio Chiappetti
#3
[fitsbits] Proposed Changes to the FITS Standard
On Tue 2007-08-21T14:05:26 +0200, LC's NoSpam Newsreading account hath writ:
But even with such conventions, we are still left with the problem of what a generic reader should do with (older or not) files not following any convention.

That is true for more than just keywords, and is inherent to FITS. There is no mechanism for a FITS file to communicate how it is expected to be used or what any of the meanings are for "additional" keywords, i.e., keywords neither "mandatory" nor "reserved", nor for how to use any appended images and tables.

Getting back to the issue of dupes, our spectrographs produce duplicate keywords, where the different values are of different data types. Without delving into the current code and doing archaeology to find the source for old versions which are still in use, I can't even write out a recipe for interpreting whether those duplicated keywords are indicating that the system was in a normal or abnormal state when the image was acquired. I do not expect that anyone else should care how they would properly be interpreted, but I am sure that an explicit directive in the FITS standard would be misleading.

I don't think this particular problem can be solved other than by explicitly indicating that the standard does not assign any meaning. But because I think there are a lot of other similar "problems" I'm not sure that the FITS standard benefits by making it explicit. This is the sort of thing that does belong in a user guide.

--
Steve Allen, UCO/Lick Observatory, University of California, Santa Cruz
http://www.ucolick.org/~sla/
#4
[fitsbits] Proposed Changes to the FITS Standard
Lucio Chiappetti wrote:
- a keyword is intended as a named resource to be mainly read by software, maybe into a variable, and then be acted upon (all the mandatory and WCS keywords, those defined by specific conventions, etc.)
- a keyword just records some information associated to a file, which is intended to be read by a human, but it is hardly relevant to any software (essentially "commentary" keywords).

I'd suggest FITS keywords fall into three categories:

1) FITS metadata, that is "data about FITS data" - examples start with the mandatory keywords, SIMPLE, XTENSION, BITPIX, NAXISn, PCOUNT, GCOUNT, but also CHECKSUM and DATASUM, etc.

2) Science metadata, that is "data about the data represented within the HDU or file" - examples are DATE-OBS, EXPTIME, the slew of WCS keywords, etc.

3) Provenance - this may be purely commentary, including COMMENTs and HISTORY, but may also be contained in keywords with values; the point is that it doesn't describe the file as it is, but rather how it came to be. The most obvious here is DATE.

One can make these distinctions finer grained - for instance INHERIT is meta-science-metadata - but it isn't clear how useful that is likely to be.

DUPKWDS = 'none' assures that the FITS file was written without any duplicated keywords; DUPKWDS = 'ignore' (or 'comments') declares that duplicated keywords are of commentary nature, so they can be ignored by s/w or dealt with as HISTORY or COMMENTs; DUPKWDS = 'take_first' / 'take_last' declare that only the first or last value shall be considered; DUPKWDS = 'concatenate' declares (string) values wanting to be concatenated (also numeric arrays ??) Any other cases possible?

I suspect most will think we're reaching diminishing returns. If we can't reach consensus on whether the first or last instance should take precedence then "indeterminate" it will have to be. I'm still interested to hear of cases where the duplicates are intentional.
Perhaps these would be addressed better through some other mechanism than duplication?

But even with such conventions, we are still left with the problem of what a generic reader should do with (older or not) files not following any convention.

What is this generic reader people keep talking about? Data is only ever read for some purpose. If the purpose is to display the header to a human, then display both copies of duplicate keywords. If the purpose is to semantically capture the value of such a keyword, INDEF seems appropriate (and we would do our users a favor to clarify the standard to say so). If the purpose is to copy the input to the output, copy it verbatim. If the purpose is to validate the data structures, throw a warning if you want on detecting a duplicate keyword - just don't throw an error. But if it is one of the key structural keywords, there is no need to clarify the standard to know to throw a big, fat, juicy error, e.g., duplicating BITPIX calls the parsing of the file into question.

Beneath every standard lies a bedrock of logic. A nod is as good as a wink to a blind horse.

Rob
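Two of the purpose-dependent behaviors Rob lists - the semantic read returning an indeterminate value, and validation that warns on ordinary duplicates but errors on structural ones - can be sketched directly. This is a hedged illustration: the INDEF sentinel and the exact structural-keyword set are my own choices, not anything the standard prescribes:

```python
# Purpose-dependent handling of a duplicated keyword, following the
# cases listed above. INDEF is a sentinel of my own choosing; the
# structural-keyword set is an assumption for illustration.
import warnings

INDEF = object()   # "indeterminate" sentinel for semantic reads
STRUCTURAL = {"SIMPLE", "XTENSION", "BITPIX", "NAXIS",
              "PCOUNT", "GCOUNT", "END"}

def get_value(cards, keyword):
    """Semantic read: return INDEF when a keyword is duplicated."""
    values = [v for k, v in cards if k == keyword]
    if len(values) > 1:
        return INDEF
    return values[0] if values else None

def validate(cards):
    """Validation: warn on duplicates, error only on structural ones."""
    seen = set()
    for keyword, _ in cards:
        if keyword in seen:
            if keyword in STRUCTURAL:
                raise ValueError(f"duplicate structural keyword {keyword}: "
                                 "parsing of the file is in question")
            warnings.warn(f"duplicate keyword {keyword}")
        seen.add(keyword)

cards = [("BITPIX", 16), ("EXPTIME", 300.0), ("EXPTIME", 600.0)]
print(get_value(cards, "EXPTIME") is INDEF)   # True
```

The display and verbatim-copy purposes need no duplicate handling at all, which is the point: each purpose dictates its own correct behavior, and no single rule in the standard could cover all four.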