#1
[fitsbits] Proposed Changes to the FITS Standard
Perhaps there is a consensus building to back off from an absolute ban on duplicate keywords. Here's my reply to Bill's latest in any event - kind of a "Tao of FITS". Apologies for the length:

- keyword values are restricted to be a single value, not an array
- logical keyword values must consist of a single T or F followed only by a space or a slash character
- integer and float keyword values must not contain embedded spaces
- complex keyword values must be enclosed in parentheses
- no other keywords may intervene between the mandatory keywords in the primary array or extension
- the TFORM keyword values must be upper case (e.g., F5.2, not f5.2)

These are all (except perhaps the last) rare occurrences. In that case, a newly placed requirement is more like a clarification of the standard than a change. Duplicate keywords, on the other hand, are a frequent occurrence (thus the interest in eliminating them :-)

"Once FITS, always FITS" may never have been in serious question before. Imposing a new requirement on software systems to read the last instance of the keyword would likely have a lot of negative repercussions.

No more so than imposing a new requirement to detect and act on duplicate keywords. The difference is that outlawing duplicates doesn't fix those systems reading indeterminate values. We might ask what an ideal FITS library or application should do on encountering various exceptions. Whether or not duplicate keywords are outlawed, deprecated or ignored, feeding such input to our software will remain a frequent event. We can't legislate moral behavior, rather only the consequences for detected immorality. It seems to me that in this imperfect world it would be better if the major FITS software packages adopted a coherent behavior on encountering duplicate keywords.

A header with duplicate FITS keywords is not a bug. Currently, it is perfectly legal FITS, if questionable practice. This cannot now be rescinded (except with some form of HDU-level versioning, I still assert).
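The value-format restrictions in that list can be made concrete as simple checks on the trimmed value field of an 80-character card. A minimal Python sketch follows; the regular expressions are my own rough approximation for illustration, not the standard's official grammar:

```python
import re

# Approximate checks for the value-format rules discussed above.
# These patterns are an illustration, not the official FITS grammar.
LOGICAL_RE = re.compile(r'^(T|F)$')              # single T or F
INTEGER_RE = re.compile(r'^[+-]?\d+$')           # no embedded spaces
FLOAT_RE   = re.compile(r'^[+-]?\d*\.?\d+([ED][+-]?\d+)?$')
COMPLEX_RE = re.compile(r'^\([^()]+,[^()]+\)$')  # enclosed in parentheses

def classify_value(value: str) -> str:
    """Return a rough type name for a trimmed FITS keyword value field."""
    v = value.strip()
    if LOGICAL_RE.match(v):
        return "logical"
    if INTEGER_RE.match(v):
        return "integer"
    if FLOAT_RE.match(v):
        return "float"
    if COMPLEX_RE.match(v):
        return "complex"
    if v.startswith("'"):
        return "string"
    return "invalid"   # e.g. '12 34', an integer with an embedded space

print(classify_value("T"))          # logical
print(classify_value("12 34"))      # invalid
print(classify_value("(1.0,2.0)"))  # complex
```

A real parser would also need to check what follows the value (space, slash, or end of card), but even a crude classifier like this shows how cheaply the "rare occurrence" violations could be flagged.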
But even if duplicate keywords were illegal FITS, the question remains of what FITS software should do upon encountering them - and how our code should recognize the fact in the first place.

Requiring all software systems to follow the same behavior is not practical, so the only sure way to prevent users from getting an incorrect result when analyzing the file is to eliminate duplicate keywords in the first place.

You cannot avoid the question of what software is required to do by outlawing data. Those data can and will continue to be presented as input to our software. Perhaps there is some notion that we'll require all archives and data providers to scrub their data. This is at least as impractical a requirement as you describe for the software - more to the point, who will data providers turn to for software to perform such scrubbing? One way or the other, if we tackle this issue our software will have to detect duplicate keyword instances and take some action as a result.

There is less harm if the duplicated keywords all have the same value, so maybe the wording of this requirement should be modified to take this into account.

This strikes me as the sort of contingent action that indicates the primary action is ill conceived. As far as the software goes, it is simply another requirement placed on top of the first. Look for duplicate keyword names, then look for duplicate values - would the next step be a test for duplicate comments?

Some of your FOREIGN extensions have the order of these two keywords reversed.

We'll look into the behavior you describe. I would expect most extension types, including FOREIGN, to be conformable to this more strict keyword ordering whether it is required or merely preferred. In addition to clarifying the ordering of PCOUNT/GCOUNT, this may be a good time to state this more clearly for all the mandatory keywords (section 4.4). In particular, the ordering of NAXISn is never explicitly restricted to increasing numerical order.
The only statement for any of the mandatory keywords is presented in table 4.5, which suggests NAXISn be ordered, but never outright says it.

3. Embedded space characters are now forbidden within numeric values in an ASCII Table (e.g. "1 23 4.5" is no longer allowed to represent the decimal value 1234.5)

Again - are there any examples of such usage in the field?

No, as far as we know. If there are any, then it is very likely that most current software systems do not support embedded spaces in the value and will silently read an incorrect value, or will exit with an error. Thus, it seems better to me to outlaw this usage rather than just not recommend it or deprecate it.

Again, the question is whether it is more productive to attempt to outlaw something or to describe what steps software should take upon encountering the usage. If there are no known instances, "outlawing" is equivalent to clarifying the standard. This is likely such a case. If there are many instances, I don't think we can escape from taking a position on what the software should do.

I don't really see any practical benefit to having a version keyword. Either the software will support a new requirement, or it won't; the presence of a version (or DATE) keyword isn't really helpful, except maybe to a human reading the header.

I don't understand. The software would interpret the version to know if the new requirement should be enforced for a particular HDU. In the absence of such versioning (by token or date), the software has to follow some sloppy heuristic to let the nuances of the data guide its behavior. The other two new requirements on the table strike me as clarifications and can go forward without versioning, perhaps with some tweaking of the language. I'm not sure about the EXTEND keyword. I'm not a big fan of introducing versioning myself, but the clear implication of avoiding versioning is that duplicate keywords cannot be gracefully banned after the fact.
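The "1 23 4.5" case above comes down to how a numeric field in an ASCII table column is parsed: leading and trailing blanks are column padding, but embedded blanks are ambiguous. A minimal sketch of a strict parser (my own illustration, not taken from any FITS library) that rejects embedded spaces rather than silently squeezing them out:

```python
def parse_ascii_table_number(field: str) -> float:
    """Parse a numeric field from a FITS ASCII table column.

    Leading/trailing blanks are treated as column padding and
    allowed; embedded blanks (e.g. '1 23 4.5') are rejected
    rather than being silently collapsed into '1234.5'.
    """
    trimmed = field.strip()
    if " " in trimmed:
        raise ValueError(f"embedded space in numeric field: {field!r}")
    return float(trimmed)

print(parse_ascii_table_number("   4.5 "))   # 4.5
# parse_ascii_table_number("1 23 4.5")       # raises ValueError
```

Note that a permissive reader that simply did `float(field.replace(" ", ""))` would be the one that "silently reads an incorrect value" - which is exactly the behavior the proposed restriction is trying to rule out.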
In fact, consider a situation in which the choice had been made to ban them back in the FITS Dreamtime - exactly the same stringent software requirements would pertain to detect instances and take application-dependent action. Our libraries and applications would be more complex now as a result. (Arguably better, but certainly more complex.) Banning duplicates doesn't avoid significant new software requirements, it mandates them.

The proposed new statement ("Existing FITS files that conformed to the latest version of the standard at the time the files were created are expressly exempt from any new requirements imposed by subsequent versions of the standard.") is, I think, mainly intended as a political statement to reassure institutions that the FITS committees are not imposing new unfunded mandates that require modifications to existing FITS archives. I don't see this statement as having much relevance to the way software is implemented.

You can't avoid the unfunded mandate this way. Any software seeking to follow the letter of the standard would still have to detect instances of duplicate keywords and take some action. What statements like this do is to encourage folks to treat the standard as some floppy set of guidelines and conformance to the standard as an optional nicety for polite society. A file either conforms to the FITS standard or it does not.

A ban on duplicate keywords is unenforceable unless it is paired with versioning. The statement above would fail to impress a lawyer since it isn't paired with a way for either humans or computers to determine which files were grandfathered in. Further, there is a sense of legal entrapment in promulgating such a new requirement with no realistic way to encourage instrument teams and others to redesign their systems to avoid duplicates. For instance, the ICE/ccdacq software permits observers to enter their own file of keywords, perhaps including duplicates.
Users can trivially use IRAF hedit to add duplicates, etc. Perhaps there is no way to duplicate a keyword with CFITSIO? Who would enforce the ban? In any event, the FITS standard should be kept free of political statements.

This is missing the main point of this new requirement. No current software system that I am aware of (except for the FITS verifier code) checks for duplicated keywords, so users have no idea which of the duplicated keywords is being used by a particular program. The software might be using the first, the 'next', or the last instance of the keyword.

Well, as I said, iSTB throws an error if duplicate structural keywords are encountered. After 10 million files, I don't think I've ever seen this particular error in BITPIX, NAXISn, PCOUNT, GCOUNT or XTENSION. We did just happen to see duplicate SIMPLE keywords while commissioning a new instrument. The problem was detected, reported and fixed. On the other hand, there are numerous ongoing examples of duplicated user keywords.

It seems to me that applications should only be sensitive to header abnormalities that affect their own functionality. Instituting an absolute ban is meaningless unless all our software systems become aware of all possible duplicates. We can't just dump the responsibility on the users to avoid creating them in the first place unless our own software that they are using to create or update the HDUs aids in that task. This ban is attempting to avoid placing natural requirements on software by placing unnatural ones on the data. Not only is it unenforceable - the software requirements just pop up again elsewhere.

This could easily cause the user to derive incorrect scientific results. What is the best way to prevent this from happening?

This is the heart of the matter. As Dick says, there is no single simple solution. We should encourage data providers (and users) to avoid duplicate keywords. We should understand why such keywords may be created in the first place.
Our major software packages should reach agreement on a common strategy should duplicates be encountered - whether this is that the behavior remain indeterminate, or the first instance or the last instance take precedence. Applications should detect duplicates which affect their functionality, as with any other header peculiarities. Libraries should provide routines and utility programs for validating HDUs against a wide variety of exceptions, including duplicate keywords. A duplicated keyword is just one of a long list of poor header construction techniques that can't be fixed simply by demanding they not occur.

Seems to me we should focus on the root of the problem and (formally at least) disallow duplicated keywords in a conforming FITS file. This doesn't mean software should automatically throw out a file that inadvertently has a duplicated keyword.

"Formal" is the essence of a standard. I guess the notion is that deprecation hasn't proven strong enough so perhaps an absolute ban might do the trick? In the absence of practical consequences, what this really does is call the integrity of the standard into question.

I think the seriousness of this problem depends on what keyword is duplicated. If it is just some observatory-specific keyword that does not directly affect the scientific results, then it does not matter very much, and data providers need not worry about it. But if a critical WCS keyword, or exposure time keyword, is duplicated in the file with different values, then surely the data providers need to take responsibility and fix the problem.

Whether the issue is duplicate keywords or some other keyword misformatting, there is more pressure on the data providers already to fix significant occurrences than this technical change to the FITS standard would apply. On the other hand, for the much more frequent case of unintentionally duplicating some non-critical keyword, this change would be outlawing files for no benefit and a lot of annoyance.
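The validation routines suggested in this exchange could start with something as simple as a duplicate scan over the 80-character cards of a header. A minimal Python sketch (hand-rolled card parsing for illustration, not any particular library's API), treating the commentary keywords COMMENT, HISTORY, and blank as legitimately repeatable:

```python
from collections import Counter

# Commentary keywords may legitimately repeat; everything else may not.
REPEATABLE = {"COMMENT", "HISTORY", ""}

def find_duplicate_keywords(header_block: bytes) -> dict:
    """Scan raw FITS header bytes for duplicated non-commentary keywords.

    Returns {keyword: count} for keywords appearing more than once
    before the END card. Cards are fixed 80-character records with
    the keyword name in columns 1-8.
    """
    counts = Counter()
    for i in range(0, len(header_block), 80):
        card = header_block[i:i + 80].decode("ascii", errors="replace")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if keyword not in REPEATABLE:
            counts[keyword] += 1
    return {kw: n for kw, n in counts.items() if n > 1}

# A toy header with a duplicated EXPTIME card:
cards = ["SIMPLE  =                    T",
         "BITPIX  =                   16",
         "EXPTIME =                300.0",
         "EXPTIME =                600.0",
         "END"]
block = b"".join(c.ljust(80).encode("ascii") for c in cards)
print(find_duplicate_keywords(block))   # {'EXPTIME': 2}
```

A library utility along these lines lets an application decide for itself whether a reported duplicate affects its own functionality - warn, ignore, or refuse - which is exactly the division of labor argued for above.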
In either case, the software faces exactly the same requirements.

Rob
#2
[fitsbits] Proposed Changes to the FITS Standard
On Sun, 19 Aug 2007, Rob Seaman wrote:
[...] change. Duplicate keywords, on the other hand, are a frequent occurrence (thus the interest in eliminating them :-) [...] It seems to me that in this imperfect world it would be better if the major FITS software packages adopted a coherent behavior on encountering duplicate keywords. A header with duplicate FITS keywords is not a bug. Currently, it is perfectly legal FITS, if questionable practice.

Maybe the point is that the nature (or usage) of most keyword types is indeterminate (or unpredictable by whoever wrote the file), or at least oscillates between these two extreme cases:

- a keyword is intended as a named resource to be mainly read by software, maybe into a variable, and then be acted upon (all the mandatory and WCS keywords, those defined by specific conventions, etc.)
- a keyword just records some information associated with a file, which is intended to be read by a human but is hardly relevant to any software (essentially "commentary" keywords).

If all commentary information were written into commentary or "value-less" keywords (4.4.2.4, 4.1.2.2, 4.1.2.3), a generic reader would have no problems.

Talking about readers, we can think of essentially two types:

- specific readers, which read only the keywords they know of beforehand. They read them by name. They know beforehand that each should correspond to a variable of a given type (integer, real, string...). They most likely search for a keyword of a given name (and probably stop at the first occurrence). But if they know of or expect a duplicated keyword, they may knowingly act in some predefined way (does anybody know such a beast?).
- all-purpose readers. I can imagine things like reading the entire header into memory, or generating some data structure by scanning the entire header. I have for instance an IDL procedure which reads a file into a structure with elements a.kwd1, a.kwd2 ... a.kwdn and a.data (the data array).
Actually my procedure does not read FITS files but a format of my own (which can however also be generated from FITS) ... and it relies on the (sound) idea that keywords have unique names, because structure element names are built on the fly from keyword names (so a.bitpix, a.naxis, a.bunit ...). In such a procedure duplicate keywords are a nuisance and trigger an error. In fact my procedure skips COMMENTs (it does not enter them into the structure at all), treats two particular keywords (HISTORY and another non-FITS one) as repeatable (in which case it generates structure elements a.h0001, a.h0002, etc.), and fails with an error in all other cases.

All this seemed to me reasonably sound practice, and it inspired the idea to forbid duplication of (named, non-commentary, non-valueless) keywords in FITS 3.0. Given now that it seems there are more live FITS files which on purpose or by accident (not error) contain duplicated keywords, we could probably demote the change from forbidding to strongly recommending against.

But it is hard to define a preferred way to deal with duplicated keywords - unless we register alternate conventions which explicitly specify what the reader should do about them. E.g.:

- DUPKWDS = 'none' assures that the FITS file was written without any duplicated keywords
- DUPKWDS = 'ignore' (or 'comments') declares that duplicated keywords are of commentary nature, so they can be ignored by s/w or dealt with as HISTORY or COMMENTs
- DUPKWDS = 'take_first' / 'take_last' declare that only the first or last value shall be considered
- DUPKWDS = 'concatenate' declares that (string) values are to be concatenated (also numeric arrays ??)

Any other cases possible? But even with such conventions, we are still left with the problem of what a generic reader should do with (older or not) files not following any convention.
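The DUPKWDS policies proposed above (a hypothetical convention from this post, not a registered FITS keyword) are easy to prototype. A sketch of a reader collapsing duplicates according to the declared policy; the policy names are the post's own and the function is my own illustration:

```python
# Sketch of a reader honoring the hypothetical DUPKWDS convention
# proposed above. DUPKWDS is not a registered FITS keyword; the
# policy names ('take_first', 'take_last', ...) come from the post.

def resolve_duplicates(cards, policy="take_last"):
    """Collapse a list of (keyword, value) pairs per a DUPKWDS policy."""
    resolved = {}
    for keyword, value in cards:
        if keyword not in resolved:
            resolved[keyword] = value
        elif policy == "take_first":
            pass                              # keep the first occurrence
        elif policy == "take_last":
            resolved[keyword] = value         # overwrite with the latest
        elif policy == "concatenate":
            resolved[keyword] = str(resolved[keyword]) + str(value)
        elif policy == "ignore":
            pass                              # treat repeats as commentary
        else:  # 'none' promised no duplicates, so any repeat is an error
            raise ValueError(f"duplicate keyword: {keyword}")
    return resolved

cards = [("EXPTIME", 300.0), ("OBJECT", "M31"), ("EXPTIME", 600.0)]
print(resolve_duplicates(cards, "take_first"))  # {'EXPTIME': 300.0, 'OBJECT': 'M31'}
print(resolve_duplicates(cards, "take_last"))   # {'EXPTIME': 600.0, 'OBJECT': 'M31'}
```

The sketch also illustrates the sticking point raised next: a file carrying no DUPKWDS card gives the reader no policy at all, so the behavior is back to being indeterminate.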
Lucio Chiappetti
#3
[fitsbits] Proposed Changes to the FITS Standard
On Tue 2007-08-21T14:05:26 +0200, LC's NoSpam Newsreading account hath writ:
But even with such conventions, we are still left with the problem of what a generic reader should do with (older or not) files not following any convention.

That is true for more than just keywords, and is inherent to FITS. There is no mechanism for a FITS file to communicate how it is expected to be used or what any of the meanings are for "additional" keywords, i.e., keywords neither "mandatory" nor "reserved", nor for how to use any appended images and tables.

Getting back to the issue of dupes, our spectrographs produce duplicate keywords, where the different values are of different data types. Without delving into the current code and doing archaeology to find the source for old versions which are still in use, I can't even write out a recipe for interpreting whether those duplicated keywords are indicating that the system was in a normal or abnormal state when the image was acquired. I do not expect that anyone else should care how they would properly be interpreted, but I am sure that an explicit directive in the FITS standard would be misleading.

I don't think this particular problem can be solved other than by explicitly indicating that the standard does not assign any meaning. But because I think there are a lot of other similar "problems" I'm not sure that the FITS standard benefits by making it explicit. This is the sort of thing that does belong in a user guide.

--
Steve Allen, UCO/Lick Observatory, University of California, Santa Cruz
http://www.ucolick.org/~sla/
#4
[fitsbits] Proposed Changes to the FITS Standard
Lucio Chiappetti wrote:
- a keyword is intended as a named resource to be mainly read by software, maybe into a variable, and then be acted upon (all the mandatory and WCS keywords, those defined by specific conventions, etc.)
- a keyword just records some information associated to a file, which is intended to be read by a human, but it is hardly relevant to any software (essentially "commentary" keywords).

I'd suggest FITS keywords fall into three categories:

1) FITS metadata, that is "data about FITS data" - examples start with the mandatory keywords, SIMPLE, XTENSION, BITPIX, NAXISn, PCOUNT, GCOUNT, but also CHECKSUM and DATASUM, etc.

2) Science metadata, that is "data about the data represented within the HDU or file" - examples are DATE-OBS, EXPTIME, the slew of WCS keywords, etc.

3) Provenance - this may be purely commentary, including COMMENTs and HISTORY, but may also be contained in keywords with values; the point is that it doesn't describe the file as it is, but rather how it came to be. The most obvious here is DATE.

One can make these distinctions finer grained - for instance INHERIT is meta-science-metadata - but it isn't clear how useful that is likely to be.

DUPKWDS = 'none' assures that the FITS file was written without any duplicated keywords; DUPKWDS = 'ignore' (or 'comments') declares that duplicated keywords are of commentary nature, so they can be ignored by s/w or dealt with as HISTORY or COMMENTs; DUPKWDS = 'take_first' / 'take_last' declare that only the first or last value shall be considered; DUPKWDS = 'concatenate' declares (string) values wanting to be concatenated (also numeric arrays ??) Any other cases possible?

I suspect most will think we're reaching diminishing returns. If we can't reach consensus on whether the first or last instance should take precedence then "indeterminate" it will have to be. I'm still interested to hear of cases where the duplicates are intentional.
Perhaps these would be addressed better through some other mechanism than duplication?

But even with such conventions, we are still left with the problem of what a generic reader should do with (older or not) files not following any convention.

What is this generic reader people keep talking about? Data is only ever read for some purpose. If the purpose is to display the header to a human, then display both copies of duplicate keywords. If the purpose is to semantically capture the value of such a keyword, INDEF seems appropriate (and we would do our users a favor to clarify the standard to say so). If the purpose is to copy the input to the output, copy it verbatim. If the purpose is to validate the data structures, throw a warning if you want on detecting a duplicate keyword - just don't throw an error. But if it is one of the key structural keywords, there is no need to clarify the standard to know to throw a big, fat, juicy error, e.g., duplicating BITPIX calls the parsing of the file into question.

Beneath every standard lies a bedrock of logic. A nod is as good as a wink to a blind horse.

Rob
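Two of the purpose-dependent behaviors Rob lists - the semantic read returning an indeterminate value, and validation that warns on ordinary duplicates but errors on structural ones - can be sketched directly. This is a hedged illustration: the INDEF sentinel and the exact structural-keyword set are my own choices, not anything the standard prescribes:

```python
# Purpose-dependent handling of a duplicated keyword, following the
# cases listed above. INDEF is a sentinel of my own choosing; the
# structural-keyword set is an assumption for illustration.
import warnings

INDEF = object()   # "indeterminate" sentinel for semantic reads
STRUCTURAL = {"SIMPLE", "XTENSION", "BITPIX", "NAXIS",
              "PCOUNT", "GCOUNT", "END"}

def get_value(cards, keyword):
    """Semantic read: return INDEF when a keyword is duplicated."""
    values = [v for k, v in cards if k == keyword]
    if len(values) > 1:
        return INDEF
    return values[0] if values else None

def validate(cards):
    """Validation: warn on duplicates, error only on structural ones."""
    seen = set()
    for keyword, _ in cards:
        if keyword in seen:
            if keyword in STRUCTURAL:
                raise ValueError(f"duplicate structural keyword {keyword}: "
                                 "parsing of the file is in question")
            warnings.warn(f"duplicate keyword {keyword}")
        seen.add(keyword)

cards = [("BITPIX", 16), ("EXPTIME", 300.0), ("EXPTIME", 600.0)]
print(get_value(cards, "EXPTIME") is INDEF)   # True
```

The display and verbatim-copy purposes need no duplicate handling at all, which is the point: each purpose dictates its own correct behavior, and no single rule in the standard could cover all four.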