August 17th 07, 08:32 PM, posted to sci.astro.fits
Rob Seaman
[fitsbits] Proposed Changes to the FITS Standard

1. Keywords that have a value shall not be repeated in a header.

I have many examples (hundreds of thousands?) of files in which
keywords are repeated. Rather than the wording in the current
proposal, I would replace the attempt at a requirement with a strong
recommendation and a clarification that the final copy of any such
repeated keyword should take precedence.
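
As a rough illustration of "the final copy takes precedence", here is a
minimal Python sketch. The function name parse_header and the sample
CCDTEMP cards are hypothetical, and the comment handling is deliberately
naive (it assumes no "/" inside a quoted string value). It is not how any
particular FITS library does it.

```python
def parse_header(cards):
    """Map keyword -> raw value string; a later card overwrites an earlier one."""
    values = {}
    for card in cards:
        keyword = card[:8].strip()
        if keyword == "END":
            break
        # Only cards with the value indicator "= " in columns 9-10 carry a value,
        # so COMMENT, HISTORY and blank-keyword cards are skipped here.
        if card[8:10] != "= ":
            continue
        # Naive comment split: assumes no "/" inside a quoted string value.
        values[keyword] = card[10:].split("/", 1)[0].strip()
    return values


# Hypothetical header in which an engineering keyword appears twice.
cards = [
    "CCDTEMP =                -95.0 / first reading",
    "CCDTEMP =               -101.5 / second reading",
    "END",
]
header = parse_header(cards)
```

Because a plain dictionary assignment replaces the earlier entry, the
precedence rule costs nothing extra to implement once the values are
being stored at all.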

2. PCOUNT and GCOUNT must immediately follow the last NAXISn
keyword in all conforming extensions (as is already required
in IMAGE, TABLE, and BINTABLE extensions).


I'd like to know whether any existing extensions violate this. If not,
the change is relatively safe. If so, make it a strong recommendation for
an explicit list of grandfathered extension types and an absolute
requirement for any newly defined extensions.


It got me thinking, so I looked at the FITS parser in iSTB (the
current version of save-the-bits deployed on three mountaintops and
handling several terabytes of raw data annually). And no, I don't
currently require PCOUNT and GCOUNT to immediately follow NAXISn. I
do, however, throw an error if these particular keywords are
duplicated :-)
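
For anyone wanting to survey their own holdings against item 2, a check
along these lines might do. It is a sketch only: the function name is made
up, it takes an already-extracted ordered list of keyword names, and it
assumes the order PCOUNT then GCOUNT, as in the IMAGE, TABLE, and BINTABLE
extensions.

```python
import re


def pcount_gcount_follow_naxis(keywords):
    """True if PCOUNT and GCOUNT come directly after the last NAXISn keyword.

    `keywords` is the ordered list of keyword names from one extension header.
    NAXIS itself counts as the last NAXISn keyword when there are no axes.
    """
    naxis_positions = [i for i, k in enumerate(keywords)
                       if re.fullmatch(r"NAXIS\d*", k)]
    if not naxis_positions:
        return False
    last = naxis_positions[-1]
    return keywords[last + 1:last + 3] == ["PCOUNT", "GCOUNT"]


good = ["XTENSION", "BITPIX", "NAXIS", "NAXIS1", "NAXIS2",
        "PCOUNT", "GCOUNT", "TFIELDS"]
bad = ["XTENSION", "BITPIX", "NAXIS", "NAXIS1", "EXTNAME",
       "PCOUNT", "GCOUNT"]
```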

Speaking of which, it is the duplicate keyword requirement that seems
most onerous. To implement this efficiently for all keywords, one
would have to build a hash table or some such for each header. Then
one is left with the question of what to do upon detecting a
duplicate. The sense of a requirement is to simply throw an error
and exit. How helpful is that? STB will toss a FITS file if any of
the structural keywords (NAXISn, BITPIX, PCOUNT, GCOUNT, etc.) are
questionable - precisely because this calls into question the
possibility of handling the data appropriately. The daemon needs to
know the size of the file because it is reading it on the standard
input, perhaps concatenated with other files. The size of each
extension must be known to find subsequent extensions. Etc.
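
To show why the structural keywords matter so much, here is a minimal
sketch of the size calculation a stream reader needs before it can skip to
the next extension. The dict layout and the name data_bytes are
assumptions for illustration. It treats NAXIS = 0 as "no data" and does
not handle the random groups convention.

```python
BLOCK = 2880  # FITS logical record length in bytes


def data_bytes(header):
    """Bytes occupied by the data unit, including padding to a full block.

    `header` is a dict of already-parsed integer values (hypothetical layout).
    PCOUNT and GCOUNT default to 0 and 1, as for a primary array.
    """
    naxis = header["NAXIS"]
    if naxis == 0:
        return 0  # assumption of this sketch: no axes means no data unit
    elements = 1
    for i in range(1, naxis + 1):
        elements *= header[f"NAXIS{i}"]
    pcount = header.get("PCOUNT", 0)
    gcount = header.get("GCOUNT", 1)
    nbytes = abs(header["BITPIX"]) // 8 * gcount * (pcount + elements)
    return -(-nbytes // BLOCK) * BLOCK  # round up to a whole number of blocks


image = {"BITPIX": 16, "NAXIS": 2, "NAXIS1": 100, "NAXIS2": 100}
table = {"BITPIX": 8, "NAXIS": 2, "NAXIS1": 10, "NAXIS2": 3,
         "PCOUNT": 5, "GCOUNT": 1}
```

If any one of these values is wrong or ambiguous, every offset after it
in the stream is wrong too, which is why a reader has little choice but to
reject the file.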

But am I to discard brand new data simply because some camera
temperature keyword appears twice? I spend a lot of time every week
trying to convince a dozen different instrument teams to provide the
archive with a reliable DATE-OBS, EXPTIME, FILTER, OBSTYPE, etcetera
and so forth. They'll rebel if I start tossing their data due to
foibles with minor engineering keywords.
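
To make the trade-off concrete, a tiered policy could be sketched as
follows: repeated structural keywords are fatal, anything else is only
reported. The names here are hypothetical, and the structural set is
limited to the keywords named above (NAXISn, BITPIX, PCOUNT, GCOUNT); a
real daemon would presumably check more.

```python
from collections import Counter
import re

# Structural keywords named in this post; the set is illustrative, not complete.
STRUCTURAL = re.compile(r"BITPIX|NAXIS\d*|PCOUNT|GCOUNT")

# Commentary keywords carry no value, so repeats of them are not duplicates.
COMMENTARY = ("COMMENT", "HISTORY", "")


def check_duplicates(keywords):
    """Return (fatal, warnings): repeated structural vs. other keywords."""
    counts = Counter(k for k in keywords if k not in COMMENTARY)
    repeated = [k for k, n in counts.items() if n > 1]
    fatal = [k for k in repeated if STRUCTURAL.fullmatch(k)]
    warnings = [k for k in repeated if not STRUCTURAL.fullmatch(k)]
    return fatal, warnings


minor = ["SIMPLE", "BITPIX", "NAXIS", "CCDTEMP", "COMMENT", "COMMENT", "CCDTEMP"]
serious = ["SIMPLE", "BITPIX", "NAXIS", "NAXIS1", "NAXIS1"]
```

Here check_duplicates(minor) reports CCDTEMP as a warning only, while
check_duplicates(serious) flags NAXIS1 as fatal.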

I really think enforcing #1 will prove impossible in practice. I'm
not going to build a hash table to search for duplicates for every
keyword just so I can throw an error that will anger my stakeholders
over trivial details. And on the other hand, for pipeline reduced
science data sets, no requirement is needed since there already is
sufficient impetus for data providers to carefully tailor their data
products, eliminating duplicate keywords as a matter of course.

Making it a strong recommendation is my own strong recommendation.

Rob