
#7 March 11th 04, 10:18 PM

Tom McGlynn writes:

My feeling is that if the two elements of data are so tightly
coupled that they would normally be put in the same FITS file,
then one would consider them to be the same dataset. At the HEASARC,
datasets generally comprise more than one FITS file. I can't think
off the top of my head of an example where a given FITS file would
naturally split up into multiple IDs.

NOAO (through "Save the bits") has three or four million discrete FITS
images packaged up into MEF files for purposes of efficient and easy
handling. On the other hand, HEASARC's usage supplies an example
involving one dataset that contains several files. Surely there are
(or will be) examples other than NOAO's of one FITS file containing
several datasets. And if we don't believe this to be a useful
characteristic of FITS, we should be seeking to outlaw FITS as a
mechanism for storing "unrelated" data under the umbrella of a single
file. (Good luck defining "unrelated".) Failing that, any dataset
ID convention (or any FITS dataset grouping tools in general) will have
to deal with both possibilities.
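
To make "both possibilities" concrete, here is a minimal sketch in Python.
It assumes a per-HDU dataset ID keyword (called DS_IDENT here, the name
floated in this thread) and uses plain dicts in place of parsed FITS
headers. The file names, ID values and function names are all invented
for illustration; nothing here is an agreed convention.

```python
from collections import defaultdict

# Hypothetical inventory: one (file name, HDU index, header) triple per HDU.
# Headers are plain dicts standing in for parsed FITS headers; the file
# names and DS_IDENT values are made up for this sketch.
hdus = [
    ("mef_night1.fits", 1, {"DS_IDENT": "noao.img.0001"}),
    ("mef_night1.fits", 2, {"DS_IDENT": "noao.img.0002"}),
    ("events.fits", 1, {"DS_IDENT": "heasarc.obs.42"}),
    ("spectrum.fits", 1, {"DS_IDENT": "heasarc.obs.42"}),
]


def group_by_dataset(hdus, keyword="DS_IDENT"):
    """Map each dataset ID to the (file, HDU index) pairs that carry it."""
    datasets = defaultdict(list)
    for filename, index, header in hdus:
        ds_id = header.get(keyword)
        if ds_id is not None:  # HDUs without the keyword are skipped
            datasets[ds_id].append((filename, index))
    return dict(datasets)


def datasets_per_file(hdus, keyword="DS_IDENT"):
    """Map each file name to the set of dataset IDs found inside it."""
    files = defaultdict(set)
    for filename, _index, header in hdus:
        ds_id = header.get(keyword)
        if ds_id is not None:
            files[filename].add(ds_id)
    return dict(files)


if __name__ == "__main__":
    # One dataset spread over several files (the HEASARC-style case).
    print(group_by_dataset(hdus)["heasarc.obs.42"])
    # One file holding several datasets (the NOAO MEF-style case).
    print(sorted(datasets_per_file(hdus)["mef_night1.fits"]))
```

The point of keying on the HDU rather than the file is that neither the
file nor the dataset is treated as the container of the other, so the
same bookkeeping covers both cases.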

Personally, I think before we reserve "DS_IDENT" or any other keyword
for the purpose of identifying datasets, we should define the concept
of a "dataset". Do other communities beyond astronomy share this
notion? Does astronomy itself share a single clear vision of what
constitutes a dataset? In short, where does the dataset object reside
in the cosmic class diagram in the sky?

I suspect I'm also not alone in wondering why our grand discussions of
discovering and developing an overall astronomical ontology and semantic
web and all those other VO "vision" things continually break down into
ad hoc, off-the-cuff suggestions of random notions of the "right" way
to do this thing or that thing with no connection to the whole.

Either our headers are real-world, all-too-human, faulty examples of an
underlying Platonic ideal - and we should try to characterize that ideal -
or we should stop pretending that our individual datasets share any
common attributes at all.

Rob Seaman
NOAO Science Data Systems