[fitsbits] Dataset identifications.

#1 March 11th 04, 06:38 PM

off the top of my head of an example where a given FITS file would
naturally split up into multiple IDs.

Tom - I think Thierry's example is reasonable, if I can riff on it:
suppose I have run a modelling tool to get the best deconvolved image
fit simultaneously to ROSAT and Chandra data, and stored the
result in the FITS file. If I understand Thierry's point correctly,
you would like to have traceability to say that your pretty picture
(sorry, your important science result) was generated from the following
two original datasets.
However, I would say to Thierry that the new file should indeed have
a brand new dataset identifier - you have in this case created a
new dataset. The traceability to the original observations should
be done by some kind of history mechanism that would indeed include
the dataset identifiers of the original data. We'll have to worry
about that, but I think it's a second order thing and I don't think
we want to have a complex scheme of nested keywords here - just
one dataset identifier keyword per dataset, and use HISTORY or
something to refer to parent ones.
In conclusion, I support Tom's original recommendation.
DS_IDENT looks good to me.

- Jonathan
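
The scheme described above (one identifier keyword per dataset, with HISTORY cards pointing back at the parents) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: DS_IDENT is the keyword proposed in this thread, the identifier strings are made up, the "Parent dataset:" wording is not a convention from any standard, and the helper names are hypothetical. The cards follow the usual 80-character fixed format.

```python
def string_card(keyword, value):
    """Format a fixed-format, string-valued FITS card padded to 80 characters."""
    if len(keyword) > 8:
        raise ValueError("FITS keywords are at most 8 characters")
    # Single quotes inside the value are doubled; short strings are padded to 8.
    quoted = "'" + value.replace("'", "''").ljust(8) + "'"
    card = f"{keyword:<8}= {quoted}"
    if len(card) > 80:
        raise ValueError("value too long for a single card")
    return card.ljust(80)


def history_card(text):
    """Format a HISTORY card; the free text occupies columns 9-80."""
    if len(text) > 72:
        raise ValueError("HISTORY text is limited to 72 characters per card")
    return f"HISTORY {text}".ljust(80)


def derived_header(new_id, parent_ids):
    """One DS_IDENT for the new dataset, one HISTORY card per parent."""
    cards = [string_card("DS_IDENT", new_id)]
    # "Parent dataset:" is an assumed wording, not part of the proposal.
    cards += [history_card(f"Parent dataset: {pid}") for pid in parent_ids]
    return cards


# Hypothetical identifiers for the ROSAT + Chandra deconvolution example.
cards = derived_header(
    "EXAMPLE/deconv-0001",
    ["EXAMPLE/rosat-0001", "EXAMPLE/chandra-0001"],
)
for card in cards:
    print(card.rstrip())
```

The derived file carries exactly one identifier of its own, and a reader who wants the provenance has to parse the HISTORY text, which is why a more structured parent mechanism might eventually be needed.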

#3 March 12th 04, 03:57 PM

Jonathan McDowell says:

you would like to have traceability to say that your pretty picture
(sorry, your important science result) was generated from the following
two original datasets.

I would say [...] that the new file should indeed have a brand new
dataset identifier - you have in this case created a new dataset.
The traceability to the original observations should be done by some
kind of history mechanism that would indeed include the dataset
identifiers of the original data.

Again, here are two separate usage possibilities (out of how many?).
As far as the standard is concerned, any mechanism needs to be able to
handle both. Alternatively, one possibility can be mandated and the
other outlawed.
Are we really at the point of consensus? For example, what about
calibration data? Do they comprise members of a separate dataset or
rather does a particular ground-based or space-based, classical or
queue, flat-field or bias frame or standard exposure belong to multiple
object datasets from the same observing session or pipeline? I can think
of arguments for both possibilities - and I suspect different projects
will make different choices completely independent of FITS. Are we
prepared to tell projects that happen to back the "wrong" data model
that we're sorry but they can't use FITS? On the other hand, are we
prepared to invest the resources necessary to convince the rest of the
astronomical community that the FITS data model should take precedence?
And if we are - then where is the FITS data model?

We'll have to worry about that, but I think it's a second order thing
and I don't think we want to have a complex scheme of nested keywords
here - just one dataset identifier keyword per dataset, and use HISTORY
or something to refer to parent ones.

"Just one dataset identifier keyword per dataset" begs the question:
What is a dataset? Do we have a common answer to that for all data,
past, present and future?

Are we ready to reject multiple inheritance out of hand? And do we not
believe that hierarchical grouping has some place in future FITS data
management?

Mark Calabretta says:

Putting an index of IDs in the primary header, i.e. just a list of the
IDs that come later, would preclude the need to scan past the first
header when looking for a particular ID.

This is a practical solution - for some projects. There is an assumption
here that the primary header is somehow related to the rest of the file.
There may also be an assumption that the primary HDU is dataless.

Rob Seaman
NOAO Science Data Systems
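
The trade-off around a primary-header index can be made concrete with a small sketch. Everything here is an assumption for illustration: a file is modelled as a plain list of header dictionaries, the DS_IDX prefix is a hypothetical keyword family that nobody in the thread proposed by name, and the function name is invented. The lookup uses the index when one is present and falls back to scanning every header when it is not.

```python
INDEX_PREFIX = "DS_IDX"  # hypothetical keyword family, not a FITS convention


def contains_dataset(headers, wanted_id):
    """Return (found, headers_read) for a file modelled as a list of dicts."""
    primary = headers[0]
    indexed = [v for k, v in primary.items() if k.startswith(INDEX_PREFIX)]
    if indexed:
        # Index present: the primary header alone answers the question,
        # but only if it really describes the rest of the file.
        return wanted_id in indexed, 1
    # No index: read headers one by one until the identifier turns up.
    for n, header in enumerate(headers, start=1):
        if header.get("DS_IDENT") == wanted_id:
            return True, n
    return False, len(headers)


indexed_file = [
    {"DS_IDX1": "EXAMPLE/a", "DS_IDX2": "EXAMPLE/b"},
    {"DS_IDENT": "EXAMPLE/a"},
    {"DS_IDENT": "EXAMPLE/b"},
]
plain_file = [{}, {"DS_IDENT": "EXAMPLE/a"}, {"DS_IDENT": "EXAMPLE/b"}]

print(contains_dataset(indexed_file, "EXAMPLE/b"))  # (True, 1)
print(contains_dataset(plain_file, "EXAMPLE/b"))    # (True, 3)
```

The indexed lookup trusts the primary header completely, so it is only as good as the assumption questioned above: that the primary header is kept in step with the extensions that follow it.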
