[fitsbits] 'Dataset Identifications' postings (digest)

March 23rd 04, 08:47 AM

Lucio Chiappetti via

external usenet poster

Posts: n/a

March 23rd 04, 03:04 PM

Arnold Rots

It may be good to clarify the context and scope of what Tom is
proposing (at least my take on it; I won't claim to speak for Tom).

The proposal is to introduce the DS_IDENT keyword as a convention for
dataset identifiers and to define one particular set of values for
this keyword - the ones under the authority of the ADS, i.e.,
identifier values starting with "ADS/". Anybody who wants to
participate in the use of this convention is free to do so, but will
have to comply with the rules of that convention, which are:

1. the identifier is of the form "ADS/observatory#dataset"
2. observatory must be taken from the list maintained by the ADS
3. dataset values are controlled by the data center or observatory
that bears responsibility for the observatory archive
4. that controlling authority, and its successors and assigns, must
guarantee access to the dataset in perpetuity
5. the keeper of the observatory data will provide a specific set of
services that allow identifier verification, harvesting, and access to
the datasets
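Rules 1 and 2 are the only ones a program can check by looking at the identifier string itself. A minimal Python sketch of such a check, assuming the observatory list is available locally as a set of names; `parse_ads_ident`, `DatasetIdent` and `KNOWN_FACILITIES` are hypothetical names, and the two facilities in the set are placeholders for the real ADS-maintained list. The sample identifier is the ROSAT example from Tom McGlynn's message quoted below.

```python
from typing import NamedTuple


class DatasetIdent(NamedTuple):
    authority: str    # naming authority, "ADS" for this class of identifiers
    observatory: str  # must appear in the ADS-maintained facility list
    dataset: str      # opaque part controlled by the responsible archive


def parse_ads_ident(ident: str, facilities: set[str]) -> DatasetIdent:
    """Split an identifier of the form "ADS/observatory#dataset".

    Raises ValueError if the string is not in the "ADS/" class, lacks the
    '#' separator or either part, or names an observatory not in
    `facilities`.
    """
    authority, slash, rest = ident.partition("/")
    if not slash or authority != "ADS":
        raise ValueError(f"not an ADS-class identifier: {ident!r}")
    observatory, hash_sep, dataset = rest.partition("#")
    if not hash_sep or not observatory or not dataset:
        raise ValueError(f"expected ADS/observatory#dataset, got {ident!r}")
    if observatory not in facilities:
        raise ValueError(f"observatory {observatory!r} is not in the facility list")
    # The dataset part is left uninterpreted: only the archive knows its syntax.
    return DatasetIdent(authority, observatory, dataset)


# Hypothetical stand-in for the real ADS facility list.
KNOWN_FACILITIES = {"Sa.CXO", "Sa.ROSAT"}

print(parse_ads_ident("ADS/Sa.ROSAT#X/rh701576n00", KNOWN_FACILITIES))
# -> DatasetIdent(authority='ADS', observatory='Sa.ROSAT', dataset='X/rh701576n00')
```

Rules 3 to 5 (archive control, perpetual access, verification and harvesting services) are commitments by the data provider and cannot be verified by parsing; in practice the verification service from rule 5 would be the authoritative check.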

If someone else wants to define another class of identifiers (i.e.,
other than the "ADS/" class), that is fine, but it would probably be
sensible to make sure that the values and usage comply with IVOA
standards (as the ADS ones do) in order to maximize usefulness and
recognition.

I can tell you what they will most likely look like for Chandra.
There will be (at least) three groups:
ADS/Sa.CXO#obs/ObsId
Points to a particular observation
ADS/Sa.CXO#defset/name
Points to a specifically defined set of observations
ADS/Sa.CXO#bibcode/bibcode
Points to all information we have for a particular paper
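Because the part after '#' belongs to the archive, a client needs archive-specific knowledge to interpret it. As a hedged sketch, assuming the Chandra groups all take the `group/value` shape listed above, splitting them in Python might look like this; `split_cxo_dataset` and `CXO_GROUPS` are hypothetical names, and the ObsId 1234 is made up.

```python
# Group names from the Chandra examples above; the descriptions paraphrase
# the thread, and the table itself is illustrative only.
CXO_GROUPS = {
    "obs": "a particular observation (ObsId)",
    "defset": "a specifically defined set of observations",
    "bibcode": "all information held for a particular paper",
}


def split_cxo_dataset(dataset: str) -> tuple[str, str]:
    """Split the part after '#' (e.g. 'obs/1234') into (group, value)."""
    group, slash, value = dataset.partition("/")
    if not slash or not value:
        raise ValueError(f"expected group/value, got {dataset!r}")
    if group not in CXO_GROUPS:
        raise ValueError(f"unknown Chandra group {group!r}")
    return group, value


group, value = split_cxo_dataset("obs/1234")
print(group, value, "->", CXO_GROUPS[group])
```

Keeping the group table on the archive side means other observatories are free to use a completely different syntax after the '#'.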

Of course, this begs two questions:
- Can two files have the same DS_IDENT value?
The answer should be yes, since a dataset may consist of more than one
file.
- Can one file belong to more than one dataset?
The answer is again yes. This may mean that we should allow for
DS_IDn keywords.
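If one file (or extension) belongs to several datasets, each identifier needs its own header card. A hedged Python sketch of writing fixed-format 80-character FITS cards, assuming a simple numbering where a single identifier goes in DS_IDENT and several go in DS_ID1, DS_ID2, and so on; that numbering is hypothetical, since the thread only raises the possibility of DS_IDn keywords. `string_card` and `dataset_cards` are illustrative names, and the long-string CONTINUE convention is deliberately not handled.

```python
import re

# FITS keyword names: 1 to 8 characters from A-Z, 0-9, hyphen, underscore.
_KEYWORD_RE = re.compile(r"[A-Z0-9_-]{1,8}")


def string_card(keyword: str, value: str) -> str:
    """Format a fixed-format FITS header card holding a string value."""
    if not _KEYWORD_RE.fullmatch(keyword):
        raise ValueError(f"invalid FITS keyword {keyword!r}")
    # Embedded quotes are doubled; the value is padded to at least 8
    # characters so the closing quote falls at or after column 20.
    quoted = "'" + value.replace("'", "''").ljust(8) + "'"
    card = f"{keyword:<8}= {quoted}"
    if len(card) > 80:
        raise ValueError("value too long for a single 80-character card")
    return card.ljust(80)


def dataset_cards(idents: list[str]) -> list[str]:
    """One identifier -> DS_IDENT; several -> DS_ID1, DS_ID2, ... (hypothetical)."""
    if len(idents) == 1:
        return [string_card("DS_IDENT", idents[0])]
    return [string_card(f"DS_ID{n}", ident) for n, ident in enumerate(idents, start=1)]


for card in dataset_cards(["ADS/Sa.CXO#obs/1234", "ADS/Sa.CXO#defset/name"]):
    print(card.rstrip())
```

Since keyword names are limited to eight characters, this numbering tops out at DS_ID999; `string_card` raises rather than emit an invalid name beyond that.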

(I said "files"; you may read "extensions", if you like)

The question has come up in which headers the keyword should appear.
I would recommend putting it in any and all headers where it is
appropriate - primary and secondary.
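Putting the keyword in every header where it applies means a reader has to look at each HDU separately rather than only at the primary header. A minimal Python sketch, assuming each header is available as a list of 80-character card strings (no FITS library is used) and the same hypothetical DS_IDn numbering; `dataset_ids` is an illustrative name and the identifiers are placeholders.

```python
import re

# DS_IDENT or a numbered DS_IDn keyword (hypothetical numbering), then a
# quoted string in which embedded quotes are doubled.
_DS_CARD_RE = re.compile(r"^(DS_IDENT|DS_ID\d{1,3})\s*=\s*'((?:[^']|'')*)'")


def dataset_ids(cards: list[str]) -> list[str]:
    """Collect dataset identifiers from one header's 80-character cards."""
    ids = []
    for card in cards:
        match = _DS_CARD_RE.match(card)
        if match:
            # Undo quote doubling; trailing blanks in FITS strings are not significant.
            ids.append(match.group(2).replace("''", "'").rstrip())
    return ids


# One list of cards per HDU: the primary header first, then each extension.
headers = [
    ["SIMPLE  =                    T", "DS_IDENT= 'ADS/Sa.CXO#obs/1234'"],
    ["XTENSION= 'BINTABLE'", "DS_ID1  = 'ADS/Sa.CXO#obs/1234'", "DS_ID2  = 'ADS/Sa.CXO#defset/name'"],
]
for index, cards in enumerate(headers):
    print(index, dataset_ids([card.ljust(80) for card in cards]))
```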

Hope this helps,

- Arnold

Don Wells wrote:
> ...
>
> From: Thomas McGlynn
> Subject: [fitsbits] Dataset identifications.
> Newsgroups: sci.astro.fits
> Date: Wed, 10 Mar 2004 14:20:18 -0500
> Organization: NASA Goddard Space Flight Center
> Reply-To:
>
> There is an effort underway at several of the NASA archives
> to provide a standard dataset identifier for data that
> can be retrieved from the archives. The initial motivation
> is that when authors publish a paper they will be able
> to specify the data that were used in the analysis, and systems
> like the ADS will be able to provide links to these data
> in a systematic way from the papers (and vice versa for
> the archives). Currently this is done for a few datasets,
> but it's a very manual and labor-intensive process. Although
> the initial impetus is coming from some of the NASA sites,
> we've been talking with the VO efforts and hope that the
> ID will be of general utility. I've no doubt that if IDs
> become established they will be used in many
> different ways.
>
> There are discussions still ongoing as to the exact format
> to be used. It is intended that the overall format will be
> compatible with the identification standards that are being
> discussed in the Virtual Observatory world. An example ID
> might be ADS/Sa.ROSAT#X/rh701576n00, where 'ADS' indicates
> that the ADS will provide the high-level resolution service,
> 'Sa.ROSAT' is an observatory identifier, and the
> element that follows the # is observatory specific, but
> should be familiar enough for those who have used ROSAT
> data.
>
> The question for this group is not so much a discussion of the format
> of the ID. Rather, it was pointed out that if these IDs are successful
> it would be useful to be able to have a standard
> FITS keyword that would indicate the dataset ID that the current
> file belongs to. The keyword 'DS_IDENT' has been suggested.
> Does anyone have objections to this, or know of systems
> that already use this keyword? Googling DS_IDENT returns an album
> of Donna Summer's but no FITS references.
>
> Also, are there any issues that we need to resolve regarding
> the usage of the keyword? One that comes to mind is whether use of this
> keyword should be recommended only for the primary header of a FITS
> file. If not, then a file may not be associated with a unique dataset
> ID.
>
> I'd appreciate any comments, questions or thoughts on the subject.
>
> Thanks,
> Tom McGlynn
> HEASARC
> ...
>
> --
> Donald C. Wells Scientist
> http://www.cv.nrao.edu/~dwells
> National Radio Astronomy Observatory +1-434-296-0277
> 520 Edgemont Road, Charlottesville, Virginia 22903-2475 USA
> _______________________________________________
> fitsbits mailing list
>
> http://listmgr.cv.nrao.edu/mailman/listinfo/fitsbits

--------------------------------------------------------------------------
Arnold H. Rots Chandra X-ray Science Center
Smithsonian Astrophysical Observatory tel: +1 617 496 7701
60 Garden Street, MS 67 fax: +1 617 495 7356
Cambridge, MA 02138
USA http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------

March 23rd 04, 04:51 PM

LC's No-Spam Newsreading account

On Tue, 23 Mar 2004, Arnold Rots wrote:

> The proposal is to introduce the DS_IDENT keyword as a convention for
> dataset identifiers and to define one particular set of values for
> this keyword - the ones under the authority of the ADS, i.e.,

What makes them "under the authority of the ADS"? A specific
agreement between the ADS and the observatory archive and/or paper author
and/or journal and/or IAU?

> 2. observatory must be taken from the list maintained by the ADS
>
> 5. the keeper of the observatory data will provide a specific set of
> services that allow identifier verification, harvesting, and access to
> the datasets

What is an observatory here? A ground-based institution (but in that
case, wouldn't it be better to have a telescope-instrument identifier?), OR
a satellite, OR the OFFICIAL data centre for such satellite data?

This seems to rule out "private" datasets (as I defined in my earlier
posting) - which might be good - but what about "catalogue" datasets?

> If someone else wants to define another class of identifiers (i.e.,
> other than the "ADS/" class), that is fine, but it would probably be
> sensible to make sure that the values and usage comply with IVOA
> standards (as the ADS ones do) in order to maximize usefulness and
> recognition.

What is IVOA?

Is this a task for the FITS community (if not, maybe we should stop
here, or confine the discussion to a few FITS-specific items), for some
other IAU body, or for somebody else?

> ADS/Sa.CXO#obs/ObsId
> Points to a particular observation
> ADS/Sa.CXO#defset/name
> Points to a specifically defined set of observations

Once again, these seem to point to something which can be assigned
only by an official data centre.

> ADS/Sa.CXO#bibcode/bibcode
> Points to all information we have for a particular paper

Who is "we" in the above sentence, and which papers are concerned?

Any paper published in a journal indexed by the ADS?
And who is storing the relevant data? ADS, CDS, data centre, author?
Any paper on Chandra, assuming that the author sends the associated
reduced data to the Chandra data centre?
Any paper published by Chandra data centre staff only?

> Of course, this begs two questions:
> - Can two files have the same DS_IDENT value?
> - Can one file belong to more than one dataset?

Yes, but what about the case of the results of a paper regarding
the analysis of some particular observational data?

The original (starting) data will be stored at some data centre,
but the results will in general be privately owned by the authors;
they do not BELONG TO the original dataset, rather they STEM OUT OF
the original dataset (parent-child relation).

--
----------------------------------------------------------------------
is a newsreading account used by several people to
avoid unwanted spam. Any mail returning to this address will be rejected.
Users can disclose their e-mail address in the article if they wish.

March 23rd 04, 08:14 PM

Arnold Rots

Maybe it helps to state the practical purpose of the identifiers.
It is put in the header to tell users which dataset identifier to use
if and when they insert such identifiers into their manuscripts.

The purpose of that is to facilitate the linkage between the
literature and the archived datasets. Those links are currently being
maintained by a number of data centers (and the ADS) but it is rather
labor-intensive. This mechanism would allow for automatic harvesting.
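Automatic harvesting presupposes that identifiers can be picked out of manuscript text mechanically. A rough Python sketch, assuming identifiers appear verbatim in plain text and are delimited by whitespace; the regular expression is a heuristic of this sketch, not part of any ADS specification, and `harvest_idents` is a hypothetical name.

```python
import re

# An "ADS/" identifier: observatory up to '#', then a dataset part that runs
# until whitespace. Trailing sentence punctuation is trimmed afterwards.
_IDENT_RE = re.compile(r"ADS/[^\s#]+#\S+")


def harvest_idents(text: str) -> list[str]:
    """Return the distinct ADS-class identifiers cited in `text`, in order."""
    seen: dict[str, None] = {}  # dict keeps first-seen order and drops repeats
    for match in _IDENT_RE.finditer(text):
        ident = match.group(0).rstrip(".,;:)")
        seen.setdefault(ident, None)
    return list(seen)


text = ("We used ADS/Sa.CXO#obs/1234 and ADS/Sa.ROSAT#X/rh701576n00. "
        "See also ADS/Sa.CXO#obs/1234.")
print(harvest_idents(text))
```

The punctuation trimming is a trade-off: it handles identifiers at the end of a sentence, but would also clip a dataset part that genuinely ends in one of those characters, so a production harvester would confirm each candidate with the archive's verification service.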

More responses below.

- Arnold

LC's No-Spam Newsreading account wrote:
> On Tue, 23 Mar 2004, Arnold Rots wrote:
>
>> The proposal is to introduce the DS_IDENT keyword as a convention for
>> dataset identifiers and to define one particular set of values for
>> this keyword - the ones under the authority of the ADS, i.e.,
>
> What makes them "under the authority of the ADS"? A specific
> agreement between the ADS and the observatory archive and/or paper author
> and/or journal and/or IAU?

The fact that they start with "ADS/". It is indeed tied in with an
agreement between the ADS, data centers, and journals, aimed at enabling
the ADS and data centers to harvest literature-dataset links.

>> 2. observatory must be taken from the list maintained by the ADS
>>
>> 5. the keeper of the observatory data will provide a specific set of
>> services that allow identifier verification, harvesting, and access to
>> the datasets
>
> What is an observatory here? A ground-based institution (but in that
> case, wouldn't it be better to have a telescope-instrument identifier?), OR
> a satellite, OR the OFFICIAL data centre for such satellite data?

You will find the current list at:
http://vo.ads.harvard.edu/dv/facilities.txt

> This seems to rule out "private" datasets (as I defined in my earlier
> posting) - which might be good - but what about "catalogue" datasets?

At least under this authority ID (ADS).

>> If someone else wants to define another class of identifiers (i.e.,
>> other than the "ADS/" class), that is fine, but it would probably be
>> sensible to make sure that the values and usage comply with IVOA
>> standards (as the ADS ones do) in order to maximize usefulness and
>> recognition.
>
> What is IVOA?

International Virtual Observatory Alliance.

> Is this a task for the FITS community (if not, maybe we should stop
> here, or confine the discussion to a few FITS-specific items), for some
> other IAU body, or for somebody else?

No, not really, but it deals with a convention involving a FITS
keyword, which may have repercussions for future use of that keyword.

>> ADS/Sa.CXO#obs/ObsId
>> Points to a particular observation
>> ADS/Sa.CXO#defset/name
>> Points to a specifically defined set of observations
>
> Once again, these seem to point to something which can be assigned
> only by an official data centre.

Yes.

>> ADS/Sa.CXO#bibcode/bibcode
>> Points to all information we have for a particular paper
>
> Who is "we" in the above sentence, and which papers are concerned?

CDA (the Chandra Data Archive).

> Any paper published in a journal indexed by the ADS?

No, the ones for which we know there is a Chandra link (in this example).

> And who is storing the relevant data? ADS, CDS, data centre, author?

ADS and us.

> Any paper on Chandra, assuming that the author sends the associated
> reduced data to the Chandra data centre?

Yes, any paper on Chandra data, but no, not linked to products
produced by the author - only the archived datasets produced by the CXC
(where the author started from, presumably).

> Any paper published by Chandra data centre staff only?
>
>> Of course, this begs two questions:
>> - Can two files have the same DS_IDENT value?
>> - Can one file belong to more than one dataset?
>
> Yes, but what about the case of the results of a paper regarding
> the analysis of some particular observational data?
>
> The original (starting) data will be stored at some data centre,
> but the results will in general be privately owned by the authors;
> they do not BELONG TO the original dataset, rather they STEM OUT OF
> the original dataset (parent-child relation).

That's correct.

March 23rd 04, 08:51 PM

Rob Seaman

Interesting discussion. It was not previously obvious how specific
a concept was being discussed. Thanks to Arnold Rots for the details:

> The proposal is to introduce the DS_IDENT keyword as a convention for
> dataset identifiers and to define one particular set of values for
> this keyword - the ones under the authority of the ADS, i.e.,
> identifier values starting with "ADS/". Anybody who wants to
> participate in the use of this convention is free to do so, but will
> have to comply with the rules of that convention,

Wouldn't it make more sense to reserve a keyword called "ADSIDENT"?
That is the precise naming space that is being discussed. If you stick
with DS_IDENT (or DS_IDn), you are implicitly assuming that a single
dataset (and the individual files that comprise that dataset) will
never benefit from being named by multiple certification entities
simultaneously.

> Of course, this begs two questions:

It begs more than two :-)

> - Can two files have the same DS_IDENT value?
> The answer should be yes, since a dataset may consist
> of more than one file.

The answer *must* be yes, because this possibility cannot be legislated
out of existence.

> - Can one file belong to more than one dataset?
> The answer is again yes. This may mean that we should
> allow for DS_IDn keywords.

A file can belong to more than one ADS-style dataset. A file can also
belong to more than one entirely distinct name space.

Suggestions:

1) Reserve ADSID and ADSIDn for the purposes of the proposal being
discussed.

2) Expect to reserve keywords of the form xxxIDn in the future for
similar purposes related to other certifying entities.

If the concept of ADS administered ID name spaces is of use to the
larger astronomical community, this will become obvious as other
data providers sign on to the ADS bandwagon. Meanwhile, it may *also*
be useful for some data providers to form their own ID name spaces.
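The xxxIDn pattern suggested above runs into the FITS limit of eight characters per keyword name (uppercase letters, digits, hyphen and underscore only), which bounds how long the entity prefix and the index can be. A small Python sketch that builds and checks such names; `id_keyword` is a hypothetical helper, and the prefixes are only the suggestion made here, not a registered convention.

```python
import re
from typing import Optional

# FITS keyword names: 1 to 8 characters from A-Z, 0-9, hyphen, underscore.
_KEYWORD_RE = re.compile(r"[A-Z0-9_-]{1,8}")


def id_keyword(prefix: str, n: Optional[int] = None) -> str:
    """Build an xxxID or xxxIDn keyword name for a certifying entity."""
    if n is not None and n < 1:
        raise ValueError("index n must be a positive integer")
    keyword = f"{prefix}ID" if n is None else f"{prefix}ID{n}"
    if not _KEYWORD_RE.fullmatch(keyword):
        raise ValueError(f"{keyword!r} is not a valid FITS keyword name")
    return keyword


print(id_keyword("ADS"), id_keyword("ADS", 2), id_keyword("ADS", 999))
# -> ADSID ADSID2 ADSID999
```

With a three-letter prefix such as ADS the index can run to 999; a four-letter prefix leaves room for only two digits.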

> (I said "files"; you may read "extensions", if you like)

Again - separate IDs *must* be supported for separate extensions.
How are you going to legislate against MEF files containing files (or
other data structures, such as tables) from more than one dataset?
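For a multi-extension file the useful question becomes "which HDUs belong to a given dataset?" rather than "which dataset is this file?". A minimal Python sketch, assuming the identifiers have already been read from each header into one list per HDU (primary first); `hdus_by_dataset` is a hypothetical name and the identifiers are placeholders.

```python
from collections import defaultdict


def hdus_by_dataset(ids_per_hdu: list[list[str]]) -> dict[str, list[int]]:
    """Invert per-HDU dataset IDs into a map of dataset ID -> HDU indices.

    Index 0 is the primary HDU. An HDU may carry several identifiers, and
    the same identifier may appear in several HDUs.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for index, ids in enumerate(ids_per_hdu):
        for ident in ids:
            # Indices arrive in increasing order, so checking the last entry
            # is enough to avoid listing an HDU twice for the same ID.
            if not groups[ident] or groups[ident][-1] != index:
                groups[ident].append(index)
    return dict(groups)


# Hypothetical MEF: the primary HDU and one extension from one observation,
# a second extension belonging to two other datasets.
mef = [
    ["ADS/Sa.CXO#obs/1234"],
    ["ADS/Sa.CXO#obs/1234"],
    ["ADS/Sa.CXO#obs/5678", "ADS/Sa.CXO#defset/name"],
]
print(hdus_by_dataset(mef))
```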

> The question has come up in which headers the keyword should appear.
> I would recommend putting it in any and all headers where it is
> appropriate - primary and secondary.

Agreed!

> Hope this helps,

Yes, indeed. Thank you, Arnold.

Rob Seaman
NOAO Science Data Systems
