#1
[fitsbits] 'Dataset Identifications' postings (digest)
On 16 Mar 2004, Don Wells wrote a digest :
> From: Thomas McGlynn
>
> There is an effort underway at several of the NASA archives to provide a standard dataset identifier for data that can be retrieved from the archives. The initial motivation is that when authors publish [...]

Motivation understood and agreed.

> The keyword 'DS_IDENT' has been suggested. Does anyone have objections to this or do they know of systems that already use this keyword?

I believe this or any other unused name is fine.

------------------------------------------------------------------------

> From: (Rob Seaman)
>
> NOAO (through "Save the bits") has three or four million discrete FITS images packaged up into MEF files for purposes of efficient and easy handling. On the other hand, HEASARC's usage supplies an example involving one dataset that contains several files.

Would the former be "one file 'originally from' many datasets but now actually a new dataset on its own"? The latter seems more familiar to me. But I can imagine another case, i.e. data retrieved from a site with a database and containing part of a catalog.

> Personally, I think before we reserve "DS_IDENT" or any other keyword for the purpose of identifying datasets, we should define the concept of a "dataset".

Yes, I think so. Let me say what is my *understanding* of a "dataset" (which does not mean it's something I propose as THE definition!) based on some past experiences.

In the case of an X-ray satellite, typically one has a unit like an observing proposal [A], which includes one or more pointings. The pointing [B] occurs in a given time interval, and may involve SIMULTANEOUS observations by more than one instrument [C]. For each instrument the overall time may be divided into consecutive time intervals [D] in which a given instrument configuration is used. There may be many different telemetry packet streams generated during each interval [D], roughly speaking many different files ... not even FITS files.
At some stage they might be transformed into a group of many different (FITS?) files, which will be kept together as a dataset.

Just to give some examples, for the long forlorn Exosat satellite, the observer received a half-inch tape called a FOT. There was one logical FOT (maybe spanning several volumes) for each [A][B] combination, where [B] was called the Observing Period (OP) and [D] were called "observations". There were many (non-FITS) files for each [C][D] combination, but I would call the FOT itself "the dataset". I don't remember if they originally had an identifier other than the name of the target and the date. I heard that ESTEC much later had plans to finally re-archive the data as FITS event lists, however I haven't followed this.

For BeppoSAX, I'm the culprit of having forced inheritance of the above naming, with [A][B] being the OP, and [D] observations. BeppoSAX had FOTs (in the form of DAT cassettes with several non-FITS files) and they were identified by the OP (sequential) number. A dataset was definitely "the OP" or "the associated FOT". I would say more "the OP", as ASDC has also been archiving some reprocessed FITS event lists for online access, grouped by OP.

For XMM-Newton the naming is different but the concept is similar. Proposals [A] have a numeric prop-id. [B] are called here "observations" and have a 4-digit obs-id. [D] are called "exposures". What they used to give to observers until a while ago was a CD associated with the combination [A][B] ... and in fact the data were labelled with the concatenation of prop-id and obs-id, e.g. 0065760201. Now they distribute data online only, but the scheme has been retained. "The dataset" is the ensemble of all (many!) (FITS) files pertaining to an [A][B]. I note incidentally that, although no tapes are used, the "flat" naming scheme is still used, with long horrible file names like P0065760201M1S001EBLSLI0000.FIT.
My personal tendency (but I'm an end user and not an archive maintainer in this context) would have been to put part of the information in directory names and not in file names (e.g. for my own BeppoSAX analysis I used to store files as [A']/[B']/[C]/[D].type, and I tend to use shorter names also for my own XMM reduction), while "the dataset" as distributed by ESA contains instead only two directories, one with the semi-raw FITS reformatted data, and the other one with the pipeline products. But that (flat or tree) arrangement leaves unchanged the definition of which files constitute "a dataset".

To go back to another old (but simple) example, in the case of the UV satellite IUE, nobody cared about the proposal id [A] or the object [B] when referring to a dataset. The "unit" was one exposure (one spectrum with a given camera = only one camera operative at any time), or "image", which had identifiers [C][D], e.g. SWP11056. The data delivered to the observer was a set of 4-5 files (originally non-FITS) for each "image" (one raw image, and the steps and results of a pipeline). In this case I would be inclined to consider this group of files as "the dataset" (irrespective of the fact that more than one, unrelated, could be placed on a tape).

I'm not terribly familiar with the way a ground site like ESO manages its archives, but definitely a proposal [A] can refer to many targets [B], and ultimately to units called "OBs" (Observing Blocks) which are split into exposures. Exposures taken at different times may be associated (e.g. for a multi-object spectrograph one can associate the exposure taken with a given mask with the dark or lamp calibration taken later with the same mask), so it's this association I'd call "the dataset".

In any case, I've been talking so far of raw, semi-raw or standard-reduced data archived at the original observatory (or other site in charge of archiving) pertaining to a pointing of an object at a given time. More to come below ...
------------------------------------------------------------------------

> From: Jonathan McDowell
>
> suppose I have run a modelling tool to get the best deconvolved image fit simultaneously to ROSAT and Chandra data, and stored the result in the FITS file. [...] However, I would say to Thierry that the new file should indeed have a brand new dataset identifier - you have in this case created a new dataset. The traceability to the original observations should be done

This is indeed a new case. In general I'm inclined to consider the result of any analysis (as opposed to plain "reduction") to be "private" data. One may keep them, but privately. What matters are the numbers in the published paper. But there might indeed be cases in which such data could be stored and made publicly available (forever?) although not in a mission archive.

OK, they are "a new dataset", but who names them? Are we going to run into things like "official naming authorities", like the awful "certificates" and "self signed certificates" stuff? Should we just delegate it to the journals and/or use the bibcode (somebody said something like that)?

There is at least one other different case, databases and catalogues. E.g. I'm managing the database for the XMM-LSS survey (which is a survey done *with* XMM by a consortium using some GO time, but not *by* the XMM ESA project staff, hence "unofficial"). Our collaboration members (and later the public) can export catalogue subsets as FITS files. So far I've not worried about "dataset identification". Of course each RECORD in one of my tables which refers to the XMM data is associated with an XMM pointing (and its propid-obsid), but I'm not keeping this info explicit. And there are other tables containing non X-ray data taken by us (with an optical telescope or with the VLA). There are tables which are authorized subsets of data taken by other consortia. There are tables which are pointers to NED or SIMBAD. Should I really worry here about traceability?
Or just say that the dataset is the XMM-LSS project (an ORIGIN keyword would be enough!)?

----------------------------------------------------------------------------
Lucio Chiappetti - IASF/CNR - via Bassini 15 - I-20133 Milano (Italy)
----------------------------------------------------------------------------
L'Italia ripudia la guerra [...] come mezzo di risoluzione delle controversie internazionali
Italy repudiates war [...] as a way of resolution of international controversies
[Art. 11 Constitution of the Italian Republic]
----------------------------------------------------------------------------
For more info : http://www.mi.iasf.cnr.it/~lucio/personal.html
----------------------------------------------------------------------------
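As a rough illustration of the XMM-Newton labelling described in the post above, the sketch below splits the 10-digit label into its two parts. The 6-digit width of the prop-id is only inferred here (10 digits minus the 4-digit obs-id), and the function names, as well as the assumption that the label directly follows the leading "P" of the file name, are illustrative and based solely on the single example quoted in the post.

```python
import re
from typing import NamedTuple


class XmmLabel(NamedTuple):
    prop_id: str  # proposal [A]; 6 digits is an inference, not a stated rule
    obs_id: str   # pointing [B], called "observation" for XMM-Newton


def split_xmm_label(label: str) -> XmmLabel:
    """Split a 10-digit label into prop-id and 4-digit obs-id."""
    if not re.fullmatch(r"\d{10}", label):
        raise ValueError(f"expected 10 digits, got {label!r}")
    return XmmLabel(prop_id=label[:6], obs_id=label[6:])


def label_from_filename(name: str) -> str:
    """Extract the 10-digit label from a flat file name (assumed layout)."""
    match = re.match(r"P(\d{10})", name)
    if match is None:
        raise ValueError(f"no dataset label found in {name!r}")
    return match.group(1)


# The file name quoted in the post.
label = label_from_filename("P0065760201M1S001EBLSLI0000.FIT")
print(label, split_xmm_label(label))
```

Whether the files sit in a flat or a tree arrangement, the same label would identify "the dataset" in the sense used in the post.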
#3
[fitsbits] 'Dataset Identifications' postings (digest)
It may be good to clarify the context and scope of what Tom is proposing (at least my take on it; I won't claim to speak for Tom).

The proposal is to introduce the DS_IDENT keyword as a convention for dataset identifiers and to define one particular set of values for this keyword - the ones under the authority of the ADS, i.e., identifier values starting with "ADS/". Anybody who wants to participate in the use of this convention is free to do so, but will have to comply with the rules of that convention, which are:

1. the identifier is of the form "ADS/observatory#dataset"
2. observatory must be taken from the list maintained by the ADS
3. dataset values are controlled by the data center or observatory that bears responsibility for the observatory archive
4. that controlling authority, and its successors and assigns, must guarantee access to the dataset in perpetuity
5. the keeper of the observatory data will provide a specific set of services that allow identifier verification, harvesting, and access to the datasets

If someone else wants to define another class of identifiers (i.e., other than the "ADS/" class), that is fine, but it would probably be sensible to make sure that the values and usage comply with IVOA standards (as the ADS ones do) in order to maximize usefulness and recognition.

I can tell what they, most likely, will look like for Chandra. There will be (at least) three groups:

ADS/Sa.CXO#obs/ObsId
    Points to a particular observation
ADS/Sa.CXO#defset/name
    Points to a specifically defined set of observations
ADS/Sa.CXO#bibcode/bibcode
    Points to all information we have for a particular paper

Of course, this begs two questions:

- Can two files have the same DS_IDENT value? The answer should be yes, since a dataset may consist of more than one file.
- Can one file belong to more than one dataset? The answer is again yes. This may mean that we should allow for DS_IDn keywords.

(I said "files"; you may read "extensions", if you like)

The question has come up in which headers the keyword should appear.
I would recommend putting it in any and all headers where it is appropriate - primary and secondary.

Hope this helps,

- Arnold

Don Wells wrote:

> ...
> From: Thomas McGlynn
> Subject: [fitsbits] Dataset identifications.
> Newsgroups: sci.astro.fits
> Date: Wed, 10 Mar 2004 14:20:18 -0500
> Organization: NASA Goddard Space Flight Center
> Reply-To:
>
> There is an effort underway at several of the NASA archives to provide a standard dataset identifier for data that can be retrieved from the archives. The initial motivation is that when authors publish a paper they will be able to specify the data that was used in analysis and systems like the ADS will be able to provide links to these data in a systematic way from the papers (and vice versa for the archives). Currently this is done for a few datasets but it's a very manual and labor intensive process.
>
> Although the initial impetus is coming from some of the NASA sites, we've been talking with the VO efforts and hope that the ID will be of general utility. I've no doubt that if ID's become established they will be used in many different ways.
>
> There are discussions still ongoing as to the exact format to be used. It is intended that the overall format will be compatible with the identification standards that are being discussed in the Virtual Observatory world. An example ID might be
>
>     ADS/Sa.ROSAT#X/rh701576n00
>
> where the ADS indicates that the ADS will provide the high level resolution service, the 'Sa.ROSAT' is an observatory identifier, and the element that follows the # is observatory specific, but should be familiar enough for those who have used ROSAT data.
>
> The question for this group is not so much a discussion of the format of the ID. Rather it was pointed out that if these IDs are successful it would be useful to be able to have a standard FITS keyword that would indicate the dataset id that the current file belongs to. The keyword 'DS_IDENT' has been suggested.
> Does anyone have objections to this or do they know of systems that already use this keyword? Googling DS_IDENT returns an album of Donna Summer's but no FITS references.
>
> Also, are there any issues that we need to resolve regarding the usage of the keyword? One that comes to mind is whether use of this keyword should be recommended only for the primary header of a FITS file. If not then a file may not be associated with a unique dataset id.
>
> I'd appreciate any comments, questions or thoughts on the subject.
>
> Thanks,
> Tom McGlynn
> HEASARC
> ...
> --
> Donald C. Wells, Scientist              http://www.cv.nrao.edu/~dwells
> National Radio Astronomy Observatory    +1-434-296-0277
> 520 Edgemont Road, Charlottesville, Virginia 22903-2475 USA
> _______________________________________________
> fitsbits mailing list
> http://listmgr.cv.nrao.edu/mailman/listinfo/fitsbits

--------------------------------------------------------------------------
Arnold H. Rots                          Chandra X-ray Science Center
Smithsonian Astrophysical Observatory   tel: +1 617 496 7701
60 Garden Street, MS 67                 fax: +1 617 495 7356
Cambridge, MA 02138 USA                 http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------
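To make the "ADS/observatory#dataset" form and the proposed keyword more concrete, here is a minimal sketch in plain Python. It splits an identifier into its three parts and writes it as a fixed-format FITS string card. None of this is part of the proposal itself: the function names are invented for illustration, the Chandra ObsId value is a made-up placeholder, and the 1-based DS_ID1, DS_ID2, ... numbering is only one possible reading of the "DS_IDn" suggestion in the post above.

```python
from typing import NamedTuple


class DatasetId(NamedTuple):
    authority: str    # "ADS" for the class of identifiers discussed here
    observatory: str  # e.g. "Sa.ROSAT", taken from the ADS-maintained list
    dataset: str      # observatory-specific part that follows the '#'


def parse_dataset_id(value: str) -> DatasetId:
    """Split 'authority/observatory#dataset' into its three parts."""
    head, hash_sep, dataset = value.partition("#")
    authority, slash_sep, observatory = head.partition("/")
    if not (hash_sep and slash_sep and authority and observatory and dataset):
        raise ValueError(f"not of the form authority/observatory#dataset: {value!r}")
    return DatasetId(authority, observatory, dataset)


def string_card(keyword: str, value: str) -> str:
    """Format a fixed-format FITS string card, blank-padded to 80 characters."""
    # Quotes inside the value are doubled; short strings are padded to 8 chars.
    quoted = "'" + value.replace("'", "''").ljust(8) + "'"
    card = f"{keyword:<8}= {quoted}"
    if len(keyword) > 8 or len(card) > 80:
        raise ValueError("keyword or value too long for a single card")
    return card.ljust(80)


def dataset_cards(ids: list[str]) -> list[str]:
    """One id gives DS_IDENT; several give DS_ID1, DS_ID2, ... (assumed scheme)."""
    for value in ids:
        parse_dataset_id(value)  # reject malformed identifiers early
    if len(ids) == 1:
        return [string_card("DS_IDENT", ids[0])]
    return [string_card(f"DS_ID{n}", v) for n, v in enumerate(ids, start=1)]


print(parse_dataset_id("ADS/Sa.ROSAT#X/rh701576n00"))
# "00001" below is a hypothetical ObsId, used only to show the multi-id case.
for card in dataset_cards(["ADS/Sa.ROSAT#X/rh701576n00", "ADS/Sa.CXO#obs/00001"]):
    print(card.rstrip())
```

The same cards could be written into the primary header and into any extension header where they apply, in line with the recommendation above.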
#5
[fitsbits] 'Dataset Identifications' postings (digest)
On Tue, 23 Mar 2004, Arnold Rots wrote:
> The proposal is to introduce the DS_IDENT keyword as a convention for dataset identifiers and to define one particular set of values for this keyword - the ones under the authority of the ADS, i.e.,

What makes them "under the authority of the ADS"? A specific agreement between ADS and Observatory archive and/or paper author and/or journal and/or IAU?

> 2. observatory must be taken from the list maintained by the ADS
> 5. the keeper of the observatory data will provide a specific set of services that allow identifier verification, harvesting, and access to the datasets

What is an observatory here? A ground based institution (but in that case won't it be better to have a telescope-instrument identifier?) OR a satellite OR the OFFICIAL data centre of such satellite data?

This seems to rule out "private" datasets (as I defined in my earlier posting) - which might be good - but what about "catalogue" datasets?

> If someone else wants to define another class of identifiers (i.e., other than the "ADS/" class), that is fine, but it would probably be sensible to make sure that the values and usage comply with IVOA standards (as the ADS ones do) in order to maximize usefulness and recognition.

What is IVOA? Is this a task for the FITS community (if not maybe we should stop here, or confine the discussion to a few FITS specific items), for some other IAU body, or for somebody else?

> ADS/Sa.CXO#obs/ObsId Points to a particular observation
> ADS/Sa.CXO#defset/name Points to a specifically defined set of observations

Once again these seem to point to something which can be assigned only by an official data centre.

> ADS/Sa.CXO#bibcode/bibcode Points to all information we have for a particular paper

Who is "we" in the above sentence, and what papers should be concerned? Any paper published in a journal indexed by the ADS? And who is storing the relevant data? ADS, CDS, data centre, author?
Any paper on Chandra, assuming that the author sends associated reduced data to the Chandra data centre? Any paper published by Chandra data centre staff only?

> Of course, this begs two questions:
> - Can two files have the same DS_IDENT value?
> - Can one file belong to more than one dataset?

Yes, but what about the case of the results of a paper regarding the analysis of some particular observational data? The original (starting) data will be stored at some data centre, but the result will in general be privately owned by the authors, and does not BELONG TO the original dataset; rather it STEMS OUT OF the original dataset (parent-child relation).

--
----------------------------------------------------------------------
is a newsreading account used by more persons to avoid unwanted spam. Any mail returning to this address will be rejected. Users can disclose their e-mail address in the article if they wish so.
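The BELONG TO versus STEM OUT OF distinction at the end of the post above could be modelled roughly as follows. This is only a sketch of the idea and not something any of the posters proposed: the class, the function, the "private/..." identifier and the Chandra ObsId are all hypothetical, and only the ROSAT identifier is taken from the thread.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    ident: str                                               # the dataset's own identifier
    parents: list["Dataset"] = field(default_factory=list)  # datasets it stems out of


def archive_ancestors(ds: Dataset) -> set[str]:
    """Return identifiers of all parentless ancestors (ds itself if it has none)."""
    if not ds.parents:
        return {ds.ident}
    found: set[str] = set()
    for parent in ds.parents:
        found |= archive_ancestors(parent)
    return found


# Two archived observations (the Chandra ObsId is a made-up placeholder) ...
rosat = Dataset("ADS/Sa.ROSAT#X/rh701576n00")
chandra = Dataset("ADS/Sa.CXO#obs/00001")
# ... and a privately held analysis product that stems out of both.
fit = Dataset("private/deconvolved-fit", parents=[rosat, chandra])

print(sorted(archive_ancestors(fit)))
```

In this picture the derived product carries its own identifier, and traceability to the original observations is a separate link rather than shared membership of one dataset.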
#7
[fitsbits] 'Dataset Identifications' postings (digest)
Maybe it helps to state the practical purpose of the identifiers. The keyword is put in there to inform users as to which dataset identifier to use if and when they insert such identifiers into their manuscripts. The purpose of that is to facilitate the linkage between the literature and the archived datasets. Those links are currently maintained by a number of data centers (and the ADS), but that is rather labor-intensive. This mechanism would allow for automatic harvesting.

More responses below.

  - Arnold

LC's No-Spam Newsreading account wrote:
> On Tue, 23 Mar 2004, Arnold Rots wrote:
>> The proposal is to introduce the DS_IDENT keyword as a convention
>> for dataset identifiers and to define one particular set of values
>> for this keyword - the ones under the authority of the ADS, i.e.,
>
> What makes them "under the authority of the ADS"? A specific
> agreement between the ADS and the observatory archive and/or paper
> author and/or journal and/or IAU?

The fact that they start with "ADS/". It is indeed tied in with an agreement between the ADS, data centers, and journals, aimed at enabling the ADS and the data centers to harvest literature-dataset links.

>> 2. observatory must be taken from the list maintained by the ADS
>> 5. the keeper of the observatory data will provide a specific set
>>    of services that allow identifier verification, harvesting, and
>>    access to the datasets
>
> What is an observatory here? A ground-based institution (but in that
> case wouldn't it be better to have a telescope-instrument
> identifier?) OR a satellite OR the OFFICIAL data centre for such
> satellite data?

You will find the current list at:
http://vo.ads.harvard.edu/dv/facilities.txt

> This seems to rule out "private" datasets (as I defined them in my
> earlier posting) - which might be good - but what about "catalogue"
> datasets?

At least under this authority ID (ADS).

>> If someone else wants to define another class of identifiers (i.e.,
>> other than the "ADS/" class), that is fine, but it would probably be
>> sensible to make sure that the values and usage comply with IVOA
>> standards (as the ADS ones do) in order to maximize usefulness and
>> recognition.
>
> What is IVOA?

International Virtual Observatory Alliance

> Is this a task for the FITS community (if not, maybe we should stop
> here, or confine the discussion to a few FITS-specific items), for
> some other IAU body, or for somebody else?

No, not really, but it deals with a convention involving a FITS keyword, which may have repercussions for future use of this keyword.

>> ADS/Sa.CXO#obs/ObsId
>>     Points to a particular observation
>> ADS/Sa.CXO#defset/name
>>     Points to a specifically defined set of observations
>
> Once again these seem to point to something which can be assigned
> only by an official data centre.

Yes.

>> ADS/Sa.CXO#bibcode/bibcode
>>     Points to all information we have for a particular paper
>
> Who is "we" in the above sentence, and which papers are concerned?

CDA

> Any paper published in a journal indexed by the ADS?

No, the ones for which we know there is a Chandra link (in this example).

> And who is storing the relevant data? ADS, CDS, data centre, author?

ADS and us.

> Any paper on Chandra, assuming that the author sends the associated
> reduced data to the Chandra data centre?

Yes, any paper on Chandra data, but no, not linked to products produced by the author - only to the archived datasets produced by CXC (where the author started from, presumably).

> Only papers published by Chandra data centre staff?
>
>> Of course, this begs two questions:
>> - Can two files have the same DS_IDENT value?
>> - Can one file belong to more than one dataset?
>
> Yes, but what about the results of a paper that analyses some
> particular observational data? The original (starting) data will be
> stored at some data centre, but the results will in general be
> privately owned by the authors. They do not BELONG TO the original
> dataset; rather, they STEM FROM the original dataset (a parent-child
> relation).

That's correct.

> --
> ----------------------------------------------------------------------
> This is a newsreading account used by several persons to avoid
> unwanted spam. Any mail returning to this address will be rejected.
> Users can disclose their e-mail address in the article if they wish so.
> _______________________________________________
> fitsbits mailing list
> http://listmgr.cv.nrao.edu/mailman/listinfo/fitsbits

--------------------------------------------------------------------------
Arnold H. Rots                                Chandra X-ray Science Center
Smithsonian Astrophysical Observatory         tel:  +1 617 496 7701
60 Garden Street, MS 67                       fax:  +1 617 495 7356
Cambridge, MA 02138 USA                       http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------
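The three example values quoted in this exchange (ADS/Sa.CXO#obs/ObsId, ADS/Sa.CXO#defset/name, ADS/Sa.CXO#bibcode/bibcode) suggest a layout of authority, facility, identifier type and value. The Python sketch below splits a value along those lines. The pattern is inferred from those three examples only and is not the official ADS or IVOA grammar; the field names and the function `parse_dataset_id` are made up for illustration.

```python
import re
from typing import NamedTuple


class DatasetId(NamedTuple):
    """Parts of an identifier such as 'ADS/Sa.CXO#obs/ObsId' (field names are assumptions)."""
    authority: str  # e.g. "ADS"
    facility: str   # e.g. "Sa.CXO", expected to come from the ADS facilities list
    kind: str       # e.g. "obs", "defset" or "bibcode"
    value: str      # e.g. an observation id, a set name or a bibcode


# Assumed layout: <authority>/<facility>#<kind>/<value>
_PATTERN = re.compile(
    r"^(?P<authority>[^/#\s]+)/(?P<facility>[^/#\s]+)"
    r"#(?P<kind>[^/#\s]+)/(?P<value>\S+)$"
)


def parse_dataset_id(text: str) -> DatasetId:
    """Split an identifier string into its parts, or raise ValueError."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not an <authority>/<facility>#<kind>/<value> identifier: {text!r}")
    return DatasetId(**match.groupdict())


# The placeholder value from the thread, not a real observation id.
example = parse_dataset_id("ADS/Sa.CXO#obs/ObsId")
```

A harvester could use the `authority` field to decide which verification service to consult, which is the role the "ADS/" prefix plays in the answers above.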
#9
[fitsbits] 'Dataset Identifications' postings (digest)
Interesting discussion. It was not previously obvious how specific a concept was being discussed. Thanks to Arnold Rots for the details:

> The proposal is to introduce the DS_IDENT keyword as a convention
> for dataset identifiers and to define one particular set of values
> for this keyword - the ones under the authority of the ADS, i.e.,
> identifier values starting with "ADS/". Anybody who wants to
> participate in the use of this convention is free to do so, but
> will have to comply with the rules of that convention,

Wouldn't it make more sense to reserve a keyword called "ADSIDENT"? That is the precise name space that is being discussed. If you stick with DS_IDENT (or DS_IDn), you are implicitly assuming that a single dataset (and the individual files that comprise that dataset) will never benefit from being named by multiple certification entities simultaneously.

> Of course, this begs two questions:

It begs more than two :-)

> - Can two files have the same DS_IDENT value?
>   The answer should be yes, since a dataset may consist of more
>   than one file.

The answer *must* be yes, because this possibility cannot be legislated out of existence.

> - Can one file belong to more than one dataset?
>   The answer is again yes. This may mean that we should allow for
>   DS_IDn keywords.

A file can belong to more than one ADS-style dataset. A file can also belong to more than one entirely distinct name space.

Suggestions:

1) Reserve ADSID and ADSIDn for the purposes of the proposal being discussed.

2) Expect to reserve keywords of the form xxxIDn in the future for similar purposes related to other certifying entities.

If the concept of ADS-administered ID name spaces is of use to the larger astronomical community, this will become obvious as other data providers sign on to the ADS bandwagon. Meanwhile, it may *also* be useful for some data providers to form their own ID name spaces.

> (I said "files"; you may read "extensions", if you like)

Again - separate IDs *must* be supported for separate extensions. How are you going to legislate against MEF files containing files (or other data structures, such as tables) from more than one dataset?

> The question has come up in which headers the keyword should appear.
> I would recommend putting it in any and all headers where it is
> appropriate - primary and secondary.

Agreed!

> Hope this helps,

Yes, indeed. Thank you, Arnold.

Rob Seaman
NOAO Science Data Systems
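The two questions discussed in this post (several files or extensions sharing one identifier, and one extension carrying several identifiers) amount to a many-to-many mapping. The Python sketch below builds that mapping from a list of per-extension headers. It is only an illustration: headers are modelled as plain dictionaries rather than read with a FITS library, DS_IDn is a keyword form that was merely floated in the thread, and the function name and sample values are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence

# DS_IDENT is the keyword proposed in the thread; DS_ID1, DS_ID2, ... is the
# indexed form that was only suggested as a possibility.
_ID_KEYWORD = re.compile(r"^(DS_IDENT|DS_ID[0-9]+)$")


def datasets_by_extension(headers: Sequence[Mapping[str, str]]) -> Dict[str, List[int]]:
    """Map each dataset identifier to the indices of the headers that carry it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for number, header in enumerate(headers):
        for keyword, value in header.items():
            # Record each extension once per identifier, even if it repeats.
            if _ID_KEYWORD.match(keyword) and number not in index[value]:
                index[value].append(number)
    return dict(index)


# Hypothetical MEF file: a primary header plus two extensions.
mef_headers = [
    {"SIMPLE": "T"},
    {"DS_IDENT": "ADS/Sa.CXO#obs/ObsIdA"},
    {"DS_ID1": "ADS/Sa.CXO#obs/ObsIdA", "DS_ID2": "ADS/Sa.CXO#defset/nameB"},
]
membership = datasets_by_extension(mef_headers)
```

Supporting the ADSID/ADSIDn or xxxIDn forms suggested above would only require a different keyword pattern; the mapping itself does not depend on which name space issued the identifier.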