#21
[fitsbits] 'Dataset Identifications' postings (digest)
The scope of Tom's proposal is really quite limited:
He is announcing the establishment of a convention that employs a keyword (DS_IDENT) or set of keywords (DS_IDiii). The intent is that the value of that keyword contains a label or key that will allow users to obtain a pointer to a particular volume in astronomical data space. No less, but also no more.

Within the space of data identifier strings only the subspace of strings starting with "ADS/" (case-insensitive!) is reserved. That's really all; you can stop reading here. But if the subject fascinates you, you may read on.

Anybody who wants to participate in that subspace needs to know a little more (like the substring between the first '/' and the first '#' that represents the facility from which the datasets originated, and the fact that that facility is free in choosing its definition of what a dataset is and the encoding of everything after the first '#'), but that is not particularly relevant for this newsgroup.

"Volume in data space" or "dataset" is left vague because it is up to the issuing facility to decide what makes the most sense for its users and purposes. For the Chandra Data Archive what you will get in response to the key is a URL that will allow you to request a download of data products associated with a particular observation - or maybe a set of observations. If you try again next month, the files may be different: we may have reprocessed or decided to add some products to the package. For other archives it may be a specific file. For the ADS itself, you may think of an OID as the label and a journal article as the dataset.

There is no intent to prescribe the syntax or the semantics of the identifiers. And there certainly is no intent to imply any kind of inheritance or propagation of identifiers to the user level. Again, think of the dataset identifier as a key that allows the user to obtain a pointer to the dataset. There is no need to encode any information in it - nor is that prohibited (the ADS could use bibcodes as identifiers).
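The identifier structure described above (a case-insensitive "ADS/" prefix, a facility name up to the first '#', and a facility-defined remainder) can be sketched in a few lines of Python. This is only an illustration: the names `DatasetId` and `parse_ads_identifier` and the sample identifier are hypothetical, and returning None for anything outside the reserved subspace is an assumption, since nothing is prescribed for such strings.

```python
from typing import NamedTuple, Optional


class DatasetId(NamedTuple):
    namespace: str   # the reserved prefix, normalized to "ADS"
    facility: str    # text between the first '/' and the first '#'
    local_part: str  # everything after the first '#', opaque to the reader


def parse_ads_identifier(value: str) -> Optional[DatasetId]:
    """Split a string of the form ADS/<facility>#<local part>.

    Returns None when the string lies outside the reserved "ADS/"
    subspace or lacks a facility or a '#' (an assumption of this sketch).
    """
    text = value.strip()
    if text[:4].upper() != "ADS/":  # the prefix is case-insensitive
        return None
    facility, sep, local_part = text[4:].partition("#")
    if not sep or not facility:
        return None
    # The local part is returned untouched: its encoding is up to the facility.
    return DatasetId("ADS", facility, local_part)


# Hypothetical identifier, used only to exercise the parser.
print(parse_ads_identifier("ads/Sa.Example#obs/12345"))
```

The parser deliberately makes no attempt to interpret the part after '#', in line with the point that the issuing facility alone defines it.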
The list of informational metadata that Rob provided looks to me more like metadata that ought to reside in a database. And if you wanted to know what all the values are, you would take the NOAO dataset key and query:

    select * from RobsDatabase where DS_IDENT='NOAOidentifier';

where NOAOidentifier is something short and, possibly, random, rather than trying to decode information from a string that stretches over many header rows.

Again in Chandra archive language, if you want to browse that kind of information, you come to our web browser that will query the database that contains the observation catalog; it will tell you about objects, coordinates, times, observers, instruments, proprietary times, public release dates, etc.

  - Arnold

--------------------------------------------------------------------------
Arnold H. Rots                                Chandra X-ray Science Center
Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
60 Garden Street, MS 67                              fax:  +1 617 495 7356
Cambridge, MA 02138 USA                 http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------
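The query suggested above, where a short opaque key selects a row of metadata, might look as follows with Python's standard sqlite3 module. Everything specific here is invented for the example: the table name `observations`, its columns, and the single sample row are assumptions, not anything an archive is known to use.

```python
import sqlite3


def build_demo_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog with one made-up observation."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # lets rows be read by column name
    conn.execute(
        "CREATE TABLE observations ("
        " ds_ident TEXT PRIMARY KEY,"
        " target TEXT, instrument TEXT, release_date TEXT)"
    )
    conn.execute(
        "INSERT INTO observations VALUES (?, ?, ?, ?)",
        ("NOAO-000123", "M31", "ExampleCam", "2005-09-01"),  # invented row
    )
    return conn


def lookup_metadata(conn: sqlite3.Connection, ds_ident: str):
    """Return all catalog columns for one identifier, or None if unknown."""
    row = conn.execute(
        "SELECT * FROM observations WHERE ds_ident = ?",  # parameterized query
        (ds_ident,),
    ).fetchone()
    return dict(row) if row is not None else None


catalog = build_demo_catalog()
print(lookup_metadata(catalog, "NOAO-000123"))
```

The identifier itself carries no meaning in this sketch; all descriptive values come from the database row, which is the division of labour argued for above.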
#23
[fitsbits] 'Dataset Identifications' postings (digest)
On Wed, 24 Mar 2004, Arnold Rots wrote:
> The scope of Tom's proposal is really quite limited:

I would like to support Tom's simple and basic proposal. As he says, a unique key is of great value in building and searching databases, and there are a whole lot of cases in which the DS_IDENT will be immediately usable and useful. Of course we can all think of special cases in which it will not be quite enough, but (in my opinion) this doesn't invalidate the basic idea, which would be severely compromised by being made more complicated. Let's keep it simple.

--
Clive Page
Dept of Physics & Astronomy,
University of Leicester,
Leicester, LE1 7RH, U.K.
#25
[fitsbits] 'Dataset Identifications' postings (digest)
Let me answer a bunch of messages in one go.

(By the way, if you've not already guessed, the messages tagged "LC's Nospam ..." are by me as well; it depends on whether I use fitsbits or post to the NG. In general I use the mailing list for longer messages.)

From: Rob Seaman
Date: Tue, 23 Mar 2004 21:54:40 +0000 (UTC)

> You will find the current list at: http://vo.ads.harvard.edu/dv/facilities.txt
>
> A very interesting list. ....

Indeed.

> There appears to be a confusion between a ground-based observing site and an observatory - perhaps this is a result of the list being compiled by our friends in the space-based astronomical community?

Maybe, that's why I found it quite natural for me ...

.... although I've some reservations just on the "satellite" subset, namely: is a Sa:spacecraft enough to define where the dataset is archived?

- Surely yes if there is a single archive managed by a single space agency or its "contractor".
- Possibly yes if the satellite is a cooperation between different agencies, AND they have agreed to run the same pipeline AND to keep mirror sites.
- May fail if different agencies, organizations or institutes decide to run different pipelines on the same data, resulting in two separate datasets stemming from the same raw data.

> In general an observatory is a political entity, a telescope is a facility, and a site like Kitt Peak is a piece of real estate that may host multiple facilities from multiple observatories. Depending on the details of contracts or other binding operating agreements, an observatory may "own" the data that result from a particular facility like a telescope,

I guess it does not matter at all who owns the data rights for the period during which the data are not public. If the data are to be indexed, it means they are public ... either in some official archive or possibly in some private one.

> A dataset ID can be a relatively simple beast - perhaps as simple as a data source ID and a serial number. But the full taxonomy of dataset provenance has to support many degrees of freedom. At the very least:
>
> Nation
> Funding agency

Just for the sake of argument, a "funding agency" is not necessarily associated with a single nation, at least this side of the Atlantic (ESA, ESO) ... or of the Panama canal (ESO again :-) ).

> Observatory
> Consortium member ("partner")

The latter is hardly relevant to the identification of the dataset.

> Telescope
> Instrument

These and the above are (loosely) covered by ORIGIN, TELESCOP, INSTRUME, or other keywords which may be in the same FITS file, or (as said by others already) in some database at the archive site.

> Date&Time
> Proposal ID
> PI and/or project ID

The latter two might be used inside the dataset identifier, or as pointers to locate the data, internally by the archiving organization. But what is "inside" is not our business. Similarly the date might be used in the identifier, again none of our business.

I agree that usually an "observational" (i.e. not "multi-observation") dataset may be linked to a single date, although the reverse is not necessarily true. I forgot one case in the examples in my previous posting, namely the third below:

- ground-based observatories typically observe one position of the sky from one instrument at one telescope at a time
- space observatories often observe a position of the sky from SEVERAL coaxial (although different FoV size) instruments/telescopes on the satellite (and for me this is ONE dataset)
- however, sometimes there are non-coaxial instruments. I take the case of BeppoSAX, where during each OP (Observing Period) one had 2-3 different FOTs (datasets): one for the NFIs (Narrow Field Instruments) pointing along the Z axis, and one each for the two WFCs (Wide Field Cameras) pointing along +Y and -Y (maybe just one was on). I guess RXTE with the ASM has something similar.

-----------------------------------------------------------------------

From: Thomas McGlynn
Date: Wed, 24 Mar 2004 10:11:37 -0500

> [...] any specific syntax used. E.g., in FITS today we have keywords ORIGIN, TELESCOP, INSTRUME and OBSERVER where the general semantics of the keyword is specified, but the format is completely undefined

Unfortunately some aspects of the semantics are also ill-defined (see the discussions held at different times). Maybe it would be better to specify the usage a bit more, although most details (including some I've raised) are indeed out of scope.

We should for instance state that the keyword is a string, and that the first substring, from the beginning to the first slash, defines a namespace, while the rest of the content is defined by the authority managing that namespace. We should also indicate the prospective usage, which is still not totally clear to me (see below).

> So I see the discussion about where such a keyword would go, i.e. in primary header, in each extension header, in some extension header, whether we need a keyword that allows for multiple values (which DS_IDENT would not) as the kind of things we could

Do you mean multiple occurrences of the same keyword (like HISTORY or COMMENT) or breaking a single long string value into continuation keywords?

> to be at least an option for the id to be a vector value. The latter requirement mandates a shorter keyword (perhaps just DSID).

See below on "vector".

> However, I do not think that this is the appropriate forum for discussion of a particular syntax for the value of this keyword.

Except for the above notion of namespace, and for a possibility to define that it should be a string contained in a SINGLE keyword (that would limit its length to 68 characters).

From: Rob Seaman
Date: Wed, 24 Mar 2004 17:22:31 +0000 (UTC)

> It may well be that all astronomical semantic discussions should now happen under the happy VO umbrella. Personally, I think FITS has too often skirted the difficult issues. If we are to debate reserving DSIDnnnn for something called "dataset identifiers", isn't it appropriate to address what that means? If not, why do we care if an obscure set of keyword names are reserved at all?

That would avoid the loose situation we have for ORIGIN etc.

> My own read on this part of the discussion is that most people would want to see the ID repeated in all relevant HDU's

Yes. My personal inclination (as an extremist Ockhamist) is that keywords shall not be multiplied praeter necessitatem. So I would tend to put one (set of) keyword(s) in the primary header if they apply to the whole file, and to put them in the extensions only when they differ.

> and that there probably needs to be at least an option for the id to be a vector value.
>
> If by vector, you mean repeated keywords from the same or different ID families, I agree. IDs are long strings. Won't fit many in 80 chars.

It would also be possible to impose a syntax limitation that each identifier is limited to the space of a single kwd (68 characters excluding the DSIDENT ='...').

If the given file (or HDU) "belongs" (or "refers"? see below) to more than one dataset at the same time and with equal rank, one could allow for repeated DSIDENT kwds (like COMMENT, HISTORY). However, one may need a sequence of DSIDnn if either:

- the file "belongs" or "refers" to different datasets with some priority or ranking order
- one wants to keep track of a history: i.e. this file belongs to the dataset I reduced (DSID01), I started my reduction from the result of the pipeline provided by the xyz archive centre (DSID02), which used the raw data of the given observation taken with the uvw telescope/satellite (DSID03)

> Why should an ID have the time? Astronomers have too often relied on convoluted filenames to convey the placement of a specific data file within some multidimensional parameter space.
>
> Time is key to groundbased observations because access to our

Also for satellites. Time is relevant because it's related to scheduling. But that does not mean it has (or has not) to be part of the id. See above. None of our business.

> Why does it need a proposal ID, nation, agency?
>
> Our need for a dataset identifier is precisely to implement the proprietary policies of our current organization. I am very supportive

The identifier will just say "go to this site to eventually retrieve the dataset". It's up to the site to then say "this dataset is not yet public", to protect it with a password, or whatever.

From: Arnold Rots
Date: Wed, 24 Mar 2004 15:47:57 -0500 (EST)

> The scope of Tom's proposal is really quite limited: He is announcing the establishment of a convention that employs a keyword (DS_IDENT) or set of keywords (DS_IDiii). The intent is that the value of that keyword contains a label or key that will allow users to obtain a pointer to a particular volume in astronomical data space. No less, but also no more.

just a little bit more

> Within the space of data identifier strings only the subspace of strings starting with "ADS/" (case-insensitive!) is reserved.

I believe you should also reserve the fact that the first part of the id is the namespace, and delegate all the rest to the namespace authority. Maybe one should also add another kwd (DSAUTHOR) which points to a URL of the namespace authority. Or are we imagining something like the DNS with a set of "root nameservers"?

> and purposes. For the Chandra Data Archive what you will get in response to the key is a URL that will allow you to request a download of data products associated with a particular observation - or maybe a set of observations. If you try again next month, the files may be different: we may have reprocessed or decided to add some products to the package.

Hmm ... I'm a bit worried by the fact that the dataset may change. Maybe that's why it is not yet so clear to me what use a user will make of the dataset identifier. Let's make some examples.

a) I read a paper, which tells me "the data used here belong to dataset xyz". I want to repeat the analysis of the SAME data myself, so I use the id to retrieve the data. Obviously here I want to get the SAME data, not a further and better version (do I?). No FITS file involved here though on the user end.

b) I retrieve the files, and I want to check they really belong to the correct dataset.

c) I have somehow got some files, and I want to know what observation they refer to, or to retrieve more files of the same dataset, or to find what papers have been published using them.

d) I do my analysis and produce some more files. These are private, but I may want to document that the starting point of the analysis was the given dataset. But DS_IDENT is not the right way: my data DO NOT belong to the dataset, I need a separate history kwd ...

... if I'd ever distribute the data (I suppose I also have to quote the DS_IDENT in any paper I will write, for the ADS to use it)

> Again, think of the dataset identifier as a key that allows the user to obtain a pointer to the dataset. There is no need to encode any information in it - nor is that prohibited

Agreed

> The list of informational metadata that Rob provided looks to me more like metadata that ought to reside in a database.

(or in other keywords in the same file if desired)

--
----------------------------------------------------------------------
is a newsreading account used by more persons to avoid unwanted spam. Any mail returning to this address will be rejected. Users can disclose their e-mail address in the article if they wish so.
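The 68-character limit and the DSID01, DSID02, ... provenance chain discussed in this post can be made concrete with a small Python sketch. It relies only on the fixed FITS card layout (80 columns, an 8-character keyword, "= " in columns 9-10, and a quoted string value). The keyword family DSIDnn is the suggestion from the post, not an adopted convention, and the function names and sample identifiers are hypothetical. The sketch also skips the usual padding of short string values to eight characters.

```python
MAX_VALUE_CHARS = 68  # 80 columns - 8 (keyword) - 2 ("= ") - 2 (quotes)


def make_card(keyword: str, identifier: str) -> str:
    """Format one 80-character header card holding a string value."""
    if len(keyword) > 8:
        raise ValueError(f"keyword {keyword!r} is longer than 8 characters")
    value = identifier.replace("'", "''")  # embedded quotes are doubled
    if len(value) > MAX_VALUE_CHARS:
        raise ValueError(
            f"identifier needs {len(value)} characters; "
            f"a single card holds at most {MAX_VALUE_CHARS}"
        )
    return f"{keyword:<8}= '{value}'".ljust(80)


def provenance_cards(identifiers):
    """Number identifiers DSID01, DSID02, ... in the order given."""
    if len(identifiers) > 99:
        raise ValueError("DSIDnn leaves room for at most 99 identifiers")
    return [
        make_card(f"DSID{n:02d}", ident)
        for n, ident in enumerate(identifiers, start=1)
    ]


# Hypothetical chain: my reduction, the archive pipeline product, the raw data.
for card in provenance_cards([
    "ADS/Example.Me#reduced/7",
    "ADS/Sa.Example#pipeline/12345",
    "ADS/Sa.Example#raw/12345",
]):
    print(card.rstrip())
```

An identifier that does not fit raises an error instead of spilling into continuation keywords, which mirrors the single-keyword restriction proposed in the post.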
#27
[fitsbits] 'Dataset Identifications' postings (digest)
Lucio Chiappetti wrote:
Let me answer a bunch of messages in one go. ...

From: Arnold Rots
Date: Wed, 24 Mar 2004 15:47:57 -0500 (EST)

The scope of Tom's proposal is really quite limited: He is announcing the establishment of a convention that employs a keyword (DS_IDENT) or set of keywords (DS_IDiii). The intent is that the value of that keyword contains a label or key that will allow users to obtain a pointer to a particular volume in astronomical data space. No less, but also no more.

just a little bit more

OK, just the following sentence.

Within the space of data identifier strings only the subspace of strings starting with "ADS/" (case-insensitive!) is reserved.

I believe you should also reserve the fact that the first part of the id is the namespace, and delegate all the rest to the namespace authority. Maybe one should also add another kwd (DSAUTHOR) which points to a URL of the namespace authority. Or are we imagining something like the DNS with a set of "root nameservers"?

Nothing is implied or recommended by this proposal. We took great pains to ensure that the ADS/ identifiers conform to the standard being developed by the IVOA, but that is not part of this proposal. Others may want to suggest further conventions tying the two together in the future, but this is not the time to do that - for one thing, the IVOA standard has not yet been completed.

and purposes. For the Chandra Data Archive what you will get in response to the key is a URL that will allow you to request a download of data products associated with a particular observation - or maybe a set of observations. If you try again next month, the files may be different: we may have reprocessed or decided to add some products to the package.

Hmmm ... I'm a bit worried by the fact that the dataset may change. Maybe that's why it is not yet so clear to me what use a user will make of the dataset identifier. Let's make some examples.

a) I read a paper, which tells me "the data used here belong to dataset xyz". I want to repeat the analysis of the SAME data myself, so I use the id to retrieve the data. Obviously here I want to get the SAME data, not a further and better version (do I?). No FITS file is involved here on the user end, though.

Do you just want to repeat the analysis or do you want to do a better job? We would give you the current (best) set of data products based on the same raw observational data, so you can do your (better) job. If you can't reconcile results, you can ask us for the version that was (most likely) used for the paper and we'll be happy to give it to you, provided it was a "good" version.

b) I retrieve the files, and I want to check they really belong to the correct dataset.

c) I have somehow got some files, and I want to know what observation they refer to, or to retrieve more files of the same dataset, or to find what papers have been published using them.

This all goes by OBSID, not DS_IDENT, at least for us, although we could make it work through the identifier as well.

d) I do my analysis and produce some more files. These are private, but I may want to document that the starting point of the analysis was the given dataset. But DS_IDENT is not the right way: my data DO NOT belong to the dataset, so I need a separate history kwd ...

Agreed.

... if I ever distribute the data (I suppose I also have to quote the DS_IDENT in any paper I write, for the ADS to use it)

That's the idea - or multiple identifiers.

Again, think of the dataset identifier as a key that allows the user to obtain a pointer to the dataset. There is no need to encode any information in it - nor is that prohibited

Agreed

The list of informational metadata that Rob provided looks to me more like metadata that ought to reside in a database.

(or in other keywords in the same file if desired)

--
----------------------------------------------------------------------
is a newsreading account used by more persons to avoid unwanted spam.
Any mail returning to this address will be rejected.
Users can disclose their e-mail address in the article if they wish so.

_______________________________________________
fitsbits mailing list
http://listmgr.cv.nrao.edu/mailman/listinfo/fitsbits

--------------------------------------------------------------------------
Arnold H. Rots                          Chandra X-ray Science Center
Smithsonian Astrophysical Observatory   tel: +1 617 496 7701
60 Garden Street, MS 67                 fax: +1 617 495 7356
Cambridge, MA 02138 USA                 http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------
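For readers who want to see the reservation rule from this thread in concrete terms, here is a minimal Python sketch. It assumes only what the postings state: identifiers starting with "ADS/" (compared case-insensitively) are reserved, the facility is the text between the first '/' and the first '#', and everything after the first '#' is left to the issuing facility. The function names and the sample identifier are hypothetical, chosen for illustration; nothing here is part of the proposal itself.

```python
RESERVED_PREFIX = "ADS/"  # the only reserved subspace; matched case-insensitively


def is_reserved(identifier: str) -> bool:
    """Return True if the identifier lies in the reserved "ADS/" subspace."""
    return identifier.strip().upper().startswith(RESERVED_PREFIX)


def split_reserved(identifier: str):
    """Split a reserved identifier into (facility, local_key).

    facility  : text between the first '/' and the first '#'
    local_key : everything after the first '#', deliberately not interpreted,
                because its encoding is up to the issuing facility
    Returns None for identifiers outside the reserved subspace.
    """
    text = identifier.strip()
    if not is_reserved(text):
        return None
    remainder = text[len(RESERVED_PREFIX):]
    # partition() splits on the first '#' only; later '#' stay in local_key
    facility, _, local_key = remainder.partition("#")
    return facility, local_key


if __name__ == "__main__":
    # Hypothetical identifier, not taken from any real archive
    sample = "ads/Facility.X#some/local/key"
    print(is_reserved(sample))
    print(split_reserved(sample))
```

The sketch stops at splitting on purpose: as the thread stresses, the identifier is only a key for obtaining a pointer to a dataset, so a reader of the keyword has no business interpreting the part after '#'.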