#1
[fitsbits] Start of the 'INHERIT' Public Comment Period
Rob Seaman wrote:
> 4) Yes, but that boat has sailed. The community has been on a course to deal with inheritance since this note from the image extension paper:
>
> "Although allowed, it is recommended that the primary header does not set the keyword NAXIS=0, since it would not make sense to extend a non-existing image with another image."
>
> FITS is either going to tie the contents of separate HDUs together semantically or not. The community eagerly - and widely - adopted the notion of the primacy of the primary HDU - likely before the words above were published. Implicit here is that the primary header of an empty HDU is often used for information that applies to the entire file.

Maybe I'm missing your point, but I don't see how that paper can be interpreted as an endorsement of the inherit convention. In the sentence you quote, and elsewhere in the paper, the authors make it clear that they do not recommend appending an image extension to a null primary array. Instead, they think the primary array should be filled first, with more image extensions appended only if the primary array is already occupied. This is contrary to the inherit convention, which requires that the primary array be empty. That requirement avoids confusion about whether the keywords in the primary header should be interpreted as applying globally to the following extensions or not.

Some might suggest that with the abundance of low cost disk space that is now available, the inherit convention is trying to fix a non-problem. The amount of disk space that is saved by not duplicating the keywords in every extension is rather insignificant in most cases and doesn't warrant the extra software complexity in supporting the inherit convention. There are no doubt some pathological cases where the size of the headers could dominate the size of the whole file, but in those cases there may be alternate ways to pack the data more efficiently (e.g. pack the separate image extension data into vectors in rows of a single binary table extension).

Bill Pence
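For readers new to the convention being debated, the keyword lookup can be sketched in a few lines. This is a minimal illustration and not the text of the convention or any library's API. Headers are modeled as plain Python dicts, and the sketch assumes the rule described above: INHERIT = T in an extension header, an empty primary array, and extension keywords taking precedence. The set of structural keywords that are never inherited is an illustrative assumption, and the keyword values are hypothetical.

```python
# Hypothetical sketch: headers are plain dicts mapping keyword -> value.
# The exclusion set is an illustrative assumption: structural keywords
# describe one HDU's own layout and should not be copied between HDUs.
STRUCTURAL = {"SIMPLE", "XTENSION", "BITPIX", "NAXIS", "EXTEND",
              "PCOUNT", "GCOUNT", "COMMENT", "HISTORY", "END"}


def is_structural(key: str) -> bool:
    """True for keywords that describe a single HDU's structure."""
    # NAXIS1, NAXIS2, ... are structural as well.
    return key in STRUCTURAL or (key.startswith("NAXIS") and key[5:].isdigit())


def effective_header(primary: dict, extension: dict) -> dict:
    """Return the keywords an INHERIT-aware reader would see for an extension."""
    inherit = extension.get("INHERIT", False) is True
    # Inheritance applies only when the primary array is empty (NAXIS = 0).
    if not inherit or primary.get("NAXIS", 0) != 0:
        return dict(extension)
    merged = {k: v for k, v in primary.items() if not is_structural(k)}
    merged.update(extension)  # keywords present in the extension always win
    return merged


primary = {"SIMPLE": True, "BITPIX": 8, "NAXIS": 0, "EXTEND": True,
           "TELESCOP": "EXAMPLE", "OBSERVER": "A"}
ext = {"XTENSION": "IMAGE", "BITPIX": 16, "NAXIS": 2, "NAXIS1": 2048,
       "NAXIS2": 4096, "INHERIT": True, "OBSERVER": "B", "CCDNAME": "ccd1"}

hdr = effective_header(primary, ext)
print(hdr["TELESCOP"], hdr["OBSERVER"], hdr["BITPIX"])
```

A reader that does not implement the convention simply sees `ext` as written. A reader that does implement it sees the primary's TELESCOP added, while the extension's own OBSERVER and BITPIX are left in place.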
#2
Hi all! Yes - I'm still lurking around.

William Pence wrote:
> Some might suggest that with the abundance of low cost disk space that is now available, the inherit convention is trying to fix a non-problem. The amount of disk space that is saved by not duplicating the keywords in every extension is rather insignificant in most cases and doesn't warrant the extra software complexity in supporting the inherit convention.

No, but avoiding potential errors by not duplicating text strings is a worthy effort, as we learned long ago from relational database theory.

> There are no doubt some pathological cases where the size of the headers could dominate the size of the whole file, but in those cases there may be alternate ways to pack the data more efficiently (e.g. pack the separate image extension data into vectors in rows of a single binary table extension).

In current practice or not, I think the philosophy of "it's better to seek forgiveness than permission" is dangerous in this context. If a convention breaks FITS, I believe it should be considered a private agreement and not part of the FITS standard. That doesn't mean it can't be used in practice - just that it's not FITS.

--
Archie
--
Archie Warnock   warnock at awcubed dot com
A/WWW Enterprises   www.awcubed.com
As a matter of fact, I _do_ speak for my employer.
#3
On Fri, 6 Apr 2007, Archie Warnock wrote:
> William Pence wrote:
>> Some might suggest that with the abundance of low cost disk space that is now available, the inherit convention is trying to fix a non-problem. The amount of disk space that is saved by not duplicating the keywords in every extension is rather insignificant in most cases and doesn't warrant the extra software complexity in supporting the inherit convention.
>
> No, but avoiding potential errors by not duplicating text strings is a worthy effort, as we learned long ago from relational database theory.

Well, if one really cares about such consistency, using multiple image extensions doesn't sound like a very good basis. One single binary table maps a lot better to a database than multiple image extensions that may or may not duplicate header information.

I have (perhaps incorrect?) memories that the image extension was sold to the FITS community on the grounds that it was easier to use for simple cases than rows within a binary table. (I was never quite convinced by that argument, but didn't really voice those concerns...) It seems that its use has grown beyond simple cases and that its limitations now bite. I know I am being a bit provocative here, but would it perhaps be time to consider deprecating the IMAGE extension??
#4
Indeed, this is getting off topic, but we might want to have a separate discussion sometime about the "FITS data model" and perhaps even how to map this into more flexible serializations or storage mechanisms. (Also, as a matter of history, FITS began as an image transport format, and tables and image extensions came much later.)

The basic FITS model provides a keyword table (PHU or some other form of empty image kludge), an N-dimensional image object, a table, plus a simple general container (the MEF). We can aggregate instances of these three basic objects in a container, and associate them in some fashion to model more complex objects, such as instrumental datasets. Usually this is done by defining a convention, e.g., using custom keywords in the PHU and/or extensions.

One can argue that Table can contain anything including an image, but the regularly sampled N-Dim Image case is so important that it deserves its own class. If nothing else, this is still required to be able to efficiently store and access large data arrays. In addition, the basic Image object is much simpler than Table, and much existing code can do useful things with a FITS image but cannot do anything with a FITS table.

Within VO, FITS is still the preferred format for image data, whereas VOTable is often used instead of FITS for table data. One could argue that the FITS Image is the most successful and widely used part of FITS, and even today provides a better mechanism for storing and manipulating regularly sampled data arrays than any existing alternative.

- Doug
#5
On Fri 2007-04-06T14:52:27 +0000, Archie Warnock hath writ:
> No, but avoiding potential errors by not duplicating text strings is a worthy effort, as we learned long ago from relational database theory.

What FITS did not learn from relational database theory was how to create mechanisms which document and enforce the self consistency of data which have been neatly separated into distinct logical chunks. I think that's the way forward.

--
Steve Allen                  WGS-84 (GPS)
UCO/Lick Observatory         Natural Sciences II, Room 165   Lat  +36.99845
University of California     Voice: +1 831 459 3046          Lng -122.06025
Santa Cruz, CA 95064         http://www.ucolick.org/~sla/    Hgt +250 m
#6
On Sat, 7 Apr 2007, Steve Allen wrote:
> On Fri 2007-04-06T14:52:27 +0000, Archie Warnock hath writ:
>> No, but avoiding potential errors by not duplicating text strings is a worthy effort, as we learned long ago from relational database theory.

I mentioned this point in my earlier mail as well. Within IRAF, the main motivation for INHERIT was to avoid duplication of information in multiple places within a MEF, since such duplication would very likely lead to problems with updates. INHERIT could also have advantages when viewing a MEF as a more complex object.

> What FITS did not learn from relational database theory was how to create mechanisms which document and enforce the self consistency of data which have been neatly separated into distinct logical chunks. I think that's the way forward.

One could also say that this is not a FITS issue at all, but rather a more general data modeling issue. We are already getting into this within VO in several different contexts. What we will probably be doing is mapping some more general model or mechanism into a FITS representation. Typically such relationships and models need to be consistent regardless of how the information is stored, with FITS being only part of the picture. While this can be done with the current FITS mechanisms, it is awkward. The sometimes discussed "FITS 2.0", if it ever comes to pass, could address the representation issues but should not change the basic FITS data models.

- Doug
#7
> Indeed, this is getting off topic, but we might want to have a separate discussion sometime about the "FITS data model" and perhaps even how to map this into more flexible serializations or storage mechanisms. (Also, as a matter of history, FITS began as an image transport format, and tables and image extensions came much later.)

Yeah, I am unfortunately old enough to remember that (even old enough to have written random groups) ;-). Tables were the first extension, I think, then the image extension.

> The basic FITS model provides a keyword table (PHU or some other form of empty image kludge), an N-dimensional image object, a table, plus a simple general container (the MEF). We can aggregate instances of these three basic objects in a container, and associate them in some fashion to model more complex objects, such as instrumental datasets. Usually this is done by defining a convention, e.g., using custom keywords in the PHU and/or extensions.

Well, that's one way of looking at it. The alternative perspective that I am arguing for is that everything should go into one table extension, with images as either multiple entries in one row or entries in successive rows. Essentially, that's the perspective taken by the Green Bank convention for sets of radio-astronomical spectra.

> One can argue that Table can contain anything including an image, but the regularly sampled N-Dim Image case is so important that it deserves its own class. If nothing else, this is still required to be able to efficiently store and access large data arrays.

Actually, storage and access inside a binary table is perhaps slightly more difficult to get right, but it is just as efficient as using image extensions (if anything marginally more efficient, due to less block padding).

> In addition, the basic Image object is much simpler than Table, and much existing code can do useful things with a FITS image but cannot do anything with a FITS table.

That's definitely a factor that needs consideration. For DENIS we used a large binary table to store stripes of 180 1k x 1k images, but we often/usually ended up working through a filter that extracted one image to a FITS file because a tool expected that. On the other hand, that format did provide very robust consistency (stable header items in the extension header, variable ones as elements of the data rows, and nothing ever duplicated).

> Within VO, FITS is still the preferred format for image data, whereas VOTable is often used instead of FITS for table data. One could argue that the FITS Image is the most successful and widely used part of FITS, and even today provides a better mechanism for storing and manipulating regularly sampled data arrays than any existing alternative.

Simpler and most successful for sure. Better, that depends on what your goals/criteria are :-)
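Thierry's "less block padding" remark can be made concrete with some arithmetic. Every FITS header is a sequence of 80-byte cards ending with END, and both headers and data units are padded to whole 2880-byte blocks. The sketch below compares two layouts: an empty primary HDU plus N IMAGE extensions, and an empty primary HDU plus one binary table holding one image per row. The image count and size follow the DENIS example (180 images of 1k x 1k). The 2-byte pixels, the header card counts and the per-row metadata bytes are hypothetical values chosen for illustration, not DENIS parameters.

```python
# Sizes follow the FITS block structure: 80-byte cards, 2880-byte blocks.
BLOCK = 2880  # bytes per FITS logical record
CARD = 80     # bytes per header card


def padded(nbytes: int) -> int:
    """Round a byte count up to a whole number of 2880-byte blocks."""
    return -(-nbytes // BLOCK) * BLOCK


def header_bytes(ncards: int) -> int:
    """Bytes used by a header of ncards keywords plus its END card."""
    return padded((ncards + 1) * CARD)


def mef_size(n_images: int, pixels: int, bytes_per_pixel: int,
             cards_per_ext: int, primary_cards: int) -> int:
    """Empty primary HDU followed by one IMAGE extension per image."""
    per_ext = header_bytes(cards_per_ext) + padded(pixels * bytes_per_pixel)
    return header_bytes(primary_cards) + n_images * per_ext


def bintable_size(n_images: int, pixels: int, bytes_per_pixel: int,
                  table_cards: int, primary_cards: int,
                  extra_row_bytes: int = 0) -> int:
    """Empty primary HDU followed by one binary table, one image per row.

    extra_row_bytes stands in for per-image keywords stored as columns.
    """
    row = pixels * bytes_per_pixel + extra_row_bytes
    # Only the table's data unit as a whole is padded, not each row.
    return (header_bytes(primary_cards) + header_bytes(table_cards)
            + padded(n_images * row))


# Hypothetical parameters: 180 images of 1024 x 1024 16-bit pixels,
# 100 cards per extension header vs. a 150-card table header plus
# 160 bytes of per-row metadata columns.
mef = mef_size(180, 1024 * 1024, 2, cards_per_ext=100, primary_cards=30)
tab = bintable_size(180, 1024 * 1024, 2, table_cards=150, primary_cards=30,
                    extra_row_bytes=160)
print(mef, tab, f"{100 * (mef - tab) / mef:.2f}% smaller as a table")
```

With these assumed numbers, the binary table comes out roughly half a percent smaller, which fits "marginally". The relative saving grows as the images get smaller and the per-extension headers get longer.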
#8
On Sat, 7 Apr 2007, Thierry Forveille wrote:
>> The basic FITS model provides a keyword table (PHU or some other form of empty image kludge), an N-dimensional image object, a table, plus a simple general container (the MEF). We can aggregate instances of these three basic objects in a container, and associate them in some fashion to model more complex objects, such as instrumental datasets. Usually this is done by defining a convention, e.g., using custom keywords in the PHU and/or extensions.
>
> Well, that's one way of looking at it. The alternative perspective that I am arguing for is that everything should go into one table extension, with images as either multiple entries in one row or entries in successive rows. Essentially, that's the perspective taken by the Green Bank convention for sets of radio-astronomical spectra.

There are cases where this is the best approach: if what you have is a large uniform collection of (not terribly large) images or spectra, then representation as a table often works best. However, I would not suggest that we replace Image with a Table-based representation containing one very long row. If the aggregation includes heterogeneous objects (e.g., images with substantially different headers), then a single table is not appropriate, and a MEF representation is probably better.

- Doug
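The split Thierry describes for DENIS (stable keywords stored once in the table header, varying ones as row columns, nothing duplicated) and Doug's caveat about heterogeneous collections can both be expressed as a single check. This is a minimal sketch under simplifying assumptions: headers are dicts, and "uniform" is taken to mean that every image carries exactly the same keyword set. Real collections would likely need a looser rule. The keyword names and values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Header = Dict[str, object]


def split_headers(headers: List[Header]) -> Optional[Tuple[Header, List[str]]]:
    """Split per-image headers into (shared keywords, per-row column names).

    Shared keywords would go once into a binary table header; the rest
    become table columns. Returns None when the images do not all carry
    the same keyword set, i.e. the collection is heterogeneous and a MEF
    is probably the better container.
    """
    if not headers:
        return {}, []
    keys = set(headers[0])
    if any(set(h) != keys for h in headers[1:]):
        return None
    first = headers[0]
    shared = {k: first[k] for k in sorted(keys)
              if all(h[k] == first[k] for h in headers)}
    columns = sorted(k for k in keys if k not in shared)
    return shared, columns


# Hypothetical frames: same instrument and exposure, stepping in RA.
frames = [
    {"INSTRUME": "CAM1", "FILTER": "I", "EXPTIME": 9.0, "RA": 10.0, "DEC": -30.0},
    {"INSTRUME": "CAM1", "FILTER": "I", "EXPTIME": 9.0, "RA": 10.2, "DEC": -30.0},
    {"INSTRUME": "CAM1", "FILTER": "I", "EXPTIME": 9.0, "RA": 10.4, "DEC": -30.0},
]
print(split_headers(frames))
print(split_headers(frames + [{"INSTRUME": "SPEC2", "GRATING": "G1"}]))
```

The first call yields four shared keywords and a single RA column. The second returns None, because a frame with a different keyword set makes the collection heterogeneous.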
#9
Archie Warnock wrote:
> No, but avoiding potential errors by not duplicating text strings is a worthy effort, as we learned long ago from relational database theory.

Like I said, well-worn principles of database normalization.

> In current practice or not, I think the philosophy of "it's better to seek forgiveness than permission" is dangerous in this context.

I'm a little unclear what permission should have been sought and from whom. INHERIT is completely legal FITS usage - the MEF format is legal, the dataless HDU is legal and the keyword is a legal boolean. This is particularly true since, in the absence of a coherent data model, FITS is silent on issues of the semantic interconnectedness of extensions. Absent a data model, software developers still need to develop.

> If a convention breaks FITS, I believe it should be considered a private agreement and not part of the FITS standard. That doesn't mean it can't be used in practice - just that it's not FITS.

None of the conventions are part of the FITS standard. However, even nonconforming FITS cannot "break FITS" or even break FITS applications. An application should do something reasonable even if presented with nonconforming input. In any event, input conforming to the INHERIT convention also conforms to FITS. Some applications may not know what to do with it, but the absence of a feature is not precisely the same thing as the presence of a bug.

Thierry Forveille wrote:
> One single binary table maps a lot better to a database than multiple image extensions that may or may not duplicate header information.

I disagree. A typical normalized database consists of several tables. These tables may correspond to binary tables in FITS, but also may correspond to a hierarchy of FITS headers. Well-chosen image extension headers will often be better than a single flat binary table.

> would it perhaps be time to consider deprecating the IMAGE extension??

Obviously a rhetorical question, but no, of course not. IMAGE extensions provide a mechanism for aggregating classical FITS image objects. FITS exists for mere astronomical mortals, not just for titans of software engineering. An MEF file of image extensions is vastly more accessible to our users, and likely much more robust for our applications. Not all astronomical data maps well onto image arrays, but CCDs and other array detectors do. On the other hand, tile compression provides a natural path for image extensions to map, one-to-one, onto binary tables. The headers, of course, copy directly across. Presumably by recommending the deprecation of the image extension, you're really suggesting deprecating the idea of the FITS header itself.

Rob
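Rob's point that an INHERIT file is ordinary FITS can be checked at the byte level. The primary HDU is a header with NAXIS = 0 and no data unit, and INHERIT is just a logical keyword in fixed format, with its value right-justified to column 30. The sketch below writes such headers as 80-character cards padded with blanks to a 2880-byte block. It handles only logical, integer and string values, and it does not enforce the standard's other rules (mandatory keywords, keyword order, allowed characters). Treat it as an illustration rather than a FITS writer. The TELESCOP value and the image size are hypothetical.

```python
BLOCK = 2880  # bytes per FITS logical record
CARD = 80     # characters per header card


def card(key: str, value) -> str:
    """Format one fixed-format keyword card: key in cols 1-8, '= ' in 9-10."""
    if isinstance(value, bool):          # check bool before int (bool is an int)
        field = ("T" if value else "F").rjust(20)   # logical ends in column 30
    elif isinstance(value, int):
        field = str(value).rjust(20)                # integer ends in column 30
    elif isinstance(value, str):
        # String starts in column 11, padded to at least 8 characters,
        # with embedded quotes doubled.
        field = ("'" + value.replace("'", "''").ljust(8) + "'").ljust(20)
    else:
        raise TypeError("sketch handles only logical, integer and string values")
    line = f"{key:<8}= {field}"
    if len(line) > CARD:
        raise ValueError("card longer than 80 characters")
    return line.ljust(CARD)


def header(cards: list) -> str:
    """Join cards, append END, and pad with blanks to a multiple of 2880."""
    text = "".join(cards) + "END".ljust(CARD)
    return text + " " * (-len(text) % BLOCK)


# Dataless primary HDU: NAXIS = 0, so no data unit follows this header.
primary = header([card("SIMPLE", True), card("BITPIX", 8), card("NAXIS", 0),
                  card("EXTEND", True), card("TELESCOP", "EXAMPLE")])
# Image extension header carrying the logical INHERIT keyword.
ext = header([card("XTENSION", "IMAGE"), card("BITPIX", 16), card("NAXIS", 2),
              card("NAXIS1", 64), card("NAXIS2", 64), card("PCOUNT", 0),
              card("GCOUNT", 1), card("INHERIT", True)])
print(repr(card("INHERIT", True)))
```

Nothing here goes beyond the ordinary card syntax, which is the sense in which the convention adds meaning without adding new structure.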