#1
[fitsbits] FITS 'P' descriptors: signed or unsigned?
This note concerns a relatively small technical issue in the larger proposal to add 64-bit integer support to FITS: At issue is whether to reverse the recent decision to define the 'P' variable-length array descriptors in FITS binary tables to be a pair of 'signed 32-bit integers', and make them 'unsigned 32-bit integers' instead. The first integer gives the number of elements in the array, and the 2nd integer gives the byte offset in the heap to the first element of the array.

The practical consequence of this change is that it will double the allowed heap size from about 2.1 GB to about 4.2 GB. This is not just a theoretical issue because there are existing applications that can easily produce FITS files with binary table heaps larger than 2.1 GB (e.g., using the 'tiled' image compression convention where the compressed rows of an image are stored in a variable-length array table column). Allowing this extra factor of 2 in size will benefit software applications that would otherwise need to be rewritten to use the proposed 'Q' 64-bit descriptors (assuming that the 'Q' type is eventually approved by the FITS committees).

There are no technical reasons not to support unsigned descriptor values (a valid descriptor can never be negative). Forcing the descriptors to be signed 32-bit integers artificially cuts in half the potential size of the heap.

The main argument for keeping the descriptors as signed integers is that FITS has never supported unsigned integers as a raw data type (although it does support unsigned integers by applying an offset to the FITS signed integer values). Thus, it is argued, the definition of FITS remains more 'pure' if we don't introduce unsigned integers in this case. There is however a real distinction between the array descriptor values and the other FITS table column data types, because the descriptor values themselves are almost never directly accessible at the application software level. Instead, the descriptor values are only used by the low-level FITS interface software routines when accessing the arrays that the descriptors point to.

I don't consider this to be a major issue, but given a choice, I think the practical advantages of doubling the allowed size of the heap outweigh the more intangible 'purity of FITS' issue. How do others feel about this issue? Is there a clear consensus one way or the other? Should the FITS committees be explicitly asked to vote on a preference?

This issue does not affect the proposed 'Q' 64-bit descriptors, because even signed 64-bit integers provide vastly more address space than could conceivably be used by any applications in the foreseeable future. Presumably the sign of the 'Q' descriptors should be defined to be the same as whatever is decided for the 'P' descriptors.

As a final note, to put this in historical perspective, the original FITS binary table definition paper did not specify the sign of the descriptor integers. It was only when the variable-length array convention was approved by the FITS committees earlier this year that the wording was made more rigorous to define the sign. The reason for choosing 'signed' rather than 'unsigned' was mainly that at the time no software implementations supported unsigned descriptor values. Subsequently, some FITS libraries (e.g., CFITSIO) have been enhanced to support unsigned descriptor values.

If we make this change now, it will reverse a decision that was only finally approved in April 2005. Also, it will not invalidate any existing FITS files, because the positive, signed descriptor values can always be treated as unsigned integers.

Bill Pence
--
Dr. William Pence
NASA/GSFC Code 662 HEASARC +1-301-286-4599 (voice)
Greenbelt MD 20771 +1-301-286-1684 (fax)
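To make the signed/unsigned question concrete, here is a minimal Python sketch of how the same 8 bytes of a 'P' descriptor decode under each interpretation. It assumes the descriptor is read as two big-endian 32-bit integers (FITS binary data are big-endian); the function names `max_descriptor_value` and `read_descriptor` are made up for this illustration and are not part of CFITSIO or any other library.

```python
import struct
from typing import Tuple

# A 'P' descriptor is a pair of 32-bit integers:
# (number of elements, byte offset into the heap), stored big-endian.
SIGNED_P = struct.Struct(">ii")
UNSIGNED_P = struct.Struct(">II")


def max_descriptor_value(signed: bool) -> int:
    """Largest count or offset that fits in one descriptor integer."""
    return 2**31 - 1 if signed else 2**32 - 1


def read_descriptor(raw: bytes, signed: bool) -> Tuple[int, int]:
    """Decode 8 raw bytes as (count, offset) under the chosen interpretation."""
    fmt = SIGNED_P if signed else UNSIGNED_P
    return fmt.unpack(raw)


# An array of 1000 elements starting 3,000,000,000 bytes into the heap.
raw = UNSIGNED_P.pack(1000, 3_000_000_000)

print(max_descriptor_value(signed=True))    # 2147483647
print(max_descriptor_value(signed=False))   # 4294967295
print(read_descriptor(raw, signed=False))   # (1000, 3000000000)
print(read_descriptor(raw, signed=True))    # (1000, -1294967296)
```

Any descriptor whose values are at most 2147483647 decodes identically either way, which is why existing files would remain valid under the unsigned definition.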
#2
Hi Bill -
Unless someone can come up with a compelling reason why this causes a technical problem, I would support it (using unsigned for 32-bit pointers). The 2 GB data size limit is getting to be a major problem which we have to deal with. The real solution is 64-bit support, but a factor of 2 for something like this makes a big difference.

The only issue I can see is that older programs not expecting unsigned would interpret such offsets as having a negative value and probably reject the file. In the worst case (software fails to check for a negative value) a pointer error could occur and invalid data could be returned.

- Doug
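The worst case described above can be illustrated with a small Python analogue. This is only a sketch under stated assumptions: the heap is modelled as a `bytes` object with one-byte elements, and both function names are hypothetical. In Python a negative offset silently indexes from the end of the buffer; in a C reader the equivalent mistake would be an out-of-bounds pointer.

```python
def read_array_unchecked(heap: bytes, count: int, offset: int) -> bytes:
    """Fetch `count` bytes at `offset` with no sign check (the buggy case)."""
    # A negative offset is silently treated as "counted from the end".
    return heap[offset:offset + count]


def read_array_checked(heap: bytes, count: int, offset: int) -> bytes:
    """Fetch `count` bytes at `offset`, rejecting descriptors outside the heap."""
    if count < 0 or offset < 0 or offset + count > len(heap):
        raise ValueError("descriptor points outside the heap")
    return heap[offset:offset + count]


heap = bytes(range(16))

# A large unsigned offset misread as a small negative number returns
# bytes from the wrong place instead of failing.
print(list(read_array_unchecked(heap, 2, -4)))  # [12, 13]

try:
    read_array_checked(heap, 2, -4)
except ValueError as err:
    print("rejected:", err)
```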
#3
On Wed 2005/06/15 18:21:27 -0400, William Pence wrote in a message to FITSBITS:

> How do others feel about this issue? Is there a clear consensus one way or the other? Should the FITS committees be explicitly asked to vote on a preference?

Currently P-descriptors are effectively only 31-bit, so it should be possible to extend them to 32-bits and beyond in whatever way seems best. I favour using unsigned ints since it matches the data type to the intended usage and provides a clean progression from 32- to 64-bit descriptors.

Existing software that can only handle 31-bit descriptors won't automatically understand the extended syntax. However, the requirement for backwards compatibility, "once FITS always FITS", refers to the data, not the software.

Mark Calabretta
ATNF
#4
I'm in favor of the change from signed to unsigned 32-bit integer. As described, there is nothing "signed" about the number of array elements or byte offsets. Better to make such a change sooner rather than later!

Arne
#5
On Wed, 15 Jun 2005, William Pence wrote:
> At issue is whether to reverse the recent decision to define the 'P' variable-length array descriptors in FITS binary tables to be a pair of 'signed 32-bit integers', and make them 'unsigned 32-bit integers' instead. The practical consequence of this change is that it will double the allowed heap size from about 2.1 GB to about 4.2 GB.

The same result (going beyond, actually WELL beyond, 2 GB) could be achieved with the new Q type pointers. For the Q type pointers, being signed or unsigned really does not matter (a factor of 2 on "nearly infinity" :-) ).

> The main argument for keeping the descriptors as signed integers is that FITS has never supported unsigned integers as a raw data type (although it does

That is a valid "elegance" argument, I would say as valid as the fact that data and pointers are two different things (the two cancel each other out).

> How do others feel about this issue? Is there a clear consensus one way or the other? Should the FITS committees be explicitly asked to vote on a preference?

I suggest the issue is voted together with the Q descriptors, since it requires a change to the recently approved standard, and there is no sense in having P signed and Q unsigned. So either we vote (A) to have Q signed (requiring no change to the April vote), or we vote (B) to introduce Q unsigned, and at the same time make P unsigned.

I'm not sure of the best way. If the matter is sorted out by preliminary discussion, then a "traditional" vote is called on either proposal (A) or proposal (B). Do the formal voting rules allow voting on a non-binary alternative (YES A, YES B or NO instead of YES or NO)?

My concerns with the unsigned issue are two:

- One is "Once FITS, Always FITS" ... but this is probably weak: all files produced before April were "not official", so there won't be many produced afterwards. And anyhow all the "signed" ones, if not negative (as they should not be), remain legal. However some s/w has to be changed (if anyone is affected they should speak now or never!).

- The other one is that unsigned may not be supported by all programming languages (I'm specifically thinking of Fortran), which somehow embeds a language preference in FITS. It is however true that one can call a library routine supporting unsigned in another language. And this will apply only to the case of "large heaps" ...

... after all, a program written for a specific purpose can legitimately detect a particular FITS feature and decide not to support it (if not required in the specific context). Such a program could transparently support descriptors up to the n-1 bit limit, and signal "unsupported" if the descriptor "goes negative" for it.

Lucio Chiappetti
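The "signal unsupported" strategy in the last paragraph might look something like the following Python sketch. It assumes a reader that decodes descriptors as signed big-endian 32-bit pairs; the exception class and function name are hypothetical and chosen only for this example.

```python
import struct
from typing import Tuple


class UnsupportedDescriptor(Exception):
    """Hypothetical error for descriptors that need the 32nd bit."""


def decode_p_descriptor_31bit(raw: bytes) -> Tuple[int, int]:
    """Decode a 'P' descriptor, supporting values only up to 2**31 - 1."""
    count, offset = struct.unpack(">ii", raw)
    # Under an unsigned definition, a value of 2**31 or more shows up here
    # as a negative number, so the sign doubles as a feature detector.
    if count < 0 or offset < 0:
        raise UnsupportedDescriptor(
            "descriptor exceeds the 31-bit range this reader supports"
        )
    return count, offset


# Small descriptors pass through unchanged.
print(decode_p_descriptor_31bit(struct.pack(">II", 10, 4096)))  # (10, 4096)

# A descriptor that uses the top bit is reported rather than misread.
try:
    decode_p_descriptor_31bit(struct.pack(">II", 10, 2**31))
except UnsupportedDescriptor as err:
    print("unsupported:", err)
```

Such a reader keeps working on every file written under the signed definition and fails cleanly, rather than silently, on files with larger heaps.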
#6
On Wed, Jun 15, 2005 at 06:21:27PM -0400, William Pence wrote:
> At issue is whether to reverse the recent decision to define the 'P' variable-length array descriptors in FITS binary tables to be a pair of 'signed 32-bit integers', and make them 'unsigned 32-bit integers' instead. The first integer gives the number of elements in the array, and the 2nd integer gives the byte offset in the heap to the first element of the array. The practical consequence of this change is that it will double the allowed heap size from about 2.1 GB to about 4.2 GB.

I would recommend *against* this change. It only gains you one bit, and does have potential implementation issues. It's not worth it. I routinely run into 2GB limits, and about 2/3 as often I run into 4GB limits. The solution is the Q descriptors, which almost everybody agrees are necessary.

> descriptors). Forcing the descriptors to be signed 32-bit integers artificially cuts in half the potential size of the heap.

I argue that a factor of only two is too small to be of general use.

--
Mike Nolan
+1 787 878 2612 Fax: +1 787 878 1861
Arecibo Observatory, HC3 Box 93995, Arecibo, Puerto Rico 00612
#7
On Thu, 16 Jun 2005, LC's No-Spam Newsreading account wrote:
> I suggest the issue is voted together with the Q descriptors, since it requires a change to the recently approved standard, and there is no sense in having P signed and Q unsigned.

I agree that it makes no sense, and also lacks elegance and consistency, but these qualities have never played much part in the design of FITS, and I wonder if it isn't a bit late to start now :-)

> - the other one is that unsigned may not be supported by all programming languages (I'm specifically thinking of Fortran), which

Well I'm a dyed-in-the-wool Fortran programmer, but I don't think this affects the argument at all. Fortran code is likely to depend upon a library such as FITSIO to do its dirty work, and even if people insist on writing Fortran to read FITS files directly, there are fairly easy solutions to this problem.

My own interest in the 64-bit topic was sparked by finding FITS binary tables that were likely to exceed 2 GB in size, and with the possibility that they might have tables of over 2 billion rows. This would mean NAXIS2 in the header would have an integer constant above the limit for a 32-bit integer. As far as I can see there is nothing in the FITS Standard to affect this; it's only specific implementations that might be lacking. The only thing that's relevant here is that it might be a good idea for the revised Standard to include a note pointing out that integer values in headers might be larger than a 32-bit integer can handle.

--
Clive Page
Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K.
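The post does not say which "fairly easy solutions" are meant. One common approach in a language with only signed 32-bit integers is to widen the value to a larger integer type and add 2^32 when it is negative; the Python sketch below shows that arithmetic, together with a range check of the kind a header parser would need for a large NAXIS2. Both function names are hypothetical.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def signed32_to_unsigned(value: int) -> int:
    """Recover the unsigned reading of a value held in a signed 32-bit integer."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise ValueError("value does not fit in a signed 32-bit integer")
    # Two's complement: negative values correspond to 2**32 + value.
    return value + 2**32 if value < 0 else value


def fits_in_int32(value: int) -> bool:
    """True if a header integer (e.g. NAXIS2) fits in a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


print(signed32_to_unsigned(-1294967296))  # 3000000000
print(signed32_to_unsigned(4096))         # 4096
print(fits_in_int32(2_000_000_000))       # True
print(fits_in_int32(3_000_000_000))       # False
```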
#8
Yes for the 64-bit in BITPIX / K / Q, but I would strongly suggest *NOT* expanding the pointer/length from 31 to 32 bits in the P convention: first, the gain is minimal, and second, the limit on file size of 32-bit Unix machines used to be 2Gb, and not 4Gb -- the "largefile" Unix extension was a move beyond 2Gb, and not beyond 4Gb (the 'G' here means 1024^3).

There was also some suggestion about having a versioning mechanism in FITS -- and it would be a great benefit, I think, to know right from the beginning of a FITS input stream that it has (or may have), far beyond the beginning of the file, some extensions which may NOT be recognized by old software. I feel this modification would be a good opportunity to include this versioning mechanism in FITS.

--Francois.

Francois Ochsenbein, Observatoire Astronomique de Strasbourg
11, rue de l'Universite, F-67000 STRASBOURG (France)
Phone: +33-(0)390 24 24 29 Fax: +33-(0)390 24 24 17
#9
I am in favor of signed integers and of the 64-bit proposal.

Eric
#10
On Thursday 16 June 2005 00:21, William Pence wrote:
> At issue is whether to reverse the recent decision to define the 'P' variable-length array descriptors in FITS binary tables to be a pair of 'signed 32-bit integers', and make them 'unsigned 32-bit integers' instead.

I'm sorry but we cannot 'reverse the recent decision'. That would violate Section 9 of the FITS Standard, 'Restrictions on Changes'.

Technically there is no good argument for 'unsigned 32-bit integers' as we are discussing 64-bit pointers. To keep the standard more symmetric I would argue for signed 64-bit integers. When we need the last bit we should start the discussions on 128-bit integers.

In general, I agree on the need for 64-bit pointers and integer columns (both signed) in binary tables, based on the need for large heaps and time stamps. I still have reservations concerning BITPIX=64 for the following reasons:

1) There seems to be no good physical reason for 64-bit integer images. The number of photons from astronomical sources hardly justifies it, especially considering their statistical distribution. Let someone present a real, practical case and we should consider it.

2) The FITS standard is useful because the vast majority of systems implement it; that is, if one writes a conforming FITS file, the likelihood of reading it on any system is high. Adding BITPIX=64 would require changes at the top level of all readers. For this to actually be implemented, people would have to feel the need; otherwise it remains empty words.

Preben Grosbol