[fitsbits] FITS 'P' descriptors: signed or unsigned?

#1 June 15th 05, 11:21 PM

This note concerns a relatively small technical issue in the larger proposal
to add 64-bit integer support to FITS:

At issue is whether to reverse the recent decision to define the 'P'
variable-length array descriptors in FITS binary tables to be a pair of
'signed 32-bit integers', and make them 'unsigned 32-bit integers' instead.
The first integer gives the number of elements in the array, and the 2nd
integer gives the byte offset in the heap to the first element of the array.
The practical consequence of this change is that it will double the
allowed heap size from about 2.1 GB to about 4.2 GB.
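
To make the descriptor layout concrete, here is a minimal sketch (not taken from CFITSIO or any other FITS library) that decodes the 8-byte 'P' descriptor with Python's `struct` module, once as a pair of signed and once as a pair of unsigned big-endian 32-bit integers. The function name `read_p_descriptor` and its `unsigned` flag are hypothetical, chosen only for illustration.

```python
import struct

def read_p_descriptor(raw: bytes, unsigned: bool = False) -> tuple[int, int]:
    """Decode an 8-byte 'P' descriptor into (element count, heap byte offset).

    FITS stores binary data big-endian; in struct notation 'i' is a signed
    and 'I' an unsigned 32-bit integer.
    """
    if len(raw) != 8:
        raise ValueError("a 'P' descriptor is exactly 8 bytes")
    return struct.unpack(">II" if unsigned else ">ii", raw)

# Largest byte offset each interpretation can express.
MAX_SIGNED_OFFSET = 2**31 - 1    # 2147483647 bytes
MAX_UNSIGNED_OFFSET = 2**32 - 1  # 4294967295 bytes

# A descriptor for 100 elements starting 1000 bytes into the heap decodes
# identically under both interpretations.
raw = struct.pack(">ii", 100, 1000)
print(read_p_descriptor(raw), read_p_descriptor(raw, unsigned=True))
```

The two readings differ only for values whose top bit is set, which is exactly the range the proposed change would open up.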

This is not just a theoretical issue because there are existing applications
that can easily produce FITS files with binary table heaps larger than 2.1
GB (e.g., using the 'tiled' image compression convention where the
compressed rows of an image are stored in a variable length array table
column). Allowing this extra factor of 2 in size will benefit software
applications that would otherwise need to be rewritten to use the proposed
'Q' 64-bit descriptors (assuming that the 'Q' type is eventually approved by
the FITS committees). There are no technical reasons not to support
unsigned descriptor values (e.g., it is impossible to have negative
descriptors). Forcing the descriptors to be signed 32-bit integers
artificially cuts in half the potential size of the heap.

The main argument for keeping the descriptors as signed integers is that
FITS has never supported unsigned integers as a raw data type (although it
does support unsigned integers by applying an offset to the FITS signed
integer values). Thus, it is argued, the definition of FITS remains more
'pure' if we don't introduce unsigned integers in this case. There is
however a real distinction between the array descriptor values and the other
FITS table column data types because the descriptor values themselves are
almost never directly accessible at the application software level.
Instead, the descriptor values are only used by the low-level FITS interface
software routines, when accessing the arrays that the descriptor points to.
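
The "offset" mechanism mentioned above for ordinary data values can be sketched as follows, assuming the usual case of unsigned 16-bit values stored as signed 16-bit integers with an offset of 32768. The helper names are hypothetical; this illustrates the arithmetic only and is not code from any FITS library.

```python
import struct

TZERO_UINT16 = 32768  # offset conventionally used for unsigned 16-bit values

def encode_uint16(value: int) -> bytes:
    """Store an unsigned 16-bit value as a big-endian signed 16-bit integer."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value out of range for unsigned 16-bit")
    return struct.pack(">h", value - TZERO_UINT16)

def decode_uint16(raw: bytes) -> int:
    """Recover the unsigned value: physical = stored + offset."""
    (stored,) = struct.unpack(">h", raw)
    return stored + TZERO_UINT16

# Round-trip the extremes and the midpoint of the unsigned range.
for v in (0, 32768, 65535):
    assert decode_uint16(encode_uint16(v)) == v
```

Descriptors carry no such offset keyword, which is why the question of their sign has to be settled in the definition itself.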

I don't consider this to be a major issue, but given a choice, I think the
practical advantages of doubling the allowed size of the heap outweigh the
more intangible 'purity of FITS' issue.

How do others feel about this issue? Is there a clear consensus one way or
the other? Should the FITS committees be explicitly asked to vote on a
preference?

This issue does not affect the proposed 'Q' 64-bit descriptors, because
even signed 64-bit integers provide vastly more address space than could
conceivably be used by any applications in the foreseeable future.
Presumably the sign of the 'Q' descriptors should be defined to be the same
as whatever is decided for the 'P' descriptors.

As a final note, to put this in historical perspective, the original FITS
binary table definition paper did not specify the sign of the descriptor
integers. It was only when the variable-length array convention was
approved by the FITS committees earlier this year that the wording was made
more rigorous to define the sign. The reason for choosing 'signed' rather
than 'unsigned' was mainly because at the time there did not exist any
software implementations that supported unsigned descriptor values.
Subsequently, some FITS libraries (e.g., CFITSIO) have been enhanced to
support unsigned descriptor values. If we make this change now, it will
reverse a decision that was only finally approved in April 2005. Also, it
will not invalidate any existing FITS files, because the positive, signed
descriptor values can always be treated as unsigned integers.

Bill Pence
--
____________________________________________________________________
Dr. William Pence
NASA/GSFC Code 662 HEASARC +1-301-286-4599 (voice)
Greenbelt MD 20771 +1-301-286-1684 (fax)

#2 June 15th 05, 11:41 PM

Hi Bill -

Unless someone can come up with a compelling reason why this causes a
technical problem I would support it (using unsigned for 32-bit pointers).
The 2 GB data size limit is getting to be a major problem which we have
to deal with. The real solution is 64-bit support, but a factor of 2 for
something like this makes a big difference. The only issue I can see is
that older programs not expecting unsigned would interpret such offsets
as having a negative value and probably reject the file. In the worst
case (software fails to check for a negative value) a pointer error could
occur and invalid data could be returned.
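
The failure mode described here can be shown in a few lines. This is only a sketch using Python's `struct` module, with a hypothetical heap offset of 3,000,000,000 bytes: the value is written as an unsigned 32-bit integer and then read back the way a reader expecting signed descriptors would read it.

```python
import struct

# A hypothetical heap offset of 3,000,000,000 bytes, beyond the signed limit.
offset = 3_000_000_000
raw = struct.pack(">I", offset)          # written as unsigned 32-bit

(as_signed,) = struct.unpack(">i", raw)  # what an older reader would see
print(as_signed)                         # -1294967296
```

A reader that checks for negative values can reject the file cleanly; one that does not would seek to a nonsensical position in the heap.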

- Doug

#3 June 16th 05, 02:56 AM

On Wed 2005/06/15 18:21:27 -0400, William Pence wrote
in a message to: FITSBITS

How do others feel about this issue? Is there a clear consensus one way or
the other? Should the FITS committees be explicitly asked to vote on a
preference?

Currently P-descriptors are effectively only 31-bit, so it should be
possible to extend them to 32-bits and beyond in whatever way seems
best. I favour using unsigned ints since it matches the data type to
the intended usage and provides a clean progression from 32- to 64-bit
descriptors.

Existing software that can only handle 31-bit descriptors won't
automatically understand the extended syntax. However, the requirement
for backwards compatibility, "once FITS always FITS", refers to the
data, not the software.

Mark Calabretta
ATNF

#4 June 16th 05, 02:20 PM

I'm in favor of the change from signed to unsigned 32-bit integer. As
described, there is nothing "signed" about the number of array elements or
byte offsets. Better to make such a change sooner than later!

Arne

#5 June 16th 05, 03:17 PM

On Wed, 15 Jun 2005, William Pence wrote:

At issue is whether to reverse the recent decision to define the 'P'
variable-length array descriptors in FITS binary tables to be a pair of
'signed 32-bit integers', and make them 'unsigned 32-bit integers' instead.

The practical consequence of this change is that it will double the allowed
heap size from about 2.1 GB to about 4.2 GB.

The same result (going beyond, actually WELL beyond, 2 GB) could be
achieved with the new Q type pointers. For the Q type pointers, being
signed or unsigned really does not matter (a factor of 2 on "nearly
infinity" :-) ).

The main argument for keeping the descriptors as signed integers is that FITS
has never supported unsigned integers as a raw data type (although it does

That is a valid "elegance" argument, I would say as valid as the fact
that data and pointers are two different things (the two cancel out
reciprocally).

How do others feel about this issue? Is there a clear consensus one way or
the other? Should the FITS committees be explicitly asked to vote on a
preference?

I suggest the issue is voted together with the Q descriptors, since it
requires a change to the recently approved standard, and there is no
sense in having P signed and Q unsigned.

So either we vote

(A) to have Q signed (requiring no change to the April vote), or we vote
(B) to introduce Q unsigned, and at the same time make P unsigned.

I'm not sure of the best way. If the matter is sorted out by preliminary
discussion, then a "traditional" vote is called on either proposal (A)
or proposal (B). Do the formal voting rules allow voting on a
non-binary alternative (YES A, YES B or NO instead of YES or NO)?

My concerns with the unsigned issue are two :

- one is the Once FITS Always FITS ... but this is probably weak,
all files produced before April were "not official" so there won't
be many produced afterwards. And anyhow all the "signed" ones, if
not negative (as they should not be), remain legal.

However some s/w has to be changed (if there is any which is affected
they should speak now or never !)

- the other one is that unsigned may not be supported by all
programming languages (I'm specifically thinking of Fortran), which
somehow embeds a language preference in FITS. It is however true
that one can call a library routine supporting unsigned in another
language. And this will apply only to case of "large heaps" ...
... after all a program written for a specific purpose can
legitimately detect a particular FITS feature and decide not to
support it (if not required in specific context). Such a program
could transparently support descriptors until the n-1 bit limit,
and signal "unsupported" if the descriptor "goes negative" for it.
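
The behaviour suggested in the last point could look roughly like the sketch below: a reader that handles descriptors up to the signed limit and reports anything that "goes negative" as unsupported instead of misreading it. The exception class and function name are hypothetical and do not come from any existing FITS library.

```python
import struct

class UnsupportedDescriptor(Exception):
    """Raised when a descriptor exceeds what this reader can handle."""

def read_descriptor_31bit(raw: bytes) -> tuple[int, int]:
    """Decode a 'P' descriptor, accepting only values below 2**31."""
    count, offset = struct.unpack(">ii", raw)  # big-endian signed 32-bit pair
    if count < 0 or offset < 0:
        # The top bit is set: the writer used the unsigned range.
        raise UnsupportedDescriptor(
            "descriptor uses the unsigned 32-bit range; not supported here"
        )
    return count, offset

# An ordinary descriptor is read transparently.
print(read_descriptor_31bit(struct.pack(">ii", 5, 4096)))  # (5, 4096)
```

With this approach, files with small heaps continue to work unchanged and files with large heaps fail with a clear message.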

Lucio Chiappetti

#6 June 16th 05, 03:36 PM

On Wed, Jun 15, 2005 at 06:21:27PM -0400, William Pence wrote:

At issue is whether to reverse the recent decision to define the 'P'
variable-length array descriptors in FITS binary tables to be a pair of
'signed 32-bit integers', and make them 'unsigned 32-bit integers' instead.
The first integer gives the number of elements in the array, and the 2nd
integer gives the byte offset in the heap to the first element of the
array. The practical consequence of this change is that it will double the
allowed heap size from about 2.1 GB to about 4.2 GB.

I would recommend *against* this change. It only gains you one bit,
and does have potential implementation issues. It's not worth it. I
routinely run into 2GB limits, and about 2/3 as often I run into 4GB
limits. The solution is the Q descriptors, which almost everybody agrees
are necessary.

descriptors). Forcing the descriptors to be signed 32-bit integers
artificially cuts in half the potential size of the heap.

I argue that a factor of only two is too small to be of general use.
--
Mike Nolan +1 787 878 2612 Fax: +1 787 878 1861
Arecibo Observatory, HC3 Box 93995, Arecibo, Puerto Rico 00612

#7 June 16th 05, 05:02 PM

On Thu, 16 Jun 2005, LC's No-Spam Newsreading account wrote:

I suggest the issue is voted together with the Q descriptors, since it
requires a change to the recently approved standard, and there is no
sense in having P signed and Q unsigned.

I agree that it makes no sense, and also lacks elegance and consistency,
but these qualities have never played much part in the design of FITS, and
I wonder if it isn't a bit late to start now :-)

- the other one is that unsigned may not be supported by all
programming languages (I'm specifically thinking of Fortran), which

Well I'm a dyed-in-the-wool Fortran programmer, but I don't think this
affects the argument at all. Fortran code is likely to depend upon a
library such as FITSIO to do its dirty work, and even if people insist on
writing Fortran to read FITS files directly, there are fairly easy
solutions to this problem.

My own interest in the 64-bit topic was sparked by finding FITS binary
tables that were likely to exceed 2 GB in size, and with the possibility
that they might have tables of over 2 billion rows. This would mean
NAXIS2 in the header would have an integer constant above the limit for a
32-bit integer. As far as I can see there is nothing in the FITS Standard
to affect this, it's only specific implementations that might be lacking.

The only thing that's relevant here is that it might be a good idea for the
revised Standard to include a note pointing out that integer values in
headers might be larger than a 32-bit integer can handle.
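
As a small illustration of the header point, the sketch below builds an 80-character NAXIS2 card with a hypothetical row count of 3,000,000,000 and parses it back. The parser `parse_integer_card` is a simplified, hypothetical helper that handles only plain integer cards. Python integers do not overflow, so the value is compared against the 32-bit limit explicitly to show where a reader using 32-bit variables would fail.

```python
INT32_MAX = 2**31 - 1

def parse_integer_card(card: str) -> tuple[str, int]:
    """Return (keyword, value) from a simple integer-valued header card."""
    keyword = card[:8].strip()                      # keyword in columns 1-8
    value_text = card[10:].split("/", 1)[0].strip() # drop any trailing comment
    return keyword, int(value_text)

# Keyword in columns 1-8, "= " in columns 9-10, value right-justified to
# column 30, then padded with blanks to 80 characters.
card = f"{'NAXIS2':<8}= {3_000_000_000:>20}".ljust(80)

keyword, nrows = parse_integer_card(card)
print(keyword, nrows, nrows > INT32_MAX)  # NAXIS2 3000000000 True
```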

--
Clive Page
Dept of Physics & Astronomy,
University of Leicester,
Leicester, LE1 7RH, U.K.

#8 June 16th 05, 05:32 PM

Yes for the 64-bit in BITPIX / K / Q, but I would strongly suggest
*NOT* expanding the pointer/length from 31 to 32 bits in the P convention:
first, the gain is minimal, and second, the limit on file size of 32-bit
Unix machines used to be 2 GB, and not 4 GB -- the "largefile" Unix
extension was a move beyond 2 GB, and not beyond 4 GB (the 'G' here
means 1024^3).

There was also some suggestion about having a versioning mechanism in
FITS -- and it would be a great benefit, I think, to know right from the
beginning of a FITS input stream that it has (or may have), far beyond the
beginning of the file, some extensions which may NOT be recognized by
older software. I feel this modification would be a good opportunity to
include this versioning mechanism in FITS.

--Francois.
================================================================================
Francois Ochsenbein ------ Observatoire Astronomique de Strasbourg
11, rue de l'Universite F-67000 STRASBOURG Phone: +33-(0)390 24 24 29
Email: (France) Fax: +33-(0)390 24 24 17
================================================================================

#9 June 16th 05, 10:52 PM

I am in favor of signed integers and of the 64-bit proposal.

Eric

#10 June 17th 05, 09:27 AM

On Thursday 16 June 2005 00:21, William Pence wrote:

At issue is whether to reverse the recent decision to define the 'P'
variable-length array descriptors in FITS binary tables to be a pair of
'signed 32-bit integers', and make them 'unsigned 32-bit integers' instead.

I'm sorry but we cannot 'reverse the recent decision'. That would violate
Section 9 of the FITS Standard, 'Restrictions on Changes'.

Technically there is no good argument for 'unsigned 32-bit integers'
as we are discussing 64-bit pointers. To keep the standard more
symmetric I would argue for signed 64-bit integers. When we need
the last bit, we should start the discussion on 128-bit integers.

In general, I agree on the need for 64-bit pointers and integer columns
(both signed) in binary tables based on the need for large heaps and
time stamps.

I still have reservations concerning BITPIX=64 for the following reasons:
1) There seems to be no good physical reason for 64-bit integer images. The
number of photons from astronomical sources hardly justifies it,
especially considering their statistical distribution. Let someone
present a real, practical case and we should consider it.
2) The FITS standard is useful because the vast majority of systems
implement it - that is, if one writes a conforming FITS file the
likelihood of reading it on any system is high.
Adding BITPIX=64 would require changes at the top level of all readers.
In order for this to actually be implemented, people would have to feel
the need; otherwise it remains empty words.

Preben Grosbol
