FITS long integer support (was [fitsbits] ADASS FITS BoF on Sunday)

#1 October 20th 04, 03:29 PM

As an old FITS person, let me remind you that FITS is first for data
exchange. Since many operating systems and compilers do not support
64-bit integers (other than in sneaky hidden ways to read large
files), we should move extremely slowly to explicitly allow them in
FITS. I cannot code my machines to read or use these things, so you
cannot code your data in them if you wish my software (used lots of
places) to understand you.

I suppose I could convert a 64-bit image into double precision float
which would be inaccurate but usable. But an NAXISn or a pointer in a
heap table - those must be accurate or they do not work at all.
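Eric's point that a double-precision copy of a 64-bit image is "inaccurate but usable" can be made concrete. The following is a minimal C sketch, assuming IEEE-754 doubles (53-bit significand), the default round-to-nearest mode, and C99 `<stdint.h>`. The function name is hypothetical and is not part of any FITS library:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if the 64-bit integer v survives a
   round trip through a double unchanged. An IEEE-754 double has a 53-bit
   significand, so every integer with magnitude up to 2^53 is exact;
   larger values may be rounded to a neighbouring representable value. */
bool survives_double_roundtrip(int64_t v)
{
    double d = (double)v;

    /* Values near INT64_MAX round up to 2^63, which does not fit back
       into int64_t. Converting that back would be undefined behaviour,
       so report it as a failed round trip instead. */
    if (d >= 9223372036854775808.0)
        return false;

    return (int64_t)d == v;
}
```

Pixel values survive this round trip approximately, which is why the conversion is usable for image data. An NAXISn value or a heap pointer above 2^53 could silently change, however. For example, 2^53 + 1 rounds to 2^53.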

FITS has always been practical rather than "modern" - let's keep it
that way.

Eric Greisen

#2 October 21st 04, 09:00 AM

On Wed, 20 Oct 2004, Thomas McGlynn wrote:

I don't think there is any implication that header keywords are limited
to what is permitted in 4 byte integers, i.e.,

Tom is quite right, and I was mis-reading or mis-remembering the Standard.
So I guess the only question is whether there is interest in getting
current software packages and libraries to be compatible with long files.

I take note of Eric's comments that FITS users should avoid generating
files over 2 GB in size because not all current systems can handle them.
In the long term, however, surely the 2 GB file size limit will be seen in
the same light as the comment ascribed to Bill Gates on the MS-DOS memory
limit: "surely 640k is enough for anyone". I'm old enough to remember the
painful transition from 16-bit to 32-bit machines, so I can't help feeling
that a bit of advance planning would ease the transition to 64-bit
addressing that is surely inevitable.

There are also three changes to the FITS standard that would be needed
to accommodate long integers.

BITPIX = 64

would indicate arrays of 8 byte integers in images.

I don't think we need these yet in high-energy astronomy, perhaps
optical/IR astronomers would comment on whether they are needed?

TFORMxx = 'K'

would indicate arrays of 8 byte integers in tables.

I think that 8-byte integers are starting to appear, e.g. as pixel-code
numbers for pixelations of the sky with resolution below around 30
arc-seconds, so that seems a desirable feature.

TFORMxx = 'Q'

would indicate use of longwords in pointers in variable length columns.

I don't know of any need for this yet, but if files over 2 GB become
common, surely the pointers will have to move to more than 4 bytes?
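The three proposed changes come down to new element widths. The sketch below lists the byte width each code would imply, alongside a few existing codes. It covers only a subset of the TFORM letters, and the function names are hypothetical rather than CFITSIO calls. 'P' is the existing variable-length descriptor: a pair of 32-bit integers holding the element count and the heap offset. 'Q' is the proposed pair of 64-bit integers.

```c
#include <stdlib.h>

/* Bytes per pixel implied by a BITPIX value, including the proposed
   BITPIX = 64 for 8-byte integers. Negative values denote IEEE floats.
   Returns 0 for values this sketch does not recognise. */
int bitpix_bytes(int bitpix)
{
    switch (bitpix) {
    case 8: case 16: case 32: case 64:
    case -32: case -64:
        return abs(bitpix) / 8;
    default:
        return 0;
    }
}

/* Bytes per element for a few binary-table TFORM type codes. */
int tform_bytes(char code)
{
    switch (code) {
    case 'B': return 1;   /* unsigned byte                      */
    case 'I': return 2;   /* 16-bit integer                     */
    case 'J': return 4;   /* 32-bit integer                     */
    case 'K': return 8;   /* 64-bit integer (proposed)          */
    case 'E': return 4;   /* single-precision float             */
    case 'D': return 8;   /* double-precision float             */
    case 'P': return 8;   /* descriptor: 2 x 32-bit integers    */
    case 'Q': return 16;  /* descriptor: 2 x 64-bit (proposed)  */
    default:  return 0;   /* not covered by this sketch         */
    }
}
```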

--
Clive Page
Dept of Physics & Astronomy,
University of Leicester,
Leicester, LE1 7RH, U.K.

#3 October 21st 04, 05:00 PM

I did not suggest that we need to avoid files over 2 GBytes in size, just
constructs like 64-bit integers, which are simply not supported on many
computers in any easily accessible way. Thus tables with heaps should
stay under 2 GBytes, but we ship 10-GByte visibility data sets fairly
often.

Eric Greisen

#4 October 22nd 04, 12:12 PM

On Wednesday 20 October 2004 16:29, Eric Greisen wrote:
Since many operating systems and compilers do not support
64-bit integers (other than in sneaky hidden ways to read large
files), we should move extremely slowly to explicitly allow them in
FITS.

I can fully support this view. The only good argument for 64-bit
integers is pointers, as uncertainties in physical quantities hardly
can justify such accuracy. So the question is whether pointers to the HEAP,
or reference columns to rows in tables with more than 2G rows,
are important now. I would prefer to wait until 64-bit machines
are the default in our community.

Preben Grosbol

#5 October 22nd 04, 03:05 PM

Preben Grosbol wrote:

I would prefer to wait until 64-bit machines
are the default in our community.

Preben Grosbol

Referring to this part of the argument alone, I think it will happen
a lot sooner than a FITS agreement can be implemented. Remember
how long PDPs and NOVAs lasted once you'd seen a VAX...
Suns have been all 64-bit for some time, Opterons are here and gathering
pace, etc.

Peter.

#6 October 22nd 04, 04:39 PM

Preben Grosbol wrote:

On Wednesday 20 October 2004 16:29, Eric Greisen wrote:

Since many operating systems and compilers do not support
64-bit integers (other than in sneaky hidden ways to read large
files), we should move extremely slowly to explicitly allow them in
FITS.

I can fully support this view. The only good argument for 64-bit
integers is pointers, as uncertainties in physical quantities hardly
can justify such accuracy. So the question is whether pointers to the HEAP,
or reference columns to rows in tables with more than 2G rows,
are important now. I would prefer to wait until 64-bit machines
are the default in our community.

Preben Grosbol

While my earlier message was not advocacy, let me now say
where my views lie...

Eric and Preben have suggested that both the need for and the support
for 8-byte integers is sufficiently rare that it would be inappropriate
to consider revising the standard to support them.

I don't agree with either point. Support for eight byte integers
is widespread within machines used today. Most current C, Fortran and all Java
compilers support eight byte integers. IDL has supported 8-byte integers
for several years. There are doubtless many machines/compilers extant which
do not support 8-byte integers but there are many machines which still do
not support files longer than 2 GB. Nonetheless such files are usefully
produced as FITS. [By the by, it might be argued that Fortran has no
'standard' way to describe integers of 8 bytes. Of course it also has no
standard way to describe integers of 2 bytes (or for that matter a completely
standard way to describe integers of 4 bytes). However most Fortrans
that I have seen have a 'kind' corresponding to 8 byte integers.]

With regard to usage... I personally don't see any immediate need in the
community for images with eight-byte integer depth; however, usage of
eight-byte integers in tables seems very desirable. E.g., consider an
X-ray mission detecting photons with a microsecond resolution clock.
A 4-byte integer will overflow in less than an hour. When housekeeping data are already stored
in 8-byte longs, that should be the natural way to store them in FITS. If we are counting
photons in an image, the total number of photons can easily exceed the 4-byte limit.
There are now lots of places out there where our measuring devices count into the billions and beyond.
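The clock example is easy to check. The following C sketch assumes a signed counter that starts at zero and ticks once per microsecond. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: seconds until a counter that starts at zero and
   ticks once per microsecond exceeds the largest value of a signed
   integer of the given width in bits (valid for 2..64). */
double seconds_until_overflow_us(int bits)
{
    /* The largest signed value is 2^(bits-1) - 1 ticks. */
    uint64_t max_ticks = (UINT64_C(1) << (bits - 1)) - 1;
    return (double)max_ticks / 1e6;
}
```

For 32 bits this gives about 2147 seconds, just under 36 minutes. For 64 bits the same counter lasts roughly 292,000 years.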

Current catalogs of images already have entry counts at or beyond the limit of about 2 billion (2^31 - 1) for positive 4-byte integers.
If we wish to create FITS representations of new catalogs (or subsets of them)
we are going to find it difficult to fit the indices in 4-byte integers, while 8-bytes
will suffice for the foreseeable future.
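Choosing an index column type then reduces to a single comparison against the largest positive 4-byte value. This is a minimal C sketch. The helper name is hypothetical. It uses 'K' for the proposed 8-byte integer column code.

```c
#include <stdint.h>

/* Hypothetical helper: choose the binary-table integer type code for an
   index column whose largest stored value is max_index. 'J' (signed
   32-bit) holds values up to 2^31 - 1 = 2147483647; anything larger
   needs the proposed 'K' (signed 64-bit). */
char index_tform_for(uint64_t max_index)
{
    return (max_index <= (uint64_t)INT32_MAX) ? 'J' : 'K';
}
```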

But the most compelling need for 8 byte integers with FITS may be to support
variable length arrays. Multi-gigabyte files are now commonplace in astronomy.
Use of variable length arrays could allow us to index information in these large
files but this cannot be done since the offsets will very quickly surpass the 4-byte
limits. Within a few years 100 GB files are going to be normal and if we
wish the variable length records extension to be viable it needs to be able to
accommodate data on such scales.
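The offset problem can be sketched as two descriptor layouts. The C sketch below assumes signed 32-bit fields for 'P', as the 2 GB figure used in this thread implies, and 64-bit fields for the proposed 'Q'. The struct, field, and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Variable-length array descriptors: element count plus heap byte offset.
   desc_p mirrors the existing 'P' form, desc_q the proposed 'Q' form. */
typedef struct { int32_t nelem; int32_t offset; } desc_p;
typedef struct { int64_t nelem; int64_t offset; } desc_q;

/* Fill a 'P' descriptor; fails when either value exceeds 2^31 - 1,
   i.e. once the heap grows past about 2 GB. */
bool make_desc_p(uint64_t nelem, uint64_t offset, desc_p *out)
{
    if (nelem > (uint64_t)INT32_MAX || offset > (uint64_t)INT32_MAX)
        return false;
    out->nelem = (int32_t)nelem;
    out->offset = (int32_t)offset;
    return true;
}

/* Fill a 'Q' descriptor; 64-bit fields cover offsets up to 2^63 - 1. */
bool make_desc_q(uint64_t nelem, uint64_t offset, desc_q *out)
{
    if (nelem > (uint64_t)INT64_MAX || offset > (uint64_t)INT64_MAX)
        return false;
    out->nelem = (int64_t)nelem;
    out->offset = (int64_t)offset;
    return true;
}
```

A heap offset into a 100 GB file (100,000,000,000 bytes) is rejected by the 'P' form but accepted by the 'Q' form.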

Finally, a bit of philosophy...

As Eric noted, FITS originated as an interchange format, but that is not all it
is today, nor should that be the only use that drives its evolution.
FITS today is used as a data format in many software packages. FITS is also
the standard archival format for most astronomy data. When we look at FITS
and decide whether or not to extend it, we should recognize that when we limit FITS
we may make other formats, e.g., HDF, more appealing to those who need the
capabilities being proscribed.

But what about those who can't read the new formats? I don't think they
will be as numerous as some seem to be suggesting. Many of the major libraries
already support 8-byte integers on an experimental basis. So those who
use CFITSIO need change nothing in their code. They already can do most
of this! Nor will existing files, or existing data streams suddenly adopt 8-byte
integers en masse. No existing standard FITS file will be made invalid. What will
happen is that people will gradually recognize that they no longer need to use the
subterfuges and workarounds to stay within the legal FITS boundaries and eight-byte
integers will emerge where they are most needed.

Regards,
Tom McGlynn

#7 October 22nd 04, 07:14 PM

William Pence wrote:

I'm less clear about long integer support in Fortran. Fortran 90/95 I
believe does support this, but I don't recall seeing support for integer*8
in Fortran 77 (it is certainly not part of ANSI standard Fortran-77). So
this may boil down to a language divide: C/C++, Java, Fortran-90, and
probably most other new languages naturally support long integers, but
Fortran-77 doesn't.

Indeed, there is no official support, since integer*8 isn't in the standard.
However, both the Intel and GNU compilers support it, and I abuse this feature
(with caution). I also recall the Cray compiler used to have a flag that
essentially made floats become doubles, so something in this
direction may be implemented by compiler writers.

- peter

#8 October 22nd 04, 07:21 PM

Peter Teuben writes:
I'm less clear about long integer support in Fortran. Fortran 90/95 I
believe does support this, but I don't recall seeing support for integer*8
in Fortran 77 (it is certainly not part of ANSI standard Fortran-77). So
this may boil down to a language divide: C/C++, Java, Fortran-90, and
probably most other new languages naturally support long integers, but
Fortran-77 doesn't.

Indeed, there is no official support, since integer*8 isn't in the standard.
However, both the Intel and GNU compilers support it, and I abuse this feature
(with caution). I also recall the Cray compiler used to have a flag that
essentially made floats become doubles, so something in this
direction may be implemented by compiler writers.

Fortran 77 defines INTEGER, LOGICAL, and REAL to be all of the same
length and does not define that length. Some implementations do allow
one to declare all of them to be 8 bytes, but usually that would only be on
64-bit computers. DOUBLE PRECISION is twice as long.

Eric Greisen

#9 October 23rd 04, 11:52 AM

On Fri, 22 Oct 2004, Thomas McGlynn wrote:

The notations integer*2, integer*4 and integer*8 are all non-standard
Fortran and are not included in any of the Fortran standards F66, F77, F90, F95 or
the impending F2003. Integer*2 has never been standard Fortran.

Correct. In practice, however, their use is so widespread that
compiler-writers have been forced to support them. I have used quite a
wide range of Fortran compilers and never in recent years come across any
which don't support all these. The open source compilers g77 and g95 both
support all these, despite in other respects restricting themselves pretty
much to the respective official Fortran Standards. The g77 documentation
says that INTEGER*8 may not be fully supported, but in practice I haven't
found any problems.

Fortran (i.e., the standard) has no mechanism to specify the length in
bytes of the desired variable. The standard way to get different
kinds of integers is something like

integer (kind=n) i,j,k

That's only half the story, as Tom probably knows: the mapping from the
kind-selector n to a number of bytes is intentionally unspecified by the
Fortran90/95/2003 standards, and in practice it varies. What you do
instead is select the number of *decimal* digits you need, so that, say

integer (kind=selected_int_kind(12)) :: i, j, k

will force the compiler to give you storage capable of storing an integer
of up to 12 digits, which may in practice mean 8 bytes (or, if that's
impossible, is guaranteed to give you a compile-time error). That doesn't
map very well to our image of storage as always an integer number of
bytes, but then Fortran was first standardised in the era when we, at
least, were using 12-bit and 60-bit computers, both of which had Fortran
compilers, and neither had any notion of bytes.
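The arithmetic behind selected_int_kind(12) asks which storage width covers every integer in the range -10^12 < n < 10^12. The following is a rough C analogue. It assumes power-of-two byte widths and is not a model of any Fortran compiler's kind numbering. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: smallest byte width (1, 2, 4 or 8) whose signed
   range covers every integer with up to `digits` decimal digits, i.e.
   -10^digits < n < 10^digits. Returns -1 if no width up to 8 bytes
   suffices, loosely mirroring selected_int_kind returning -1. */
int min_signed_bytes_for_digits(int digits)
{
    if (digits < 0 || digits > 18)  /* 10^19 - 1 exceeds INT64_MAX */
        return -1;

    uint64_t need = 1;
    for (int i = 0; i < digits; i++)
        need *= 10;
    need -= 1;                      /* largest magnitude: 10^digits - 1 */

    for (int bytes = 1; bytes <= 8; bytes *= 2) {
        uint64_t max = (UINT64_C(1) << (8 * bytes - 1)) - 1; /* 2^(8b-1) - 1 */
        if (need <= max)
            return bytes;
    }
    return -1;
}
```

Nine digits still fit in 4 bytes, because 999,999,999 is below 2,147,483,647. Twelve digits force 8 bytes, which matches Clive's "may in practice mean 8 bytes".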

Now that byte-based storage is ubiquitous, these extreme portability
measures in Fortran seem a bit superfluous. But since FITS goes back to
the same vintage, perhaps we shouldn't criticise.

--
Clive Page
Dept of Physics & Astronomy,
University of Leicester,
Leicester, LE1 7RH, U.K.

#10 October 25th 04, 08:49 AM

On Fri, 22 Oct 2004, William Pence wrote:

this may boil down to a language divide: C/C++, Java, Fortran-90, and
probably most other new languages naturally support long integers, but
Fortran-77 doesn't.
