#1
FITS long integer support (was [fitsbits] ADASS FITS BoF on Sunday)
As an old FITS person, let me remind you that FITS is first for data exchange. Since many operating systems and compilers do not support 64-bit integers (other than in sneaky hidden ways to read large files), we should move extremely slowly to explicitly allow them in FITS. I cannot code my machines to read or use these things, so you cannot code your data in them if you wish my software (used lots of places) to understand you. I suppose I could convert a 64-bit image into double precision float which would be inaccurate but usable. But an NAXISn or a pointer in a heap table - those must be accurate or they do not work at all. FITS has always been practical rather than "modern" - let's keep it that way.

Eric Greisen
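The point about converting 64-bit integers to double precision can be illustrated with a minimal Python sketch. It assumes IEEE 754 doubles (which Python's `float` uses on common platforms); the names `survives_double` and `heap_offset` are invented for the illustration and do not come from any FITS library.

```python
# Sketch: a 64-bit integer does not always survive conversion to a double.
# An IEEE 754 double carries a 53-bit significand, so above 2**53 not every
# integer is representable any more.

LIMIT = 2**53  # every integer with magnitude up to this value is exact


def survives_double(n: int) -> bool:
    """Return True if n is unchanged by a round trip through a double."""
    return int(float(n)) == n


pixel_value = LIMIT - 1   # hypothetical image value, still exact
heap_offset = LIMIT + 1   # hypothetical 64-bit pointer value, not exact

print("pixel value survives:", survives_double(pixel_value))
print("heap offset survives:", survives_double(heap_offset))
```

An image value that is off by one count may still be "inaccurate but usable", whereas an array dimension or heap pointer that is off by one is simply wrong, which is the distinction drawn above.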
#2
On Wed, 20 Oct 2004, Thomas McGlynn wrote:
> I don't think there is any implication that header keywords are limited to what is permitted in 4 byte integers, i.e.,

Tom is quite right, and I was mis-reading or mis-remembering the Standard. So I guess the only question is whether there is interest in getting current software packages and libraries to be compatible with long files. I take note of Eric's comments that FITS users should avoid generating files over 2 GB in size because not all current systems can handle them. In the long term, however, surely the 2 GB file size limit will be seen in the same light as the comment ascribed to Bill Gates on the MS-DOS memory limit: "surely 640k is enough for anyone". I'm old enough to remember the painful transition from 16-bit to 32-bit machines, so can't help feeling that a bit of advanced planning would ease the transition to 64-bit addressing that is surely inevitable.

There are also three changes to the FITS standard that would be needed to accommodate long integers.

BITPIX = 64 would indicate arrays of 8 byte integers in images. I don't think we need these yet in high-energy astronomy, perhaps optical/IR astronomers would comment on whether they are needed?

TFORMxx = 'K' would indicate arrays of 8 byte integers in tables. I think that 8-byte integers are starting to appear, e.g. as pixel-code numbers for pixelations of the sky with resolution below around 30 arc-seconds, so that seems a desirable feature.

TFORMxx = 'Q' would indicate use of longwords in pointers in variable length columns. I don't know of any need for this yet, but if files over 2 GB become common surely the pointers will have to move to more than 4-bytes?

--
Clive Page
Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K.
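A rough Python sketch of what the proposed 'K' and 'Q' forms would mean at the byte level. It assumes that a variable-length array descriptor is an (element count, heap offset) pair, stored as two 4-byte signed integers in the existing 'P' form and as two 8-byte signed integers in the proposed 'Q' form, and that FITS binary data is big-endian. The function names and the example values are hypothetical.

```python
import struct


def pack_k_value(value: int) -> bytes:
    """Pack one 8-byte signed integer, as a 'K' table column would hold."""
    return struct.pack(">q", value)


def pack_descriptor(count: int, offset: int, wide: bool) -> bytes:
    """Pack a variable-length array descriptor (count, heap offset).

    wide=False: two 4-byte signed integers (assumed layout of 'P').
    wide=True:  two 8-byte signed integers (assumed layout of 'Q').
    The '>' prefix selects big-endian byte order with standard sizes.
    """
    fmt = ">qq" if wide else ">ii"
    return struct.pack(fmt, count, offset)


offset_3gb = 3 * 1024**3  # hypothetical heap offset beyond 2 GB

# The 4-byte form cannot represent an offset past 2**31 - 1.
try:
    pack_descriptor(100, offset_3gb, wide=False)
    fits_in_p = True
except struct.error:
    fits_in_p = False

q_descriptor = pack_descriptor(100, offset_3gb, wide=True)
print("fits in 4-byte descriptor:", fits_in_p)
print("8-byte descriptor length:", len(q_descriptor))
print("K value length:", len(pack_k_value(2**40)))
```

The only point of the sketch is that an offset into a heap larger than 2 GB has no representation in a 4-byte signed pointer, so the wider descriptor is needed as soon as such heaps exist.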
#3
I did not suggest that we need to avoid files over 2 GBytes in size, just constructs like 64-bit integers, which are simply not supported on many computers in any easily accessible way. Thus tables with heaps should stay under 2 GBytes, but we ship 10 GByte visibility data sets a fair amount.

Eric Greisen
#4
On Wednesday 20 October 2004 16:29, Eric Greisen wrote:
> Since many operating systems and compilers do not support 64-bit integers (other than in sneaky hidden ways to read large files), we should move extremely slowly to explicitly allow them in FITS.

I can fully support this view. The only good argument for 64-bit integers is pointers, as uncertainties in physical quantities can hardly justify such accuracy. So the issue is whether pointers to the HEAP, or reference columns to rows in tables with more than 2G rows, are important currently. I would prefer to wait until 64-bit machines are the default in our community.

Preben Grosbol
#5
Preben Grosbol wrote:
> I would prefer to wait until 64-bit machines are the default in our community.
> Preben Grosbol

Referring to this part of the argument alone, I think it will happen a lot sooner than it takes to implement a FITS agreement. Remember how long PDPs and NOVAs lasted once you'd seen a VAX... Suns have been all 64-bit for some time, Opterons are here and gathering pace, etc. etc.

Peter.
#6
Preben Grosbol wrote:
> On Wednesday 20 October 2004 16:29, Eric Greisen wrote:
>> Since many operating systems and compilers do not support 64-bit integers (other than in sneaky hidden ways to read large files), we should move extremely slowly to explicitly allow them in FITS.
>
> I can fully support this view. The only good argument for 64-bit integers is pointers as uncertainties in physical quantities hardly can justify such accuracy. So the issue if pointers to the HEAP or reference columns to rows in tables with more than 2G rows are important currently. I would prefer to wait until 64-bit machines are the default in our community.
> Preben Grosbol

While earlier discussion was not advocacy, let me discuss where my views lie...

Eric and Preben have suggested that both the need for and the support for 8-byte integers is sufficiently rare that it would be inappropriate to consider revising the standard to support them. I don't agree with either point.

Support for eight byte integers is widespread within machines used today. Most current C, Fortran and all Java compilers support eight byte integers. IDL has supported 8-byte integers for several years. There are doubtless many machines/compilers extant which do not support 8-byte integers, but there are many machines which still do not support files longer than 2 GB. Nonetheless such files are usefully produced as FITS.

[By the by, it might be argued that Fortran has no 'standard' way to describe integers of 8 bytes. Of course it also has no standard way to describe integers of 2 bytes (or for that matter a completely standard way to describe integers of 4 bytes). However most Fortrans that I have seen have a 'kind' corresponding to 8 byte integers.]

With regard to usage... I personally don't see any immediate need in the community for images with eight-byte integer depth; however, usage of eight byte integers in tables seems very desirable. E.g., consider an X-ray mission detecting photons with a microsecond resolution clock. A 4-byte integer will overflow in less than an hour. When housekeeping data is stored in 8-byte longs that should be the natural way to store it. If we are counting photons in an image, the total number of photons can easily exceed the 4-byte limit. There are now lots of places out there where our measuring devices count beyond the billions.

Current catalogs of images are already at or passing the 2 GB limit for positive 4-byte integers. If we wish to create FITS representations of new catalogs (or subsets of them) we are going to find it difficult to fit the indices in 4-byte integers, while 8-bytes will suffice for the foreseeable future.

But the most compelling need for 8 byte integers with FITS may be to support variable length arrays. Multi-gigabyte files are now commonplace in astronomy. Use of variable length arrays could allow us to index information in these large files but this cannot be done since the offsets will very quickly surpass the 4-byte limits. Within a few years 100 GB files are going to be normal and if we wish the variable length records extension to be viable it needs to be able to accommodate data on such scales.

Finally, a bit of philosophy... As Eric noted FITS originated as an interchange format, but that is not all it is today, nor should that be the only usage that should drive its evolution. FITS today is used as a data format in many software packages. FITS is also the standard archival format for most astronomy data. When we look at FITS and decide whether or not to extend it, recognize that when we limit FITS we may make other formats, e.g., HDF, more appealing to those who need the capabilities being proscribed.

But what about those who can't read the new formats? I don't think they will be as numerous as some seem to be suggesting. Many of the major libraries already support 8-byte integers on an experimental basis. So those who use CFITSIO need change nothing in their code. They already can do most of this! Nor will existing files, or existing data streams suddenly adopt 8-byte integers en masse. No existing standard FITS file will be made invalid. What will happen is that people will gradually recognize that they no longer need to use the subterfuges and workarounds to stay within the legal FITS boundaries, and eight-byte integers will emerge where they are most needed.

Regards,
Tom McGlynn
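The clock example in the post above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch in Python; it assumes a signed counter that starts at zero and ticks once per microsecond, and the names used are illustrative.

```python
# Sketch: how long a microsecond clock runs before a signed integer
# counter of a given width overflows (counter assumed to start at zero).

INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1
TICKS_PER_SECOND = 1_000_000  # microsecond resolution


def seconds_until_overflow(max_value: int,
                           ticks_per_second: int = TICKS_PER_SECOND) -> float:
    """Seconds of clock time representable before the counter overflows."""
    return max_value / ticks_per_second


minutes_32 = seconds_until_overflow(INT32_MAX) / 60
years_64 = seconds_until_overflow(INT64_MAX) / (365.25 * 24 * 3600)

print(f"4-byte counter lasts about {minutes_32:.1f} minutes")
print(f"8-byte counter lasts about {years_64:.0f} years")
```

Under these assumptions the 4-byte counter lasts roughly 36 minutes, consistent with "less than an hour", while the 8-byte counter lasts far longer than any mission.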
#7
> I'm less clear about long integer support in Fortran. Fortran 90/95 I believe does support this, but I don't recall seeing support for integer*8 in Fortran 77 (it is certainly not part of ANSI standard Fortran-77). So this may boil down to a language divide: C/C++, Java, Fortran-90, and probably most other new languages naturally support long integers, but Fortran-77 doesn't.

Indeed, there is no official support, since integer*8 isn't in the standard. However, both the Intel and GNU compilers support it, and I abuse this feature (with caution). I also recall the Cray compiler used to have a flag that essentially made floats become doubles, so something in this direction may be implemented by compiler writers.

- peter
#8
Peter Teuben writes:
>> I'm less clear about long integer support in Fortran. Fortran 90/95 I believe does support this, but I don't recall seeing support for integer*8 in Fortran 77 (it is certainly not part of ANSI standard Fortran-77). So this may boil down to a language divide: C/C++, Java, Fortran-90, and probably most other new languages naturally support long integers, but Fortran-77 doesn't.
>
> indeed, there is no official support, since integer*8 isn't in the standard. However, both the intel and gnu compiler support it, and I abuse this feature (with caution). I also recall the Cray compiler used to have a flag to the compiler that made floats become double's essentially, so something in this direction may be implemented by compiler writers.

Fortran 77 defines INTEGER, LOGICAL, and REAL to be all of the same length and does not define that length. Some implementations do allow one to declare all of them to be 8 bytes, but usually that would only be on 64-bit computers. DOUBLE PRECISION is twice as long.

Eric Greisen
#9
On Fri, 22 Oct 2004, Thomas McGlynn wrote:
> The notations integer*2, integer*4 and integer*8 are all non-standard Fortran and are not included in any of the Fortran standards F66, F77, F90, F95 or the impending F2003. Integer*2 has never been standard Fortran.

Correct. In practice, however, their use is so widespread that compiler-writers have been forced to support them. I have used quite a wide range of Fortran compilers and never in recent years come across any which don't support all these. The open source compilers g77 and g95 both support all these, despite in other respects restricting themselves pretty much to the respective official Fortran Standards. The g77 documentation says that INTEGER*8 may not be fully supported, but in practice I haven't found any problems.

> Fortran (i.e., the standard) has no mechanism to specify the length in bytes of the desired variable. The standard way to get different kinds of integers is something like
>
>     integer (kind=n) i,j,k

That's only half the story, as Tom probably knows, as the mapping from the kind-selector n to a number of bytes is intentionally unspecified by the Fortran 90/95/2003 standards, and in practice it varies. What you do instead is select the number of *decimal* digits you need, so that, say

    integer (kind=selected_int_kind(12)) :: i, j, k

will force the compiler to give you storage capable of storing an integer of up to 12 digits, which may in practice mean 8 bytes (or, if that's impossible, is guaranteed to give you a compile-time error).

That doesn't map very well to our image of storage as always an integer number of bytes, but then Fortran was first standardised in the era when we, at least, were using 12-bit and 60-bit computers, both of which had Fortran compilers, and neither had any notion of bytes. Now that byte-based storage is ubiquitous, these extreme portability measures in Fortran seem a bit superfluous. But since FITS goes back to the same vintage, perhaps we shouldn't criticise.

--
Clive Page
Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K.
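The digit-based selection described above can be mimicked in a few lines of Python to see why a request for 12 decimal digits ends up as 8 bytes on a byte-oriented machine. This is only an analogy, not how any Fortran compiler implements it: the function name is invented, and the candidate sizes of 1, 2, 4 and 8 bytes are an assumption about typical hardware.

```python
# Sketch: pick the smallest signed integer size (in bytes) that can hold
# every value n with -10**digits < n < 10**digits, in the spirit of asking
# for a range of decimal digits rather than a byte count.

def bytes_for_decimal_digits(digits: int, sizes=(1, 2, 4, 8)) -> int:
    """Return the smallest size in `sizes` covering `digits` decimal digits.

    Raises OverflowError if no candidate is wide enough; this stands in
    for the error a compiler would report when no suitable kind exists.
    """
    largest_needed = 10**digits - 1
    for size in sizes:
        if largest_needed <= 2**(8 * size - 1) - 1:
            return size
    raise OverflowError(f"no integer size holds {digits} decimal digits")


for digits in (4, 9, 10, 12, 18):
    print(digits, "digits ->", bytes_for_decimal_digits(digits), "bytes")
```

With these assumed sizes, 9 digits still fit in 4 bytes, while anything from 10 to 18 digits needs 8 bytes, which is why a request for 12 digits usually lands on an 8-byte integer.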
#10
On Fri, 22 Oct 2004, William Pence wrote:
> this may boil down to a language divide: C/C++, Java, Fortran-90, and probably most other new languages naturally support long integers, but Fortran-77 doesn't.