
Conference turris::fortran

Title:Digital Fortran
Notice:Read notes 1.* for important information
Moderator:QUARK::LIONEL
Created:Thu Jun 01 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1333
Total number of notes:6734

1251.0. "Real*16 Performance?" by RHETT::HALETKY () Mon Apr 07 1997 13:58

    Hello,
    
    We have a customer with questions about the performance of REAL*16
    and whether there is any way to speed it up.
    
    
    Best regards,
    Ed Haletky
    Digital CSC
1251.1  QUARK::LIONEL "Free advice is worth every cent"      8 lines   Mon Apr 07 1997 14:08
What are the questions?  What platform are we talking about?

On most VAX and all Alpha systems, REAL*16 is software-emulated.  The Alpha
support is through specially optimized routines and is really quite good,
all things considered.  There's no way to make it faster.  The VAX support
is through instruction emulation and it's not speedy.

					Steve
1251.2  "Benchmarks for real*16?"  RHETT::HALETKY      8 lines   Tue Apr 08 1997 13:55
    
    The system is an Alpha. The customer claims that the emulation is
    much, much slower than REAL*8. Are there any benchmarks?
    
    
    Best regards,
    Ed Haletky
    Digital CSC
1251.3  QUARK::LIONEL "Free advice is worth every cent"      5 lines   Tue Apr 08 1997 14:10
Yes, it is much slower than REAL*8, since the latter is done in hardware. No,
we don't have benchmarks - we never claimed REAL*16 was fast.  It is really
very fast for a software implementation.

				Steve
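For a rough sense of the REAL*8 vs. REAL*16 ratio, the customer could time
the two types directly with something along the following lines. This is a
minimal sketch of my own, not an official benchmark; SECNDS and the Q-suffix
quad constants are DEC Fortran extensions, and the loop count and operand
values are arbitrary choices.

      ! Rough REAL*8 vs. REAL*16 timing comparison (illustrative sketch).
      program time_real16
      integer, parameter :: n = 1000000
      real*8  :: a8, b8, s8
      real*16 :: a16, b16, s16
      real    :: t0, t8, t16
      integer :: i

      a8 = 1.0000001d0
      b8 = 0.9999999d0
      s8 = 0.0d0
      t0 = secnds(0.0)
      do i = 1, n
         s8 = s8 + a8*b8                ! hardware floating point
      end do
      t8 = secnds(t0)

      a16 = 1.0000001q0
      b16 = 0.9999999q0
      s16 = 0.0q0
      t0  = secnds(0.0)
      do i = 1, n
         s16 = s16 + a16*b16            ! software-emulated X-float
      end do
      t16 = secnds(t0)

      print *, 'REAL*8  time:', t8,  '  sum:', s8
      print *, 'REAL*16 time:', t16, '  sum:', s16
      end program time_real16

The sums are printed so the compiler cannot discard the loops; the ratio of
the two times is the number the customer is really asking about.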
1251.4  "What would they consider to be fast?"  WIBBIN::NOYCE "Pulling weeds, pickin' stones"      1 line    Tue Apr 08 1997 16:04
Are they comparing it to a competitor's REAL*16?  Which one?
1251.5  TLE::EKLUND "Always smiling on the inside!"     15 lines   Tue Apr 08 1997 16:07
    	I think it's fair to say that the market for this has never
    been large enough to really justify a full hardware implementation,
    H_Floating notwithstanding.  As software implementations go, this
    appears to be a particularly fast one, but of course it can never
    compete with real*8 or any other hardware-implemented mode.
    
    	My own view on this is that if speed becomes a serious problem
    for the customer, their time will be well spent figuring out how
    to use real*8 instead of real*16 - not always possible, but often
    a sloppy real*16 algorithm can be replaced with a clever real*8
    one (a sketch of one such trick follows this note)...
    
    Cheers!
    Dave Eklund
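One concrete illustration of the kind of real*8 substitution Dave describes
(my own example, not anything from the customer's code): compensated
summation recovers much of the benefit of a REAL*16 accumulator for long
sums while staying entirely in hardware REAL*8.

      ! Compensated (Kahan) summation - a "clever REAL*8" technique that
      ! often removes the need for a REAL*16 accumulator.  Illustrative
      ! sketch only; the function and variable names are mine.
      real*8 function kahan_sum(x, n)
      integer :: n, i
      real*8  :: x(n), s, c, y, t
      s = 0.0d0
      c = 0.0d0                 ! running compensation for lost low-order bits
      do i = 1, n
         y = x(i) - c           ! apply the correction from the previous step
         t = s + y              ! low-order bits of y may be lost here
         c = (t - s) - y        ! recover what was lost
         s = t
      end do
      kahan_sum = s
      end function kahan_sum

Like any compensated-arithmetic trick, this only works when the compiler is
not allowed to reassociate the floating-point operations, since a
value-unsafe optimizer would cancel the correction term algebraically.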
    
1251.6  "Alpha REAL*16 is very fast"  GEMEVN::GROVE    224 lines   Tue Apr 08 1997 19:17
Reaching back into a predecessor FORTRAN notes file, I found the following
notes about our REAL*16 implementation on Alpha.

cw hobbs reports that some very demanding CERN users were amazed at the speed.

/Rich Grove

PS: When you look at the absolute timings, notice that the dates are 1994,
so most of this was measured on a 200 MHz EV4. It should be considerably
faster on your 500 MHz or 622 MHz EV56.

Dwight Manley's note discusses the merits of an "E-float" implementation,
which is what the IBM RS/6000 does. Note that X-float compares very favorably
to the semi-hardware E-float.

     <<< TURRIS::DISK$NOTES_PACK:[NOTES$LIBRARY]DEC_FORTRAN_ALPHA.NOTE;1 >>>
                           -< DEC Fortran on ALPHA >-
================================================================================
Note 1693.0               REAL*16 - why is it so fast?                 2 replies
TLE::WHITLOCK "Stan Whitlock"                        17 lines   1-AUG-1994 11:09
--------------------------------------------------------------------------------
================================================================================
Note 1689.4         Problem with external function declaration            4 of 5
CERN::HOBBS "Budweiser - official embarrassment of " 10 lines   1-AUG-1994 08:04
                       -< REAL*16 - why is it so fast? >-
--------------------------------------------------------------------------------
I just got stopped by someone at CERN who wanted to comment about how fast
the REAL*16 support is in VMS 6.1.  Apparently, he had to double-check his
figures because it seemed too good to be true.

Is there any information available to explain what tricks are pulled in the
RTL to support X-float?

I'd like to toot our own horn a bit here ;-)

-cw

================================================================================
WIBBIN::NOYCE "DEC 21064-200DX5 : 138 SPECint @ $36K" 5 lines   1-AUG-1994 09:56
--------------------------------------------------------------------------------
Well, having 64-bit integer registers and 64-bit integer arithmetic
helps a lot.  The routines for the fundamental operations add, sub, mul, div
are carefully hand-coded, with a great deal of overlap, especially in the
multiply routine.  The routines receive arguments and return results in
registers, not memory.

================================================================================
GEMGRP::GROVE                                         7 lines   5-AUG-1994 15:58
--------------------------------------------------------------------------------
    See note 1011, esp 1011.6-1011.8
    
    The RTL primitives that Steve Root wrote are really outstanding,
    and the compiled code does a good job on register allocation and
    very lightweight linkages to the RTL routines.
    
    /Rich

================================================================================
Note 1011.3                    REAL*16 White Paper                       3 of 10
NICCTR::MANLEY                                      628 lines   4-AUG-1993 14:00
                 -< Enhanced Precision vs Extended Precision >-
--------------------------------------------------------------------------------


I encourage the DEC-FORTRAN group to seriously consider supporting both IEEE
extended precision floating point (this is being done) and IBM RS/6000 enhanced
precision floating point (this is not being done). Perhaps this could be 
provided by KAP-FORTRAN. Kuck and Associates, Inc. has provided exactly this
sort of REAL*16 support for other hardware vendors in the past. Now let me
explain the motivation for my proposal.


Clearly, IEEE 754 extended precision floating point support is very important.
Our customers expect it and HP provides it. We must support it to meet customer
needs and to compete with HP. However, IEEE extended precision support carries
a large performance penalty. Extended precision floating point will not be
competitive with IBM's enhanced precision floating point. To compete with IBM,
we must also support enhanced precision floating point.


Enhanced precision floating point operands are represented by pairs of double
precision floating point operands. Arithmetic operations are carried out using
double precision floating point instructions. Either IEEE or VAX G_FLOAT format
double precision operands may be used to represent enhanced floating point
operands. The exponent field size of an enhanced floating point operand is
identical to that of a double precision floating point operand. The effective
significand field of enhanced precision floating point is nearly twice that of
double precision floating point. Thus, enhanced precision floating point is
much more accurate than double precision floating point. Enhanced precision
floating point is not IEEE 754 compliant.


Extended precision floating point, on the other hand, is IEEE 754 compliant.
With extended precision floating point, both exponent and significand field
sizes are larger than they are for enhanced precision floating point. Thus
accuracy is improved and range is extended. Unfortunately, there is very
little support in the Alpha architecture to aid in making software emulation
of IEEE extended precision perform well. We need an alternative.


The remainder of the note contains four subroutines and two programs. They 
support enhanced precision floating point arithmetic. The subroutines
perform enhanced floating point add, subtract, multiply, and divide arithmetic
operations. The first program tests the accuracy of enhanced precision floating
point arithmetic relative to H_FLOAT arithmetic. Run it to see that most folks
don't need H_FLOAT for accuracy! A second program times enhanced floating point 
arithmetic operations. Run it to compare enhanced floating point performance
to that of both H_FLOAT and G_FLOAT.


	- Dwight -
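To make the pair-of-doubles idea concrete, here is a sketch of a single
addition step in that style.  This is the textbook "two-sum" construction,
not Dwight's actual subroutines, and the routine name is mine.

      ! One "enhanced precision" (pair-of-doubles) addition step:
      ! (ahi,alo) + (bhi,blo) -> (chi,clo), where each pair represents an
      ! unevaluated sum hi + lo with |lo| much smaller than |hi|.
      subroutine e_add(ahi, alo, bhi, blo, chi, clo)
      real*8 :: ahi, alo, bhi, blo, chi, clo
      real*8 :: s, v, e
      s = ahi + bhi
      v = s - ahi
      e = (ahi - (s - v)) + (bhi - v)   ! exact rounding error of ahi + bhi
      e = e + (alo + blo)               ! fold in the low-order parts
      chi = s + e                       ! renormalize the result pair
      clo = e - (chi - s)
      end subroutine e_add

As with any error-free transformation, the intermediate subtractions must not
be reassociated by the compiler, so code like this is normally built without
value-unsafe floating-point optimizations.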
================================================================================
Note 1011.6                    REAL*16 White Paper                       6 of 10
HPCGRP::MANLEY                                       50 lines  26-APR-1994 19:58
                            -< REAL*16 - Nice Job! >-
--------------------------------------------------------------------------------


Re: .4,.5


I just completed a quick performance sanity test comparing the REAL*16 and
unpipelined REAL*8 basic operations (Add, Subtract, Multiply, and Divide).

I also compared the REAL*16 operations to a FORTRAN implementation using
pairs of floating point values ala IBM.

On our VAX 6000, the same FORTRAN code beats the pants off ALL H_Floating
operations. I expected the worst!

Guess what, I'm pleasantly surprised!

The REAL*16 primitives on Alpha have about the same performance for Add,
Subtract, and Multiply as the FORTRAN routines. The REAL*16 divide operation
is about 3 times faster than the Newton-Raphson reciprocal (quick and dirty)
approach.

The performance results follow (E_Float uses two 64 bit floats):

  
  Timing ---      1000000 E-Precision Arithmetic Operations 
  
  
  Time for E_Float Add Operations  0.5397949     secs.
  Time for X_Float Add Operations  0.8000488     secs.
  Time for G_Float Add Operations  4.9804688E-02 secs.
  
  Time for E_Float Sub Operations  0.5798340     secs.
  Time for X_Float Sub Operations  0.6699219     secs.
  Time for G_Float Sub Operations  5.0048828E-02 secs.
  
  Time for E_Float Mul Operations  0.9299316     secs.
  Time for X_Float Mul Operations  0.8598633     secs.
  Time for G_Float Mul Operations  6.0058594E-02 secs.
  
  Time for E_Float Div Operations   7.009766     secs.
  Time for X_Float Div Operations   2.300049     secs.
  Time for G_Float Div Operations  0.4399414     secs.
  

You've done a nice job, especially on the divide operation. (I heard
a rumor that there's some "scrabble magic" imbedded in that code.) 


	- Dwight -

================================================================================
Note 1011.7                    REAL*16 White Paper                       7 of 10
GEMGRP::GROVE                                        12 lines  27-APR-1994 08:35
                             -< Roll the credits! >-
--------------------------------------------------------------------------------
    Dwight, thanks for the measurements and the posting.
    
    Steve Root (from the VSSAD group in Hudson) and Lucy Hamnett (GEM)
    did a great job on the REAL*16 implementation:

    	Steve did the high-performance arithmetic primitives

    	Lucy was project leader for the whole effort, and designed
    	and implemented the GEM compiler support for X-float

    This is a really nice piece of work!
    Rich Grove
================================================================================
Note 1011.8                    REAL*16 White Paper                       8 of 10
AD::ROOT                                             36 lines   3-MAY-1994 23:56
                            -< updated perf. data >-
--------------------------------------------------------------------------------
		UPDATED PERFORMANCE DATA

I made some changes to Dwight's benchmark to reduce unnecessary
overhead. The results for 10^7 operations are given below. The
times make sense, except that the X_Float div to mul ratio seems
high, and E_Float Mul became more expensive.
(Note that the G_Float operations, except for div, 'should' check
in at 0.4 seconds.)

These numbers for X_Float and E_Float are slightly optimistic,
compared to what a customer may achieve in a real job, in that
I_Cache traffic is minimized. The E_Float numbers are slightly
optimistic in that they incur no call overhead. The X_Float add/sub
numbers are slightly optimistic in that exponent mispredicts are
probably a little low compared to a real job. The X_Float sub
numbers are probably a little low in that fewer complete
normalizations are incurred.


 Time for E_Float Add Operations   4.750000     secs.
 Time for X_Float Add Operations   4.260000     secs.
 Time for G_Float Add Operations  0.3999996     secs.
 
 Time for E_Float Sub Operations   4.780000     secs.
 Time for X_Float Sub Operations   4.380001     secs.
 Time for G_Float Sub Operations  0.4099998     secs.
 
 Time for E_Float Mul Operations   12.63000     secs.
 Time for X_Float Mul Operations   7.429998     secs.
 Time for G_Float Mul Operations  0.4000015     secs.
 
 Time for E_Float Div Operations   69.91000     secs.
 Time for X_Float Div Operations   21.80000     secs.
 Time for G_Float Div Operations   4.240005     secs.

(Code available upon request.)
1251.7  "real*16 mod"  RHETT::HALETKY     42 lines   Wed Apr 09 1997 12:12
    Hello,
    
    
    The customer suggests the following:
    
    Regarding the 16-byte mod, I came up with a bitwise multiply-and-mod
    code last night which works; I have enclosed it below. (One could
    also use a similar approach for modding the product of two standard
    4-byte integers, but this is less crucial since the Alpha has the
    8-byte integer type.) A fast machine-code implementation of something
    similar would be great; it might even be considered as an extension
    to the F90 intrinsic function library in the DEC F90 compiler:
    
            program bitwise_mod
    !...performs a bitwise-multiply-and-mod on 3 integers x,y,z
    !   to obtain x*y mod z without risking integer overflow.
            integer*8 :: x,y,z,sum
            logical   :: flag
            print*,'enter x,y,z'
            read*,x,y,z
            if(x>=z)x=mod(x,z)
            if(y>=z)y=mod(y,z)
    !   this next line is for comparison only, and only works if x*y
    !   CAN be stored in an 8-byte integer...
            print*,'exact mod =',mod(x*y,z)
            sum=0
            if(btest(y,0))sum=x
            do
              y=y/2 ! could also use y=ishft(y,-1) here, if it's faster
              if(y==0)exit
              flag=btest(y,0)
              x=ishft(x,1)
              if(x>=z)x=x-z
              if(flag)then
                sum=sum+x
                if(sum>=z)sum=sum-z
              endif
            enddo
            print*,'x*y mod z =',sum
            end program bitwise_mod
    
1251.8  COMEUP::SIMMONDS "loose canon"      7 lines   Thu Apr 10 1997 00:19
    Re: .7
    
|                              -< real*16 mod >-
    [...]
|            integer*8 :: x,y,z,sum
    
    Huh?
1251.9  QUARK::LIONEL "Free advice is worth every cent"      6 lines   Thu Apr 10 1997 12:17
I think the customer was using real*16 as a way of manipulating 64-bit
integers without worrying about overflow.  real*8 won't cut it.  If this is
the case, then specialized routines for mod or whatever they want are the
more appropriate solution.

				Steve
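For what it's worth, the shift-and-add idea in .7 packages naturally as the
kind of specialized routine suggested above.  A sketch follows; the function
name and interface are mine, not an existing library routine.

      ! Reusable 64-bit multiply-and-mod:  mulmod(a,b,m) = mod(a*b, m)
      ! without intermediate overflow.  Assumes a >= 0, b >= 0, m > 0,
      ! and that 2*m still fits in an INTEGER*8.
      integer*8 function mulmod(a, b, m)
      integer*8 :: a, b, m
      integer*8 :: x, y, s
      x = mod(a, m)
      y = mod(b, m)
      s = 0
      do while (y /= 0)
         if (btest(y, 0)) then
            s = s + x
            if (s >= m) s = s - m
         end if
         y = ishft(y, -1)       ! next bit of b
         x = ishft(x, 1)        ! double x, keeping it reduced mod m
         if (x >= m) x = x - m
      end do
      mulmod = s
      end function mulmod

A caller declares INTEGER*8 MULMOD and uses it wherever x*y mod z is needed;
the restriction that 2*m must be representable is the same one the program
in .7 lives with.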