dnl  SPARC v9 64-bit mpn_mul_1 -- Multiply a limb vector with a limb and store
dnl  the result in a second limb vector.

dnl  Copyright 1998, 2000-2003 Free Software Foundation, Inc.

dnl  This file is part of the GNU MP Library.
dnl
dnl  The GNU MP Library is free software; you can redistribute it and/or modify
dnl  it under the terms of either:
dnl
dnl    * the GNU Lesser General Public License as published by the Free
dnl      Software Foundation; either version 3 of the License, or (at your
dnl      option) any later version.
dnl
dnl  or
dnl
dnl    * the GNU General Public License as published by the Free Software
dnl      Foundation; either version 2 of the License, or (at your option) any
dnl      later version.
dnl
dnl  or both in parallel, as here.
dnl
dnl  The GNU MP Library is distributed in the hope that it will be useful, but
dnl  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
dnl  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
dnl  for more details.
dnl
dnl  You should have received copies of the GNU General Public License and the
dnl  GNU Lesser General Public License along with the GNU MP Library.  If not,
dnl  see https://www.gnu.org/licenses/.

include(`../config.m4')

C		   cycles/limb
C UltraSPARC 1&2:     14
C UltraSPARC 3:	      18.5

C Algorithm: We use eight floating-point multiplies per limb product, with the
C invariant v operand split into four 16-bit pieces and each s1 (up) limb split
C into two 32-bit pieces.  Each 16x32-bit partial product is below 2^48, so it
C is exact in a double.  We sum pairs of 48-bit partial products using
C floating-point add, convert the four 49-bit product-sums to integers
C (fdtox), and transfer them to the integer unit through memory.
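C Illustrative sketch (a hypothetical C model, not part of GMP) of the FP half
C of one limb product as described above.  The struct and function names are
C made up and mirror the register names; it assumes IEEE doubles, in which
C every 16x32-bit product and every pairwise sum below is exact.

```c
#include <stdint.h>

/* Hypothetical model of the FP stage for one limb u and multiplier v.
   Each field holds an exact integer-valued double with the given weight. */
typedef struct {
    double a00, a16, a32, a48; /* weights 2^0, 2^16, 2^32, 2^48 */
    double r64, r80;           /* weights 2^64, 2^80 */
} fp_sums;

static fp_sums fp_limb_products(uint64_t u, uint64_t v)
{
    /* u split into 32-bit halves (f2/f3 and f4/f5 after fxtod). */
    double u00 = (double)(u & 0xffffffffu);
    double u32 = (double)(u >> 32);
    /* v split into four 16-bit chunks (v00..v48). */
    double v00 = (double)(v & 0xffff);
    double v16 = (double)((v >> 16) & 0xffff);
    double v32 = (double)((v >> 32) & 0xffff);
    double v48 = (double)(v >> 48);
    fp_sums s;
    s.a00 = u00 * v00;              /* < 2^48 */
    s.a16 = u00 * v16;              /* < 2^48 */
    s.a32 = u00 * v32 + u32 * v00;  /* pair sum, < 2^49 */
    s.a48 = u00 * v48 + u32 * v16;  /* pair sum, < 2^49 */
    s.r64 = u32 * v32;              /* < 2^48 */
    s.r80 = u32 * v48;              /* < 2^48 */
    return s;
}
```

C r64 and r80 carry weights 2^64 and 2^80, so the asm adds them into the next
C limb's a00 and a16 (faddd p00, r64, a00 and faddd p16, r80, a16).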

C Possible optimizations:
C   1. Align the stack area where we transfer the four 49-bit product-sums
C      to a 32-byte boundary.  That would minimize cache collisions.
C      (UltraSPARC-1/2 use a direct-mapped cache.)  (Perhaps even better would
C      be to align the area to map to the area immediately before s1?)
C   2. Sum the 4 49-bit quantities using 32-bit operations, as in the
C      develop mpn_addmul_2.  This would save many integer instructions.
C   3. Unrolling.  Questionable if it is worth the code expansion, given that
C      it could only save 1 cycle/limb.
C   4. Specialize for particular v values.  If its upper 32 bits are zero, we
C      could save many operations in the FPU (fmuld), but even more in the
C      IEU, since we would be summing 48-bit quantities, which might be simpler.
C   5. Ideally, we should schedule the f2/f3 and f4/f5 RAW dependencies further
C      apart, and the i00,i16,i32,i48 RAW dependencies closer together.  The
C      latter distance should be no greater than needed to cover L2 cache
C      latency, and not so great that i16 needs to be copied.
C   6. Avoid performing mem+fa+fm in the same cycle, at least when we want
C      high IEU bandwidth.  (12 of the 14 cycles will be free for 2 IEU
C      ops.)
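C A sketch of optimization 4 under the assumption v < 2^32 (hypothetical code,
C not implemented in this file; the names are made up): v32 and v48 vanish,
C leaving four fmuld products, no r64/r80 terms, and sums that are single
C 48-bit products rather than 49-bit pair sums.

```c
#include <stdint.h>

/* Hypothetical FP stage when v fits in 32 bits: only four products remain. */
typedef struct { double a00, a16, a32, a48; } fp_sums32;

static fp_sums32 fp_limb_products_v32(uint64_t u, uint32_t v)
{
    double u00 = (double)(u & 0xffffffffu);
    double u32 = (double)(u >> 32);
    double v00 = (double)(v & 0xffff);
    double v16 = (double)(v >> 16);
    /* Weights 2^0, 2^16, 2^32, 2^48; each term is one exact 48-bit product. */
    fp_sums32 s = { u00 * v00, u00 * v16, u32 * v00, u32 * v16 };
    return s;
}
```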

C Instruction classification (as per UltraSPARC-1/2 functional units):
C    8 FM
C   10 FA
C   11 MEM
C    9 ISHIFT + 11 IADDLOG
C    1 BRANCH
C   50 insns in total (plus three mov insns that should be optimized out)

C The loop executes 53 instructions in 14 cycles on UltraSPARC-1/2, i.e., we
C sustain 3.79 instructions/cycle.

C INPUT PARAMETERS
C rp	i0	result vector
C up	i1	source vector
C n	i2	limb count (n >= 1)
C v	i3	multiplier limb
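C For reference, a hypothetical portable C statement of what this routine
C computes: {rp,n} = {up,n} * v, with the most significant limb returned.  The
C name ref_mul_1 is made up, and the sketch relies on the GCC/Clang
C unsigned __int128 extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference: schoolbook limb-by-limb multiply with carry. */
static uint64_t ref_mul_1(uint64_t *rp, const uint64_t *up, size_t n, uint64_t v)
{
    uint64_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned __int128 t = (unsigned __int128)up[i] * v + cy;
        rp[i] = (uint64_t)t;          /* low 64 bits of the limb product */
        cy = (uint64_t)(t >> 64);     /* carry into the next limb */
    }
    return cy;
}
```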
ASM_START()
	REGISTER(%g2,#scratch)
	REGISTER(%g3,#scratch)

define(`p00', `%f8') define(`p16',`%f10') define(`p32',`%f12') define(`p48',`%f14')
define(`r32',`%f16') define(`r48',`%f18') define(`r64',`%f20') define(`r80',`%f22')
define(`v00',`%f24') define(`v16',`%f26') define(`v32',`%f28') define(`v48',`%f30')
define(`u00',`%f32') define(`u32', `%f34')
define(`a00',`%f36') define(`a16',`%f38') define(`a32',`%f40') define(`a48',`%f42')
define(`cy',`%g1')
define(`rlimb',`%g3')
define(`i00',`%l0') define(`i16',`%l1') define(`i32',`%l2') define(`i48',`%l3')
define(`xffffffff',`%l7')
define(`xffff',`%o0')

PROLOGUE(mpn_mul_1)

C Initialization.  (1) Split the v operand into four 16-bit chunks and convert
C them to IEEE doubles in fp registers.  (2) Clear f2 and f4, the upper halves
C of the fp register pairs f2/f3 and f4/f5.  (3) Store masks in the registers
C aliased to `xffff' and `xffffffff'.  The scratch area at %sp+2223 lies just
C past the 176-byte register-window save and argument area, offset by the
C 2047-byte V9 stack bias.

	save	%sp, -256, %sp
	mov	-1, %g4
	srlx	%g4, 48, xffff		C store mask in register `xffff'
	and	%i3, xffff, %g2
	stx	%g2, [%sp+2223+0]
	srlx	%i3, 16, %g3
	and	%g3, xffff, %g3
	stx	%g3, [%sp+2223+8]
	srlx	%i3, 32, %g2
	and	%g2, xffff, %g2
	stx	%g2, [%sp+2223+16]
	srlx	%i3, 48, %g3
	stx	%g3, [%sp+2223+24]
	srlx	%g4, 32, xffffffff	C store mask in register `xffffffff'

	sllx	%i2, 3, %i2		C n in bytes
	mov	0, cy			C clear cy
	add	%i0, %i2, %i0		C rp + 8n
	add	%i1, %i2, %i1		C up + 8n
	neg	%i2			C index runs from -8n up to 0
	add	%i1, 4, %i5		C low 32-bit word of each limb
	add	%i0, -32, %i4		C stores trail loads by 4 limbs
	add	%i0, -16, %i0
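C The setup above points rp and up just past their last limbs and negates the
C index, so `addcc %i2, 8, %i2' sets the Z flag exactly when the last limb has
C been handled.  A hypothetical C sketch of the same negative-index idiom (the
C function name is made up, and it merely copies limbs; the asm additionally
C scales the index by 8 and keeps %i4 = rp + 8n - 32 so that stores trail the
C loads by the 4-limb pipeline depth):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: walk {up, n} with an index that counts up from
   -n to 0 against end pointers, so the loop test is "index != 0". */
static void neg_index_copy(uint64_t *rp, const uint64_t *up, size_t n)
{
    const uint64_t *up_end = up + n;   /* add %i1, %i2, %i1 */
    uint64_t *rp_end = rp + n;         /* add %i0, %i2, %i0 */
    ptrdiff_t i = -(ptrdiff_t)n;       /* neg %i2 (in limbs, not bytes) */
    while (i != 0) {
        rp_end[i] = up_end[i];
        i += 1;                        /* addcc %i2, 8, %i2; bnz */
    }
}
```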

	ldd	[%sp+2223+0], v00
	ldd	[%sp+2223+8], v16
	ldd	[%sp+2223+16], v32
	ldd	[%sp+2223+24], v48
	ld	[%sp+2223+0],%f2	C zero f2 (high word of a value < 2^16)
	ld	[%sp+2223+0],%f4	C zero f4 (same zero word)
	ld	[%i5+%i2], %f3		C read low 32 bits of up[i]
	ld	[%i1+%i2], %f5		C read high 32 bits of up[i]
	fxtod	v00, v00
	fxtod	v16, v16
	fxtod	v32, v32
	fxtod	v48, v48
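C A hypothetical C model of the big-endian word layout this initialization
C relies on: a 64-bit value stored with stx keeps its high 32 bits at byte
C offset 0 and its low 32 bits at offset 4.  Since each stored v chunk is
C below 2^16, the word at offset 0 is zero, which is how `ld [%sp+2223+0],%f2'
C clears f2; fxtod on the pair f2/f3 then yields the zero-extended low word.
C The helper names are made up.

```c
#include <stdint.h>

/* Store x as 8 big-endian bytes, as stx does on SPARC. */
static void store_be64(unsigned char b[8], uint64_t x)
{
    for (int k = 0; k < 8; k++)
        b[k] = (unsigned char)(x >> (56 - 8 * k));
}

/* Load a 32-bit big-endian word, as ld into a single fp register does. */
static uint32_t load_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* fxtod on a register pair: hi_word is the even (upper) register. */
static double pair_to_double(uint32_t hi_word, uint32_t lo_word)
{
    return (double)(((uint64_t)hi_word << 32) | lo_word);
}
```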

C Start real work.  (We sneakily read f3 and f5 above.)
C The software pipeline is very deep, requiring 4 feed-in stages.

	fxtod	%f2, u00
	fxtod	%f4, u32
	fmuld	u00, v00, a00
	fmuld	u00, v16, a16
	fmuld	u00, v32, p32
	fmuld	u32, v00, r32
	fmuld	u00, v48, p48
	addcc	%i2, 8, %i2
	bnz,pt	%xcc, .L_two_or_more
	fmuld	u32, v16, r48

.L_one:
	fmuld	u32, v32, r64	C FIXME not urgent
	faddd	p32, r32, a32
	fdtox	a00, a00
	faddd	p48, r48, a48
	fmuld	u32, v48, r80	C FIXME not urgent
	fdtox	a16, a16
	fdtox	a32, a32
	fdtox	a48, a48
	std	a00, [%sp+2223+0]
	std	a16, [%sp+2223+8]
	std	a32, [%sp+2223+16]
	std	a48, [%sp+2223+24]
	add	%i2, 8, %i2

	fdtox	r64, a00
	fdtox	r80, a16
	ldx	[%sp+2223+0], i00
	ldx	[%sp+2223+8], i16
	ldx	[%sp+2223+16], i32
	ldx	[%sp+2223+24], i48
	std	a00, [%sp+2223+0]
	std	a16, [%sp+2223+8]
	add	%i2, 8, %i2

	mov	i00, %g5		C i00+ now in g5
	ldx	[%sp+2223+0], i00
	srlx	i16, 48, %l4		C (i16 >> 48)
	mov	i16, %g2
	ldx	[%sp+2223+8], i16
	srlx	i48, 16, %l5		C (i48 >> 16)
	mov	i32, %g4		C i32+ now in g4
	sllx	i48, 32, %l6		C (i48 << 32)
	srlx	%g4, 32, %o3		C (i32 >> 32)
	add	%l5, %l4, %o1		C hi64- in %o1
	std	a00, [%sp+2223+0]
	sllx	%g4, 16, %o2		C (i32 << 16)
	add	%o3, %o1, %o1		C hi64 in %o1   1st ASSIGNMENT
	std	a16, [%sp+2223+8]
	sllx	%o1, 48, %o3		C (hi64 << 48)
	add	%g2, %o2, %o2		C mi64- in %o2
	add	%l6, %o2, %o2		C mi64- in %o2
	sub	%o2, %o3, %o2		C mi64 in %o2   1st ASSIGNMENT
	add	cy, %g5, %o4		C x = prev(i00) + cy
	b	.L_out_1
	add	%i2, 8, %i2

.L_two_or_more:
	ld	[%i5+%i2], %f3		C read low 32 bits of up[i]
	fmuld	u32, v32, r64	C FIXME not urgent
	faddd	p32, r32, a32
	ld	[%i1+%i2], %f5		C read high 32 bits of up[i]
	fdtox	a00, a00
	faddd	p48, r48, a48
	fmuld	u32, v48, r80	C FIXME not urgent
	fdtox	a16, a16
	fdtox	a32, a32
	fxtod	%f2, u00
	fxtod	%f4, u32
	fdtox	a48, a48
	std	a00, [%sp+2223+0]
	fmuld	u00, v00, p00
	std	a16, [%sp+2223+8]
	fmuld	u00, v16, p16
	std	a32, [%sp+2223+16]
	fmuld	u00, v32, p32
	std	a48, [%sp+2223+24]
	faddd	p00, r64, a00
	fmuld	u32, v00, r32
	faddd	p16, r80, a16
	fmuld	u00, v48, p48
	addcc	%i2, 8, %i2
	bnz,pt	%xcc, .L_three_or_more
	fmuld	u32, v16, r48

.L_two:
	fmuld	u32, v32, r64	C FIXME not urgent
	faddd	p32, r32, a32
	fdtox	a00, a00
	faddd	p48, r48, a48
	fmuld	u32, v48, r80	C FIXME not urgent
	fdtox	a16, a16
	ldx	[%sp+2223+0], i00
	fdtox	a32, a32
	ldx	[%sp+2223+8], i16
	ldx	[%sp+2223+16], i32
	ldx	[%sp+2223+24], i48
	fdtox	a48, a48
	std	a00, [%sp+2223+0]
	std	a16, [%sp+2223+8]
	std	a32, [%sp+2223+16]
	std	a48, [%sp+2223+24]
	add	%i2, 8, %i2

	fdtox	r64, a00
	mov	i00, %g5		C i00+ now in g5
	fdtox	r80, a16
	ldx	[%sp+2223+0], i00
	srlx	i16, 48, %l4		C (i16 >> 48)
	mov	i16, %g2
	ldx	[%sp+2223+8], i16
	srlx	i48, 16, %l5		C (i48 >> 16)
	mov	i32, %g4		C i32+ now in g4
	ldx	[%sp+2223+16], i32
	sllx	i48, 32, %l6		C (i48 << 32)
	ldx	[%sp+2223+24], i48
	srlx	%g4, 32, %o3		C (i32 >> 32)
	add	%l5, %l4, %o1		C hi64- in %o1
	std	a00, [%sp+2223+0]
	sllx	%g4, 16, %o2		C (i32 << 16)
	add	%o3, %o1, %o1		C hi64 in %o1   1st ASSIGNMENT
	std	a16, [%sp+2223+8]
	sllx	%o1, 48, %o3		C (hi64 << 48)
	add	%g2, %o2, %o2		C mi64- in %o2
	add	%l6, %o2, %o2		C mi64- in %o2
	sub	%o2, %o3, %o2		C mi64 in %o2   1st ASSIGNMENT
	add	cy, %g5, %o4		C x = prev(i00) + cy
	b	.L_out_2
	add	%i2, 8, %i2

.L_three_or_more:
	ld	[%i5+%i2], %f3		C read low 32 bits of up[i]
	fmuld	u32, v32, r64	C FIXME not urgent
	faddd	p32, r32, a32
	ld	[%i1+%i2], %f5		C read high 32 bits of up[i]
	fdtox	a00, a00
	faddd	p48, r48, a48
	fmuld	u32, v48, r80	C FIXME not urgent
	fdtox	a16, a16
	ldx	[%sp+2223+0], i00
	fdtox	a32, a32
	ldx	[%sp+2223+8], i16
	fxtod	%f2, u00
	ldx	[%sp+2223+16], i32
	fxtod	%f4, u32
	ldx	[%sp+2223+24], i48
	fdtox	a48, a48
	std	a00, [%sp+2223+0]
	fmuld	u00, v00, p00
	std	a16, [%sp+2223+8]
	fmuld	u00, v16, p16
	std	a32, [%sp+2223+16]
	fmuld	u00, v32, p32
	std	a48, [%sp+2223+24]
	faddd	p00, r64, a00
	fmuld	u32, v00, r32
	faddd	p16, r80, a16
	fmuld	u00, v48, p48
	addcc	%i2, 8, %i2
	bnz,pt	%xcc, .L_four_or_more
	fmuld	u32, v16, r48

.L_three:
	fmuld	u32, v32, r64	C FIXME not urgent
	faddd	p32, r32, a32
	fdtox	a00, a00
	faddd	p48, r48, a48
	mov	i00, %g5		C i00+ now in g5
	fmuld	u32, v48, r80	C FIXME not urgent
	fdtox	a16, a16
	ldx	[%sp+2223+0], i00
	fdtox	a32, a32
	srlx	i16, 48, %l4		C (i16 >> 48)
	mov	i16, %g2
	ldx	[%sp+2223+8], i16
	srlx	i48, 16, %l5		C (i48 >> 16)
	mov	i32, %g4		C i32+ now in g4
	ldx	[%sp+2223+16], i32
	sllx	i48, 32, %l6		C (i48 << 32)
	ldx	[%sp+2223+24], i48
	fdtox	a48, a48
	srlx	%g4, 32, %o3		C (i32 >> 32)
	add	%l5, %l4, %o1		C hi64- in %o1
	std	a00, [%sp+2223+0]
	sllx	%g4, 16, %o2		C (i32 << 16)
	add	%o3, %o1, %o1		C hi64 in %o1   1st ASSIGNMENT
	std	a16, [%sp+2223+8]
	sllx	%o1, 48, %o3		C (hi64 << 48)
	add	%g2, %o2, %o2		C mi64- in %o2
	std	a32, [%sp+2223+16]
	add	%l6, %o2, %o2		C mi64- in %o2
	std	a48, [%sp+2223+24]
	sub	%o2, %o3, %o2		C mi64 in %o2   1st ASSIGNMENT
	add	cy, %g5, %o4		C x = prev(i00) + cy
	b	.L_out_3
	add	%i2, 8, %i2

.L_four_or_more:
	ld	[%i5+%i2], %f3		C read low 32 bits of up[i]
	fmuld	u32, v32, r64	C FIXME not urgent
	faddd	p32, r32, a32
	ld	[%i1+%i2], %f5		C read high 32 bits of up[i]
	fdtox	a00, a00
	faddd	p48, r48, a48
	mov	i00, %g5		C i00+ now in g5
	fmuld	u32, v48, r80	C FIXME not urgent
	fdtox	a16, a16
	ldx	[%sp+2223+0], i00
	fdtox	a32, a32
	srlx	i16, 48, %l4		C (i16 >> 48)
	mov	i16, %g2
	ldx	[%sp+2223+8], i16
	fxtod	%f2, u00
	srlx	i48, 16, %l5		C (i48 >> 16)
	mov	i32, %g4		C i32+ now in g4
	ldx	[%sp+2223+16], i32
	fxtod	%f4, u32
	sllx	i48, 32, %l6		C (i48 << 32)
	ldx	[%sp+2223+24], i48
	fdtox	a48, a48
	srlx	%g4, 32, %o3		C (i32 >> 32)
	add	%l5, %l4, %o1		C hi64- in %o1
	std	a00, [%sp+2223+0]
	fmuld	u00, v00, p00
	sllx	%g4, 16, %o2		C (i32 << 16)
	add	%o3, %o1, %o1		C hi64 in %o1   1st ASSIGNMENT
	std	a16, [%sp+2223+8]
	fmuld	u00, v16, p16
	sllx	%o1, 48, %o3		C (hi64 << 48)
	add	%g2, %o2, %o2		C mi64- in %o2
	std	a32, [%sp+2223+16]
	fmuld	u00, v32, p32
	add	%l6, %o2, %o2		C mi64- in %o2
	std	a48, [%sp+2223+24]
	faddd	p00, r64, a00
	fmuld	u32, v00, r32
	sub	%o2, %o3, %o2		C mi64 in %o2   1st ASSIGNMENT
	faddd	p16, r80, a16
	fmuld	u00, v48, p48
	add	cy, %g5, %o4		C x = prev(i00) + cy
	addcc	%i2, 8, %i2
	bnz,pt	%xcc, .Loop
	fmuld	u32, v16, r48

.L_four:
	b,a	.L_out_4

C BEGIN MAIN LOOP
	.align	16
.Loop:
C 00
	srlx	%o4, 16, %o5		C (x >> 16)
	ld	[%i5+%i2], %f3		C read low 32 bits of up[i]
	fmuld	u32, v32, r64	C FIXME not urgent
	faddd	p32, r32, a32
C 01
	add	%o5, %o2, %o2		C mi64 in %o2   2nd ASSIGNMENT
	and	%o4, xffff, %o5		C (x & 0xffff)
	ld	[%i1+%i2], %f5		C read high 32 bits of up[i]
	fdtox	a00, a00
C 02
	faddd	p48, r48, a48
C 03
	srlx	%o2, 48, %o7		C (mi64 >> 48)
	mov	i00, %g5		C i00+ now in g5
	fmuld	u32, v48, r80	C FIXME not urgent
	fdtox	a16, a16
C 04
	sllx	%o2, 16, %i3		C (mi64 << 16)
	add	%o7, %o1, cy		C new cy
	ldx	[%sp+2223+0], i00
	fdtox	a32, a32
C 05
	srlx	i16, 48, %l4		C (i16 >> 48)
	mov	i16, %g2
	ldx	[%sp+2223+8], i16
	fxtod	%f2, u00
C 06
	srlx	i48, 16, %l5		C (i48 >> 16)
	mov	i32, %g4		C i32+ now in g4
	ldx	[%sp+2223+16], i32
	fxtod	%f4, u32
C 07
	sllx	i48, 32, %l6		C (i48 << 32)
	or	%i3, %o5, %o5
	ldx	[%sp+2223+24], i48
	fdtox	a48, a48
C 08
	srlx	%g4, 32, %o3		C (i32 >> 32)
	add	%l5, %l4, %o1		C hi64- in %o1
	std	a00, [%sp+2223+0]
	fmuld	u00, v00, p00
C 09
	sllx	%g4, 16, %o2		C (i32 << 16)
	add	%o3, %o1, %o1		C hi64 in %o1   1st ASSIGNMENT
	std	a16, [%sp+2223+8]
	fmuld	u00, v16, p16
C 10
	sllx	%o1, 48, %o3		C (hi64 << 48)
	add	%g2, %o2, %o2		C mi64- in %o2
	std	a32, [%sp+2223+16]
	fmuld	u00, v32, p32
C 11
	add	%l6, %o2, %o2		C mi64- in %o2
	std	a48, [%sp+2223+24]
	faddd	p00, r64, a00
	fmuld	u32, v00, r32
C 12
	sub	%o2, %o3, %o2		C mi64 in %o2   1st ASSIGNMENT
	stx	%o5, [%i4+%i2]
	faddd	p16, r80, a16
	fmuld	u00, v48, p48
C 13
	add	cy, %g5, %o4		C x = prev(i00) + cy
	addcc	%i2, 8, %i2
	bnz,pt	%xcc, .Loop
	fmuld	u32, v16, r48
C END MAIN LOOP
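C A hypothetical C model (not GMP code; the function name is made up) of the
C integer work the loop does for a single limb, using the same intermediate
C names as the comments above: hi64, mi64, x and the new cy.  It assumes, as
C in the FP stage, that i00..i48 are each below 2^49, and all arithmetic wraps
C mod 2^64 like the 64-bit IEU.

```c
#include <stdint.h>

/* Combine i00 + i16*2^16 + i32*2^32 + i48*2^48 + cy into one result limb,
   updating *cy to the carry for the next limb. */
static uint64_t recombine_limb(uint64_t i00, uint64_t i16, uint64_t i32,
                               uint64_t i48, uint64_t *cy)
{
    /* Bits of the shifted terms that land at or above 2^64 (weight 2^64). */
    uint64_t hi64 = (i48 >> 16) + (i16 >> 48) + (i32 >> 32);
    /* Middle part with weight 2^16; removing hi64 << 48 keeps it < 3*2^48,
       so the wrapped 64-bit computation is exact. */
    uint64_t mi64 = i16 + (i32 << 16) + (i48 << 32) - (hi64 << 48);
    uint64_t x = i00 + *cy;                /* x = prev(i00) + cy */
    mi64 += x >> 16;                       /* mi64, 2nd assignment */
    *cy = (mi64 >> 48) + hi64;             /* new cy */
    return (mi64 << 16) | (x & 0xffff);    /* limb stored by stx */
}
```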

.L_out_4:
	srlx	%o4, 16, %o5		C (x >> 16)
	fmuld	u32, v32, r64	C FIXME not urgent
	faddd	p32, r32, a32
	add	%o5, %o2, %o2		C mi64 in %o2   2nd ASSIGNMENT
	and	%o4, xffff, %o5		C (x & 0xffff)
	fdtox	a00, a00
	faddd	p48, r48, a48
	srlx	%o2, 48, %o7		C (mi64 >> 48)
	mov	i00, %g5		C i00+ now in g5
	fmuld	u32, v48, r80	C FIXME not urgent
	fdtox	a16, a16
	sllx	%o2, 16, %i3		C (mi64 << 16)
	add	%o7, %o1, cy		C new cy
	ldx	[%sp+2223+0], i00
	fdtox	a32, a32
	srlx	i16, 48, %l4		C (i16 >> 48)
	mov	i16, %g2
	ldx	[%sp+2223+8], i16
	srlx	i48, 16, %l5		C (i48 >> 16)
	mov	i32, %g4		C i32+ now in g4
	ldx	[%sp+2223+16], i32
	sllx	i48, 32, %l6		C (i48 << 32)
	or	%i3, %o5, %o5
	ldx	[%sp+2223+24], i48
	fdtox	a48, a48
	srlx	%g4, 32, %o3		C (i32 >> 32)
	add	%l5, %l4, %o1		C hi64- in %o1
	std	a00, [%sp+2223+0]
	sllx	%g4, 16, %o2		C (i32 << 16)
	add	%o3, %o1, %o1		C hi64 in %o1   1st ASSIGNMENT
	std	a16, [%sp+2223+8]
	sllx	%o1, 48, %o3		C (hi64 << 48)
	add	%g2, %o2, %o2		C mi64- in %o2
	std	a32, [%sp+2223+16]
	add	%l6, %o2, %o2		C mi64- in %o2
	std	a48, [%sp+2223+24]
	sub	%o2, %o3, %o2		C mi64 in %o2   1st ASSIGNMENT
	stx	%o5, [%i4+%i2]
	add	cy, %g5, %o4		C x = prev(i00) + cy
	add	%i2, 8, %i2
.L_out_3:
	srlx	%o4, 16, %o5		C (x >> 16)
	add	%o5, %o2, %o2		C mi64 in %o2   2nd ASSIGNMENT
	and	%o4, xffff, %o5		C (x & 0xffff)
	fdtox	r64, a00
	srlx	%o2, 48, %o7		C (mi64 >> 48)
	mov	i00, %g5		C i00+ now in g5
	fdtox	r80, a16
	sllx	%o2, 16, %i3		C (mi64 << 16)
	add	%o7, %o1, cy		C new cy
	ldx	[%sp+2223+0], i00
	srlx	i16, 48, %l4		C (i16 >> 48)
	mov	i16, %g2
	ldx	[%sp+2223+8], i16
	srlx	i48, 16, %l5		C (i48 >> 16)
	mov	i32, %g4		C i32+ now in g4
	ldx	[%sp+2223+16], i32
	sllx	i48, 32, %l6		C (i48 << 32)
	or	%i3, %o5, %o5
	ldx	[%sp+2223+24], i48
	srlx	%g4, 32, %o3		C (i32 >> 32)
	add	%l5, %l4, %o1		C hi64- in %o1
	std	a00, [%sp+2223+0]
	sllx	%g4, 16, %o2		C (i32 << 16)
	add	%o3, %o1, %o1		C hi64 in %o1   1st ASSIGNMENT
	std	a16, [%sp+2223+8]
	sllx	%o1, 48, %o3		C (hi64 << 48)
	add	%g2, %o2, %o2		C mi64- in %o2
	add	%l6, %o2, %o2		C mi64- in %o2
	sub	%o2, %o3, %o2		C mi64 in %o2   1st ASSIGNMENT
	stx	%o5, [%i4+%i2]
	add	cy, %g5, %o4		C x = prev(i00) + cy
	add	%i2, 8, %i2
.L_out_2:
	srlx	%o4, 16, %o5		C (x >> 16)
	add	%o5, %o2, %o2		C mi64 in %o2   2nd ASSIGNMENT
	and	%o4, xffff, %o5		C (x & 0xffff)
	srlx	%o2, 48, %o7		C (mi64 >> 48)
	mov	i00, %g5		C i00+ now in g5
	sllx	%o2, 16, %i3		C (mi64 << 16)
	add	%o7, %o1, cy		C new cy
	ldx	[%sp+2223+0], i00
	srlx	i16, 48, %l4		C (i16 >> 48)
	mov	i16, %g2
	ldx	[%sp+2223+8], i16
	srlx	i48, 16, %l5		C (i48 >> 16)
	mov	i32, %g4		C i32+ now in g4
	sllx	i48, 32, %l6		C (i48 << 32)
	or	%i3, %o5, %o5
	srlx	%g4, 32, %o3		C (i32 >> 32)
	add	%l5, %l4, %o1		C hi64- in %o1
	sllx	%g4, 16, %o2		C (i32 << 16)
	add	%o3, %o1, %o1		C hi64 in %o1   1st ASSIGNMENT
	sllx	%o1, 48, %o3		C (hi64 << 48)
	add	%g2, %o2, %o2		C mi64- in %o2
	add	%l6, %o2, %o2		C mi64- in %o2
	sub	%o2, %o3, %o2		C mi64 in %o2   1st ASSIGNMENT
	stx	%o5, [%i4+%i2]
	add	cy, %g5, %o4		C x = prev(i00) + cy
	add	%i2, 8, %i2
.L_out_1:
	srlx	%o4, 16, %o5		C (x >> 16)
	add	%o5, %o2, %o2		C mi64 in %o2   2nd ASSIGNMENT
	and	%o4, xffff, %o5		C (x & 0xffff)
	srlx	%o2, 48, %o7		C (mi64 >> 48)
	sllx	%o2, 16, %i3		C (mi64 << 16)
	add	%o7, %o1, cy		C new cy
	or	%i3, %o5, %o5
	stx	%o5, [%i4+%i2]

C Fold the last limb's r64 and r80 (now in i00 and i16) into cy.
	sllx	i00, 0, %g2
	add	%g2, cy, cy
	sllx	i16, 16, %g3
	add	%g3, cy, cy		C cy += r64 + (r80 << 16)

	return	%i7+8
	mov	cy, %o0
EPILOGUE(mpn_mul_1)
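C Putting the pieces together: a hypothetical, non-pipelined C model of the
C whole routine (the name model_mul_1 is made up; this is not GMP code).  It
C follows the same FP split, folds each limb's r64/r80 into the next limb's
C a00/a16, recombines in the integer unit as the loop does, and finishes by
C adding the last limb's r64 + (r80 << 16) to the carry, which the asm does
C via i00 and i16 just before returning.  It omits the 4-stage software
C pipeline and assumes IEEE doubles and n >= 1.

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t model_mul_1(uint64_t *rp, const uint64_t *up, size_t n,
                            uint64_t v)
{
    double v00 = (double)(v & 0xffff);
    double v16 = (double)((v >> 16) & 0xffff);
    double v32 = (double)((v >> 32) & 0xffff);
    double v48 = (double)(v >> 48);
    double r64 = 0.0, r80 = 0.0;   /* high products of the previous limb */
    uint64_t cy = 0;

    for (size_t k = 0; k < n; k++) {
        double u00 = (double)(up[k] & 0xffffffffu);
        double u32 = (double)(up[k] >> 32);
        /* FP stage: exact sums below 2^49, converted like fdtox. */
        uint64_t i00 = (uint64_t)(u00 * v00 + r64);   /* faddd p00, r64 */
        uint64_t i16 = (uint64_t)(u00 * v16 + r80);   /* faddd p16, r80 */
        uint64_t i32 = (uint64_t)(u00 * v32 + u32 * v00);
        uint64_t i48 = (uint64_t)(u00 * v48 + u32 * v16);
        r64 = u32 * v32;
        r80 = u32 * v48;

        /* Integer stage, as in the main loop. */
        uint64_t hi64 = (i48 >> 16) + (i16 >> 48) + (i32 >> 32);
        uint64_t mi64 = i16 + (i32 << 16) + (i48 << 32) - (hi64 << 48);
        uint64_t x = i00 + cy;
        mi64 += x >> 16;
        cy = (mi64 >> 48) + hi64;
        rp[k] = (mi64 << 16) | (x & 0xffff);
    }
    /* Final fold of the last limb's r64 (weight 2^0) and r80 (weight 2^16). */
    cy += (uint64_t)r64 + ((uint64_t)r80 << 16);
    return cy;
}
```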