Copyright 1996, 1999, 2001, 2002, 2004 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of either:

  * the GNU Lesser General Public License as published by the Free
    Software Foundation; either version 3 of the License, or (at your
    option) any later version.

or

  * the GNU General Public License as published by the Free Software
    Foundation; either version 2 of the License, or (at your option) any
    later version.

or both in parallel, as here.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received copies of the GNU General Public License and the
GNU Lesser General Public License along with the GNU MP Library.  If not,
see https://www.gnu.org/licenses/.



This directory contains mpn functions for various HP PA-RISC chips.  Code
that runs faster on the PA7100 and later implementations is in the pa7100
directory.

RELEVANT OPTIMIZATION ISSUES

  Load and Store timing

On the PA7000, no memory instruction can issue during the two cycles after
a store.  For the PA7100, this is reduced to one cycle.

The PA7100 has a lockup-free cache, so it helps to schedule loads and their
dependent instructions as far from each other as possible.

STATUS

1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
   instructions below (but some software pipelining is needed to avoid the
   xmpyu-fstds delay):

	fldds	s1_ptr

	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)

	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)

	addc
	stws	res_ptr
	addc
	stws	res_ptr

	addib	Loop
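
For orientation, here is a minimal C sketch of what one pass of such a loop
computes, assuming 32-bit limbs stored least significant first.  The name
ref_mul_1 is hypothetical and this is not the library's implementation; the
64-bit product stands in for xmpyu, and the carry handling stands in for the
addc chain.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine (not GMP's mpn_mul_1): multiply the
   n-limb number at s1 by the single limb s2, store the low n limbs at
   res, and return the carry-out limb.  Limbs are 32 bits wide and are
   stored least significant first. */
uint32_t ref_mul_1(uint32_t *res, const uint32_t *s1, size_t n, uint32_t s2)
{
    uint32_t cy = 0;                       /* carry limb between iterations */
    for (size_t i = 0; i < n; i++) {
        /* 32x32 -> 64-bit unsigned product, the job xmpyu does */
        uint64_t prod = (uint64_t)s1[i] * s2;
        uint32_t lo = (uint32_t)prod;
        uint32_t hi = (uint32_t)(prod >> 32);
        lo += cy;                          /* add the incoming carry limb */
        hi += (lo < cy);                   /* hi <= 0xFFFFFFFE, so no overflow */
        res[i] = lo;
        cy = hi;
    }
    return cy;
}
```

Calling ref_mul_1 on the two-limb operand {0xFFFFFFFF, 0xFFFFFFFF} with
s2 = 0xFFFFFFFF leaves {1, 0xFFFFFFFF} in res and returns 0xFFFFFFFE.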

2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
   (asymptotically) on the PA7100, using the instructions below.  With proper
   software pipelining and the unrolling level below, the speed becomes 8
   cycles/limb.

	fldds	s1_ptr
	fldds	s1_ptr

	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)

	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	addc
	addc
	addc
	addc
	addc	%r0,%r0,cy-limb

	ldws	res_ptr
	ldws	res_ptr
	ldws	res_ptr
	ldws	res_ptr
	add
	stws	res_ptr
	addc
	stws	res_ptr
	addc
	stws	res_ptr
	addc
	stws	res_ptr

	addib
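
The arithmetic behind the listing above can be sketched in C as follows,
again assuming 32-bit limbs stored least significant first.  The name
ref_addmul_1 is hypothetical and the code is a plain reference, not the
scheduled loop: it multiplies, adds the existing result limb and the carry
limb, and writes the low half back.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine (not GMP's mpn_addmul_1): multiply the
   n-limb number at s1 by the limb s2, add the product to the n-limb
   number at res in place, and return the carry-out limb.  Limbs are
   32 bits wide and are stored least significant first. */
uint32_t ref_addmul_1(uint32_t *res, const uint32_t *s1, size_t n,
                      uint32_t s2)
{
    uint32_t cy = 0;                       /* carry limb between iterations */
    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so the sum fits in 64 bits */
        uint64_t t = (uint64_t)s1[i] * s2 + res[i] + cy;
        res[i] = (uint32_t)t;              /* low half goes back to memory */
        cy = (uint32_t)(t >> 32);          /* high half is the next carry */
    }
    return cy;
}
```

The extra load of each res limb and the second carry chain are what make
this loop more expensive per limb than the plain multiply.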

3. For the PA8000 we have to stick to using 32-bit limbs until compiler
   support emerges.  But we want to use 64-bit operations whenever possible,
   in particular for loads and stores.  It is possible to handle mpn_add_n
   efficiently by rotating (when s1/s2 are aligned) or by masking plus
   bit-field inserting (when they are not).  The speed should double
   compared to the code used today.
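
The idea of processing two 32-bit limbs per 64-bit operation can be
sketched in C as below.  The name ref_add_n_pairs is hypothetical.  The
sketch builds each 64-bit value with shifts so that it is portable; it does
not model the memory-layout handling (rotation, or masking plus bit-field
insertion) mentioned above, only the halved number of additions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch (not GMP's mpn_add_n): add two n-limb numbers made
   of 32-bit limbs, least significant first, using one 64-bit addition
   per pair of limbs.  Returns the carry out (0 or 1). */
uint32_t ref_add_n_pairs(uint32_t *res, const uint32_t *s1,
                         const uint32_t *s2, size_t n)
{
    uint32_t cy = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        /* Combine two adjacent limbs into one 64-bit operand. */
        uint64_t a = ((uint64_t)s1[i + 1] << 32) | s1[i];
        uint64_t b = ((uint64_t)s2[i + 1] << 32) | s2[i];
        uint64_t sum = a + b;
        uint32_t c1 = sum < a;             /* carry out of a + b */
        uint64_t t = sum + cy;
        uint32_t c2 = t < sum;             /* carry out of adding cy */
        res[i] = (uint32_t)t;
        res[i + 1] = (uint32_t)(t >> 32);
        cy = c1 | c2;                      /* at most one of them is set */
    }
    if (i < n) {                           /* leftover limb when n is odd */
        uint64_t t = (uint64_t)s1[i] + s2[i] + cy;
        res[i] = (uint32_t)t;
        cy = (uint32_t)(t >> 32);
    }
    return cy;
}
```

Adding {1, 0, 0} to {0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF} with this routine
gives {0, 0, 0} and a carry out of 1.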



LABEL SYNTAX

The HP-UX assembler takes labels starting in column 0 with no colon,

	L$loop  ldws,mb -4(0,%r25),%r22

Gas on hppa GNU/Linux, however, requires a colon,

	L$loop: ldws,mb -4(0,%r25),%r22

This is covered by using LDEF() from asm-defs.m4.  An alternative would be
to use ".label", which is accepted by both,

		.label  L$loop
		ldws,mb -4(0,%r25),%r22

but that's not as nice to look at, at least not if you're used to assembler
code having labels in column 0.



REFERENCES

Hewlett Packard, "HP Assembler Reference Manual", 9th edition, June 1998,
part number 92432-90012.


----------------
Local variables:
mode: text
fill-column: 76
End:
    163