Copyright 1999, 2001, 2002, 2004 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of either:

  * the GNU Lesser General Public License as published by the Free
    Software Foundation; either version 3 of the License, or (at your
    option) any later version.

or

  * the GNU General Public License as published by the Free Software
    Foundation; either version 2 of the License, or (at your option) any
    later version.

or both in parallel, as here.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received copies of the GNU General Public License and the
GNU Lesser General Public License along with the GNU MP Library.  If not,
see https://www.gnu.org/licenses/.


This directory contains mpn functions for 64-bit PA-RISC 2.0.

PIPELINE SUMMARY

The PA8x00 processors have an orthogonal 4-way out-of-order pipeline.  Each
cycle two ALU operations and two MEM operations can issue, but only one of the
MEM operations may be a store.  The two ALU operations can be almost any
combination of non-memory operations.  Unlike on most other processors,
integer and fp operations are completely equal here; both count simply as ALU
operations.

Unfortunately, some operations cause hiccups in the pipeline.  Combining
carry-consuming operations like ADD,DC with operations that do not set carry,
like ADD,L, causes long delays.  Skip operations also seem to cause hiccups.
If several ADD,DC are issued consecutively, or if a plain carry-generating ADD
feeds an ADD,DC, no stalling occurs.  We can effectively issue two ADD,DC
operations/cycle.
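The carry chain that consecutive ADD,DC instructions implement is the one found
in a multi-limb addition such as mpn_add_n.  The portable C sketch below shows
that dependency under stated assumptions: 64-bit limbs, and a hypothetical
function name add_n_sketch.  It only illustrates the semantics (each limb's
addition consumes the carry produced by the previous limb); it is not the
PA-RISC assembly in this directory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a GMP entry point): stores
   up[0..n-1] + vp[0..n-1] into rp[0..n-1] and returns the carry out
   of the most significant limb (0 or 1).  64-bit limbs assumed. */
static uint64_t add_n_sketch(uint64_t *rp, const uint64_t *up,
                             const uint64_t *vp, size_t n)
{
    uint64_t cy = 0;                    /* carry into the current limb */
    for (size_t i = 0; i < n; i++) {
        uint64_t s = up[i] + vp[i];     /* wraps modulo 2^64 */
        uint64_t c1 = s < up[i];        /* carry out of up[i] + vp[i] */
        uint64_t r = s + cy;
        uint64_t c2 = r < s;            /* carry out of adding the carry-in */
        assert(c1 + c2 <= 1);           /* both carries are never set together */
        rp[i] = r;
        cy = c1 + c2;                   /* carry-in for the next limb */
    }
    return cy;
}
```

Every iteration depends on the carry from the one before, which is why keeping
the carry-consuming instructions back to back (or fed directly by a
carry-generating ADD) matters so much for these loops.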

Latency scheduling is less important than ensuring a good mix of ALU and MEM
operations, but for full pipeline utilization it is still a good idea to do
some latency scheduling.

As on all other processors, read-after-write (RAW) memory scheduling is
critically important.  Since integer multiplication takes place in the
floating-point unit, operands and products must pass between the integer and
floating-point registers through memory, so the GMP code runs into this
problem frequently.
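Because the multiplier lives in the floating-point unit, a full 64x64-bit limb
product has to be assembled from smaller pieces.  The following is a minimal C
sketch, assuming the hardware supplies only 32x32->64-bit unsigned products (on
PA-RISC this corresponds to the XMPYU instruction): four partial products are
combined into a 128-bit result.  The name mul_64x64_sketch is hypothetical, and
the memory round trips between register files are not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the low 64 bits of u*v and stores the
   high 64 bits in *hi, using only 32x32->64-bit multiplications. */
static uint64_t mul_64x64_sketch(uint64_t *hi, uint64_t u, uint64_t v)
{
    const uint64_t mask = UINT64_C(0xffffffff);
    uint64_t u0 = u & mask, u1 = u >> 32;
    uint64_t v0 = v & mask, v1 = v >> 32;

    /* Each partial product of two 32-bit halves fits in 64 bits. */
    uint64_t p00 = u0 * v0;
    uint64_t p01 = u0 * v1;
    uint64_t p10 = u1 * v0;
    uint64_t p11 = u1 * v1;

    /* Middle column: high half of p00 plus the low halves of the
       two cross products.  Three 32-bit values fit in 34 bits. */
    uint64_t mid = (p00 >> 32) + (p01 & mask) + (p10 & mask);
    assert((mid >> 34) == 0);

    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return (mid << 32) | (p00 & mask);
}
```

Four multiplications per limb product, plus the additions to recombine them,
goes some way toward explaining why the multiply routines cost several cycles
per limb even with both ALU slots busy.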

STATUS

* mpn_lshift and mpn_rshift run at 1.5 cycles/limb on PA8000 and at 1.0
  cycles/limb on PA8500.  With latency scheduling, the numbers could
  probably be improved to 1.0 cycles/limb for all PA8x00 chips.

* mpn_add_n and mpn_sub_n run at 2.0 cycles/limb on PA8000 and at about
  1.6875 cycles/limb on PA8500.  With latency scheduling, this could
  probably be improved to get close to 1.5 cycles/limb.  One problem is that
  carry-consuming instructions stall when they follow instructions that do
  not write the carry.

* mpn_mul_1, mpn_addmul_1, and mpn_submul_1 run at between 5.625 and 6.375
  cycles/limb on PA8500 and later, and about a cycle/limb slower on older
  chips.  The code uses ADD,DC for adjacent limbs and relies heavily on
  reordering.
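The shift routines in the first item have simple semantics: each result limb
combines bits from two adjacent source limbs.  A portable C sketch of a left
shift in the style of mpn_lshift, assuming 64-bit limbs and a shift count
1 <= cnt <= 63 (the name lshift_sketch is hypothetical), returning the bits
pushed out of the most significant limb:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: shifts the n-limb number at up left by cnt bits,
   stores the low n limbs of the result at rp, and returns the bits
   shifted out of the top limb (right-aligned).  64-bit limbs assumed. */
static uint64_t lshift_sketch(uint64_t *rp, const uint64_t *up,
                              size_t n, unsigned cnt)
{
    /* A shift by 0 or 64 would make one of the C shifts below undefined. */
    assert(n >= 1 && cnt >= 1 && cnt < 64);
    unsigned tnc = 64 - cnt;
    uint64_t out = up[n - 1] >> tnc;

    /* Walk from the most significant limb down, so the loop also works
       in place (rp == up). */
    for (size_t i = n - 1; i > 0; i--)
        rp[i] = (up[i] << cnt) | (up[i - 1] >> tnc);
    rp[0] = up[0] << cnt;
    return out;
}
```

Each result limb needs one load and two shifts plus an OR, and no carry chain
is involved, which fits with these routines being the fastest ones listed.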


REFERENCES

Hewlett Packard, "64-Bit Runtime Architecture for PA-RISC 2.0", version 3.3,
October 1997.