1 1.1 christos /* $NetBSD: umul.S,v 1.1 2005/12/20 19:28:50 christos Exp $ */ 2 1.1 christos 3 1.1 christos /* 4 1.1 christos * Copyright (c) 1992, 1993 5 1.1 christos * The Regents of the University of California. All rights reserved. 6 1.1 christos * 7 1.1 christos * This software was developed by the Computer Systems Engineering group 8 1.1 christos * at Lawrence Berkeley Laboratory under DARPA contract BG 91-66 and 9 1.1 christos * contributed to Berkeley. 10 1.1 christos * 11 1.1 christos * Redistribution and use in source and binary forms, with or without 12 1.1 christos * modification, are permitted provided that the following conditions 13 1.1 christos * are met: 14 1.1 christos * 1. Redistributions of source code must retain the above copyright 15 1.1 christos * notice, this list of conditions and the following disclaimer. 16 1.1 christos * 2. Redistributions in binary form must reproduce the above copyright 17 1.1 christos * notice, this list of conditions and the following disclaimer in the 18 1.1 christos * documentation and/or other materials provided with the distribution. 19 1.1 christos * 3. Neither the name of the University nor the names of its contributors 20 1.1 christos * may be used to endorse or promote products derived from this software 21 1.1 christos * without specific prior written permission. 22 1.1 christos * 23 1.1 christos * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND 24 1.1 christos * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 25 1.1 christos * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 26 1.1 christos * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE 27 1.1 christos * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 28 1.1 christos * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS 29 1.1 christos * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 30 1.1 christos * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 31 1.1 christos * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 32 1.1 christos * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 33 1.1 christos * SUCH DAMAGE. 34 1.1 christos * 35 1.1 christos * from: Header: umul.s,v 1.4 92/06/25 13:24:05 torek Exp 36 1.1 christos */ 37 1.1 christos 38 1.1 christos #include <machine/asm.h> 39 1.1 christos #if defined(LIBC_SCCS) && !defined(lint) 40 1.1 christos #if 0 41 1.1 christos .asciz "@(#)umul.s 8.1 (Berkeley) 6/4/93" 42 1.1 christos #else 43 1.1 christos RCSID("$NetBSD: umul.S,v 1.1 2005/12/20 19:28:50 christos Exp $") 44 1.1 christos #endif 45 1.1 christos #endif /* LIBC_SCCS and not lint */ 46 1.1 christos 47 1.1 christos /* 48 1.1 christos * Unsigned multiply. Returns %o0 * %o1 in %o1%o0 (i.e., %o1 holds the 49 1.1 christos * upper 32 bits of the 64-bit product). 50 1.1 christos * 51 1.1 christos * This code optimizes short (less than 13-bit) multiplies. Short 52 1.1 christos * multiplies require 25 instruction cycles, and long ones require 53 1.1 christos * 45 instruction cycles. 54 1.1 christos * 55 1.1 christos * On return, overflow has occurred (%o1 is not zero) if and only if 56 1.1 christos * the Z condition code is clear, allowing, e.g., the following: 57 1.1 christos * 58 1.1 christos * call .umul 59 1.1 christos * nop 60 1.1 christos * bnz overflow (or tnz) 61 1.1 christos */ 62 1.1 christos 63 1.1 christos FUNC(.umul) 64 1.1 christos or %o0, %o1, %o4 65 1.1 christos mov %o0, %y ! multiplier -> Y 66 1.1 christos andncc %o4, 0xfff, %g0 ! test bits 12..31 of *both* args 67 1.1 christos be Lmul_shortway ! if zero, can do it the short way 68 1.1 christos andcc %g0, %g0, %o4 ! zero the partial product and clear N and V 69 1.1 christos 70 1.1 christos /* 71 1.1 christos * Long multiply. 32 steps, followed by a final shift step. 72 1.1 christos */ 73 1.1 christos mulscc %o4, %o1, %o4 ! 1 74 1.1 christos mulscc %o4, %o1, %o4 ! 2 75 1.1 christos mulscc %o4, %o1, %o4 ! 3 76 1.1 christos mulscc %o4, %o1, %o4 ! 4 77 1.1 christos mulscc %o4, %o1, %o4 ! 5 78 1.1 christos mulscc %o4, %o1, %o4 ! 6 79 1.1 christos mulscc %o4, %o1, %o4 ! 7 80 1.1 christos mulscc %o4, %o1, %o4 ! 8 81 1.1 christos mulscc %o4, %o1, %o4 ! 9 82 1.1 christos mulscc %o4, %o1, %o4 ! 10 83 1.1 christos mulscc %o4, %o1, %o4 ! 11 84 1.1 christos mulscc %o4, %o1, %o4 ! 12 85 1.1 christos mulscc %o4, %o1, %o4 ! 13 86 1.1 christos mulscc %o4, %o1, %o4 ! 14 87 1.1 christos mulscc %o4, %o1, %o4 ! 15 88 1.1 christos mulscc %o4, %o1, %o4 ! 16 89 1.1 christos mulscc %o4, %o1, %o4 ! 17 90 1.1 christos mulscc %o4, %o1, %o4 ! 18 91 1.1 christos mulscc %o4, %o1, %o4 ! 19 92 1.1 christos mulscc %o4, %o1, %o4 ! 20 93 1.1 christos mulscc %o4, %o1, %o4 ! 21 94 1.1 christos mulscc %o4, %o1, %o4 ! 22 95 1.1 christos mulscc %o4, %o1, %o4 ! 23 96 1.1 christos mulscc %o4, %o1, %o4 ! 24 97 1.1 christos mulscc %o4, %o1, %o4 ! 25 98 1.1 christos mulscc %o4, %o1, %o4 ! 26 99 1.1 christos mulscc %o4, %o1, %o4 ! 27 100 1.1 christos mulscc %o4, %o1, %o4 ! 28 101 1.1 christos mulscc %o4, %o1, %o4 ! 29 102 1.1 christos mulscc %o4, %o1, %o4 ! 30 103 1.1 christos mulscc %o4, %o1, %o4 ! 31 104 1.1 christos mulscc %o4, %o1, %o4 ! 32 105 1.1 christos mulscc %o4, %g0, %o4 ! final shift 106 1.1 christos 107 1.1 christos 108 1.1 christos /* 109 1.1 christos * Normally, with the shift-and-add approach, if both numbers are 110 1.1 christos * positive you get the correct result. WIth 32-bit two's-complement 111 1.1 christos * numbers, -x is represented as 112 1.1 christos * 113 1.1 christos * x 32 114 1.1 christos * ( 2 - ------ ) mod 2 * 2 115 1.1 christos * 32 116 1.1 christos * 2 117 1.1 christos * 118 1.1 christos * (the `mod 2' subtracts 1 from 1.bbbb). To avoid lots of 2^32s, 119 1.1 christos * we can treat this as if the radix point were just to the left 120 1.1 christos * of the sign bit (multiply by 2^32), and get 121 1.1 christos * 122 1.1 christos * -x = (2 - x) mod 2 123 1.1 christos * 124 1.1 christos * Then, ignoring the `mod 2's for convenience: 125 1.1 christos * 126 1.1 christos * x * y = xy 127 1.1 christos * -x * y = 2y - xy 128 1.1 christos * x * -y = 2x - xy 129 1.1 christos * -x * -y = 4 - 2x - 2y + xy 130 1.1 christos * 131 1.1 christos * For signed multiplies, we subtract (x << 32) from the partial 132 1.1 christos * product to fix this problem for negative multipliers (see mul.s). 133 1.1 christos * Because of the way the shift into the partial product is calculated 134 1.1 christos * (N xor V), this term is automatically removed for the multiplicand, 135 1.1 christos * so we don't have to adjust. 136 1.1 christos * 137 1.1 christos * But for unsigned multiplies, the high order bit wasn't a sign bit, 138 1.1 christos * and the correction is wrong. So for unsigned multiplies where the 139 1.1 christos * high order bit is one, we end up with xy - (y << 32). To fix it 140 1.1 christos * we add y << 32. 141 1.1 christos */ 142 1.1 christos tst %o1 143 1.1 christos bl,a 1f ! if %o1 < 0 (high order bit = 1), 144 1.1 christos add %o4, %o0, %o4 ! %o4 += %o0 (add y to upper half) 145 1.1 christos 1: rd %y, %o0 ! get lower half of product 146 1.1 christos retl 147 1.1 christos addcc %o4, %g0, %o1 ! put upper half in place and set Z for %o1==0 148 1.1 christos 149 1.1 christos Lmul_shortway: 150 1.1 christos /* 151 1.1 christos * Short multiply. 12 steps, followed by a final shift step. 152 1.1 christos * The resulting bits are off by 12 and (32-12) = 20 bit positions, 153 1.1 christos * but there is no problem with %o0 being negative (unlike above), 154 1.1 christos * and overflow is impossible (the answer is at most 24 bits long). 155 1.1 christos */ 156 1.1 christos mulscc %o4, %o1, %o4 ! 1 157 1.1 christos mulscc %o4, %o1, %o4 ! 2 158 1.1 christos mulscc %o4, %o1, %o4 ! 3 159 1.1 christos mulscc %o4, %o1, %o4 ! 4 160 1.1 christos mulscc %o4, %o1, %o4 ! 5 161 1.1 christos mulscc %o4, %o1, %o4 ! 6 162 1.1 christos mulscc %o4, %o1, %o4 ! 7 163 1.1 christos mulscc %o4, %o1, %o4 ! 8 164 1.1 christos mulscc %o4, %o1, %o4 ! 9 165 1.1 christos mulscc %o4, %o1, %o4 ! 10 166 1.1 christos mulscc %o4, %o1, %o4 ! 11 167 1.1 christos mulscc %o4, %o1, %o4 ! 12 168 1.1 christos mulscc %o4, %g0, %o4 ! final shift 169 1.1 christos 170 1.1 christos /* 171 1.1 christos * %o4 has 20 of the bits that should be in the result; %y has 172 1.1 christos * the bottom 12 (as %y's top 12). That is: 173 1.1 christos * 174 1.1 christos * %o4 %y 175 1.1 christos * +----------------+----------------+ 176 1.1 christos * | -12- | -20- | -12- | -20- | 177 1.1 christos * +------(---------+------)---------+ 178 1.1 christos * -----result----- 179 1.1 christos * 180 1.1 christos * The 12 bits of %o4 left of the `result' area are all zero; 181 1.1 christos * in fact, all top 20 bits of %o4 are zero. 182 1.1 christos */ 183 1.1 christos 184 1.1 christos rd %y, %o5 185 1.1 christos sll %o4, 12, %o0 ! shift middle bits left 12 186 1.1 christos srl %o5, 20, %o5 ! shift low bits right 20 187 1.1 christos or %o5, %o0, %o0 188 1.1 christos retl 189 1.1 christos addcc %g0, %g0, %o1 ! %o1 = zero, and set Z 190