/*
 * inffast.S is a hand tuned assembler version of:
 *
 * inffast.c -- fast decoding
 * Copyright (C) 1995-2003 Mark Adler
 * For conditions of distribution and use, see copyright notice in zlib.h
 *
 * Copyright (C) 2003 Chris Anderson <christop (at) charm.net>
 * Please use the copyright conditions above.
 *
 * This version (Jan-23-2003) of inflate_fast was coded and tested under
 * GNU/Linux on a pentium 3, using the gcc-3.2 compiler distribution.  On that
 * machine, I found that gzip style archives decompressed about 20% faster than
 * the gcc-3.2 -O3 -fomit-frame-pointer compiled version.  Your results will
 * depend on how large a buffer is used for z_stream.next_in & next_out
 * (8K-32K worked best for my 256K cpu cache) and how much overhead there is in
 * stream processing I/O and crc32/adler32.  In my case, this routine used
 * 70% of the cpu time and crc32 used 20%.
 *
 * I am confident that this version will work in the general case, but I have
 * not tested a wide variety of datasets or a wide variety of platforms.
 *
 * Jan-24-2003 -- Added -DUSE_MMX define for slightly faster inflating.
 * It should be a runtime flag instead of compile time flag...
 *
 * Jan-26-2003 -- Added runtime check for MMX support with cpuid instruction.
 * With -DUSE_MMX, only MMX code is compiled.  With -DNO_MMX, only non-MMX code
 * is compiled.  Without either option, runtime detection is enabled.
 * Runtime detection should work on all modern cpus and the recommended
 * algorithm (flip ID bit on eflags and then use the cpuid instruction) is
 * used in many multimedia applications.  Tested under win2k with gcc-2.95
 * and gas-2.12 distributed with cygwin3.  Compiling with gcc-2.95 -c
 * inffast.S -o inffast.obj generates a COFF object which can then be linked
 * with MSVC++ compiled code.  Tested under FreeBSD 4.7 with gcc-2.95.
 *
 * Jan-28-2003 -- Tested Athlon XP... MMX mode is slower than no MMX (and
 * slower than compiler generated code).  Adjusted cpuid check to use the MMX
 * code only for Pentiums < P4 until I have more data on the P4.  Speed
 * improvement is only about 15% on the Athlon when compared with code generated
 * with MSVC++.  Not sure yet, but I think the P4 will also be slower using the
 * MMX mode because many of its x86 ALU instructions execute in .5 cycles and
 * have less latency than MMX ops.  Added code to buffer the last 11 bytes of
 * the input stream since the MMX code grabs bits in chunks of 32, which
 * differs from the inffast.c algorithm.  I don't think there would have been
 * read overruns where a page boundary was crossed (a segfault), but there
 * could have been overruns when next_in ends on unaligned memory (uninitialized
 * memory read).
 *
 * Mar-13-2003 -- P4 MMX is slightly slower than P4 NO_MMX.  I created a C
 * version of the non-MMX code so that it doesn't depend on zstrm and zstate
 * structure offsets which are hard coded in this file.
 * This was last tested with zlib-1.2.0, which is currently in beta testing.
 * Newer versions of this and inffas86.c can be found at
 * http://www.eetbeetee.com/zlib/ and http://www.charm.net/~christop/zlib/
 */


/*
 * if you have underscore linking problems (_inflate_fast undefined), try
 * using -DGAS_COFF
 */
#if ! defined( GAS_COFF ) && ! defined( GAS_ELF )

#if defined( WIN32 ) || defined( __CYGWIN__ )
#define GAS_COFF /* windows object format */
#else
#define GAS_ELF
#endif

#endif /* ! GAS_COFF && ! GAS_ELF */


#if defined( GAS_COFF )

/* coff externals have underscores */
#define inflate_fast _inflate_fast
#define inflate_fast_use_mmx _inflate_fast_use_mmx

#endif /* GAS_COFF */


.file "inffast.S"

.globl inflate_fast

.text
.align 4,0
.L_invalid_literal_length_code_msg:
.string "invalid literal/length code"

.align 4,0
.L_invalid_distance_code_msg:
.string "invalid distance code"

.align 4,0
.L_invalid_distance_too_far_msg:
.string "invalid distance too far back"

#if ! defined( NO_MMX )
.align 4,0
.L_mask: /* mask[N] = ( 1 << N ) - 1 */
.long 0
.long 1
.long 3
.long 7
.long 15
.long 31
.long 63
.long 127
.long 255
.long 511
.long 1023
.long 2047
.long 4095
.long 8191
.long 16383
.long 32767
.long 65535
.long 131071
.long 262143
.long 524287
.long 1048575
.long 2097151
.long 4194303
.long 8388607
.long 16777215
.long 33554431
.long 67108863
.long 134217727
.long 268435455
.long 536870911
.long 1073741823
.long 2147483647
.long 4294967295
#endif /* NO_MMX */

.text

/*
 * struct z_stream offsets, in zlib.h
 */
#define next_in_strm   0   /* strm->next_in */
#define avail_in_strm  4   /* strm->avail_in */
#define next_out_strm  12  /* strm->next_out */
#define avail_out_strm 16  /* strm->avail_out */
#define msg_strm       24  /* strm->msg */
#define state_strm     28  /* strm->state */

/*
 * struct inflate_state offsets, in inflate.h
 */
#define mode_state     0   /* state->mode */
#define wsize_state    32  /* state->wsize */
#define write_state    40  /* state->write */
#define window_state   44  /* state->window */
#define hold_state     48  /* state->hold */
#define bits_state     52  /* state->bits */
#define lencode_state  68  /* state->lencode */
#define distcode_state 72  /* state->distcode */
#define lenbits_state  76  /* state->lenbits */
#define distbits_state 80  /* state->distbits */

/*
 * inflate_fast's activation record
 */
#define local_var_size 64 /* how much local space for vars */
#define strm_sp        88 /* first arg: z_stream * (local_var_size + 24) */
#define start_sp       92 /* second arg: unsigned int (local_var_size + 28) */

/*
 * offsets for local vars on stack
 */
#define out            60  /* unsigned char* */
#define window         56  /* unsigned char* */
#define wsize          52  /* unsigned int */
#define write          48  /* unsigned int */
#define in             44  /* unsigned char* */
#define beg            40  /* unsigned char* */
#define buf            28  /* char[ 12 ] */
#define len            24  /* unsigned int */
#define last           20  /* unsigned char* */
#define end            16  /* unsigned char* */
#define dcode          12  /* code* */
#define lcode           8  /* code* */
#define dmask           4  /* unsigned int */
#define lmask           0  /* unsigned int */

/*
 * typedef enum inflate_mode consts, in inflate.h
 */
#define INFLATE_MODE_TYPE 11  /* state->mode flags enum-ed in inflate.h */
#define INFLATE_MODE_BAD  26


#if ! defined( USE_MMX ) && ! defined( NO_MMX )

#define RUN_TIME_MMX

#define CHECK_MMX    1
#define DO_USE_MMX   2
#define DONT_USE_MMX 3

.globl inflate_fast_use_mmx

.data

.align 4,0
inflate_fast_use_mmx: /* integer flag for run time control 1=check,2=mmx,3=no */
.long CHECK_MMX

#if defined( GAS_ELF )
/* elf info */
.type inflate_fast_use_mmx,@object
.size inflate_fast_use_mmx,4
#endif

#endif /* RUN_TIME_MMX */

#if defined( GAS_COFF )
/* coff info: scl 2 = extern, type 32 = function */
.def inflate_fast; .scl 2; .type 32; .endef
#endif

.text

.align 32,0x90
inflate_fast:
        pushl   %edi
        pushl   %esi
        pushl   %ebp
        pushl   %ebx
        pushf   /* save eflags (strm_sp, state_sp assumes this is 32 bits) */
        subl    $local_var_size, %esp
        cld

#define strm_r  %esi
#define state_r %edi

        movl    strm_sp(%esp), strm_r
        movl    state_strm(strm_r), state_r

        /* in = strm->next_in;
         * out = strm->next_out;
         * last = in + strm->avail_in - 11;
         * beg = out - (start - strm->avail_out);
         * end = out + (strm->avail_out - 257);
         */
        movl    avail_in_strm(strm_r), %edx
        movl    next_in_strm(strm_r), %eax

        addl    %eax, %edx      /* avail_in += next_in */
        subl    $11, %edx       /* avail_in -= 11 */

        movl    %eax, in(%esp)
        movl    %edx, last(%esp)

        movl    start_sp(%esp), %ebp
        movl    avail_out_strm(strm_r), %ecx
        movl    next_out_strm(strm_r), %ebx

        subl    %ecx, %ebp      /* start -= avail_out */
        negl    %ebp            /* start = -start */
        addl    %ebx, %ebp      /* start += next_out */

        subl    $257, %ecx      /* avail_out -= 257 */
        addl    %ebx, %ecx      /* avail_out += out */

        movl    %ebx, out(%esp)
        movl    %ebp, beg(%esp)
        movl    %ecx, end(%esp)

        /* wsize = state->wsize;
         * write = state->write;
         * window = state->window;
         * hold = state->hold;
         * bits = state->bits;
         * lcode = state->lencode;
         * dcode = state->distcode;
         * lmask = ( 1 << state->lenbits ) - 1;
         * dmask = ( 1 << state->distbits ) - 1;
         */

        movl    lencode_state(state_r), %eax
        movl    distcode_state(state_r), %ecx

        movl    %eax, lcode(%esp)
        movl    %ecx, dcode(%esp)

        movl    $1, %eax
        movl    lenbits_state(state_r), %ecx
        shll    %cl, %eax
        decl    %eax
        movl    %eax, lmask(%esp)

        movl    $1, %eax
        movl    distbits_state(state_r), %ecx
        shll    %cl, %eax
        decl    %eax
        movl    %eax, dmask(%esp)

        movl    wsize_state(state_r), %eax
        movl    write_state(state_r), %ecx
        movl    window_state(state_r), %edx

        movl    %eax, wsize(%esp)
        movl    %ecx, write(%esp)
        movl    %edx, window(%esp)

        movl    hold_state(state_r), %ebp
        movl    bits_state(state_r), %ebx

#undef strm_r
#undef state_r

#define in_r   %esi
#define from_r %esi
#define out_r  %edi

        movl    in(%esp), in_r
        movl    last(%esp), %ecx
        cmpl    in_r, %ecx
        ja      .L_align_long           /* if in < last */

        addl    $11, %ecx               /* ecx = &in[ avail_in ] */
        subl    in_r, %ecx              /* ecx = avail_in */
        movl    $12, %eax
        subl    %ecx, %eax              /* eax = 12 - avail_in */
        leal    buf(%esp), %edi
        rep     movsb                   /* memcpy( buf, in, avail_in ) */
        movl    %eax, %ecx
        xorl    %eax, %eax
        rep     stosb           /* memset( &buf[ avail_in ], 0, 12 - avail_in ) */
        leal    buf(%esp), in_r         /* in = buf */
        movl    in_r, last(%esp)        /* last = in, do just one iteration */
        jmp     .L_is_aligned

        /* align in_r on long boundary */
.L_align_long:
        testl   $3, in_r
        jz      .L_is_aligned
        xorl    %eax, %eax
        movb    (in_r), %al
        incl    in_r
        movl    %ebx, %ecx
        addl    $8, %ebx
        shll    %cl, %eax
        orl     %eax, %ebp
        jmp     .L_align_long

.L_is_aligned:
        movl    out(%esp), out_r

#if defined( NO_MMX )
        jmp     .L_do_loop
#endif

#if defined( USE_MMX )
        jmp     .L_init_mmx
#endif

/*** Runtime MMX check ***/

#if defined( RUN_TIME_MMX )
.L_check_mmx:
        cmpl    $DO_USE_MMX, inflate_fast_use_mmx
        je      .L_init_mmx
        ja      .L_do_loop /* > 2 */

        pushl   %eax
        pushl   %ebx
        pushl   %ecx
        pushl   %edx
        pushf
        movl    (%esp), %eax      /* copy eflags to eax */
        xorl    $0x200000, (%esp) /* try toggling ID bit of eflags (bit 21)
                                   * to see if cpu supports cpuid...
                                   * ID bit method not supported by NexGen but
                                   * bios may load a cpuid instruction and
                                   * cpuid may be disabled on Cyrix 5-6x86 */
        popf
        pushf
        popl    %edx              /* copy new eflags to edx */
        xorl    %eax, %edx        /* test if ID bit is flipped */
        jz      .L_dont_use_mmx   /* not flipped if zero */
        xorl    %eax, %eax
        cpuid
        cmpl    $0x756e6547, %ebx /* check for GenuineIntel in ebx,ecx,edx */
        jne     .L_dont_use_mmx
        cmpl    $0x6c65746e, %ecx
        jne     .L_dont_use_mmx
        cmpl    $0x49656e69, %edx
        jne     .L_dont_use_mmx
        movl    $1, %eax
        cpuid                     /* get cpu features */
        shrl    $8, %eax
        andl    $15, %eax
        cmpl    $6, %eax          /* check for Pentium family, is 0xf for P4 */
        jne     .L_dont_use_mmx
        testl   $0x800000, %edx   /* test if MMX feature is set (bit 23) */
        jnz     .L_use_mmx
        jmp     .L_dont_use_mmx
.L_use_mmx:
        movl    $DO_USE_MMX, inflate_fast_use_mmx
        jmp     .L_check_mmx_pop
.L_dont_use_mmx:
        movl    $DONT_USE_MMX, inflate_fast_use_mmx
.L_check_mmx_pop:
        popl    %edx
        popl    %ecx
        popl    %ebx
        popl    %eax
        jmp     .L_check_mmx
#endif


/*** Non-MMX code ***/

#if defined ( NO_MMX ) || defined( RUN_TIME_MMX )

#define hold_r     %ebp
#define bits_r     %bl
#define bitslong_r %ebx

.align 32,0x90
.L_while_test:
        /* while (in < last && out < end)
         */
        cmpl    out_r, end(%esp)
        jbe     .L_break_loop           /* if (out >= end) */

        cmpl    in_r, last(%esp)
        jbe     .L_break_loop

.L_do_loop:
        /* regs: %esi = in, %ebp = hold, %bl = bits, %edi = out
         *
         * do {
         *   if (bits < 15) {
         *     hold |= *((unsigned short *)in)++ << bits;
         *     bits += 16
         *   }
         *   this = lcode[hold & lmask]
         */
        cmpb    $15, bits_r
        ja      .L_get_length_code      /* if (15 < bits) */

        xorl    %eax, %eax
        lodsw                           /* ax = *(ushort *)in++ */
        movb    bits_r, %cl             /* cl = bits, needs it for shifting */
        addb    $16, bits_r             /* bits += 16 */
        shll    %cl, %eax
        orl     %eax, hold_r            /* hold |= *((ushort *)in)++ << bits */

.L_get_length_code:
        movl    lmask(%esp), %edx       /* edx = lmask */
        movl    lcode(%esp), %ecx       /* ecx = lcode */
        andl    hold_r, %edx            /* edx &= hold */
        movl    (%ecx,%edx,4), %eax     /* eax = lcode[hold & lmask] */

.L_dolen:
        /* regs: %esi = in, %ebp = hold, %bl = bits, %edi = out
         *
         * dolen:
         *    bits -= this.bits;
         *    hold >>= this.bits
         */
        movb    %ah, %cl                /* cl = this.bits */
        subb    %ah, bits_r             /* bits -= this.bits */
        shrl    %cl, hold_r             /* hold >>= this.bits */

        /* check if op is a literal
         * if (op == 0) {
         *    PUP(out) = this.val;
         *  }
         */
        testb   %al, %al
        jnz     .L_test_for_length_base /* if (op != 0) 45.7% */

        shrl    $16, %eax               /* output this.val char */
        stosb
        jmp     .L_while_test

.L_test_for_length_base:
        /* regs: %esi = in, %ebp = hold, %bl = bits, %edi = out, %edx = len
         *
         * else if (op & 16) {
         *   len = this.val
         *   op &= 15
         *   if (op) {
         *     if (op > bits) {
         *       hold |= *((unsigned short *)in)++ << bits;
         *       bits += 16
         *     }
         *     len += hold & mask[op];
         *     bits -= op;
         *     hold >>= op;
         *   }
         */
#define len_r %edx
        movl    %eax, len_r             /* len = this */
        shrl    $16, len_r              /* len = this.val */
        movb    %al, %cl

        testb   $16, %al
        jz      .L_test_for_second_level_length /* if ((op & 16) == 0) 8% */
        andb    $15, %cl                /* op &= 15 */
        jz      .L_save_len             /* if (!op) */
        cmpb    %cl, bits_r
        jae     .L_add_bits_to_len      /* if (op <= bits) */

        movb    %cl, %ch                /* stash op in ch, freeing cl */
        xorl    %eax, %eax
        lodsw                           /* ax = *(ushort *)in++ */
        movb    bits_r, %cl             /* cl = bits, needs it for shifting */
        addb    $16, bits_r             /* bits += 16 */
        shll    %cl, %eax
        orl     %eax, hold_r            /* hold |= *((ushort *)in)++ << bits */
        movb    %ch, %cl                /* move op back to cl */

.L_add_bits_to_len:
        movl    $1, %eax
        shll    %cl, %eax
        decl    %eax
        subb    %cl, bits_r
        andl    hold_r, %eax            /* eax &= hold */
        shrl    %cl, hold_r
        addl    %eax, len_r             /* len += hold & mask[op] */

.L_save_len:
        movl    len_r, len(%esp)        /* save len */
#undef len_r

.L_decode_distance:
        /* regs: %esi = in, %ebp = hold, %bl = bits, %edi = out, %edx = dist
         *
         *   if (bits < 15) {
         *     hold |= *((unsigned short *)in)++ << bits;
         *     bits += 16
         *   }
         *   this = dcode[hold & dmask];
         * dodist:
         *   bits -= this.bits;
         *   hold >>= this.bits;
         *   op = this.op;
         */

        cmpb    $15, bits_r
        ja      .L_get_distance_code    /* if (15 < bits) */

        xorl    %eax, %eax
        lodsw                           /* ax = *(ushort *)in++ */
        movb    bits_r, %cl             /* cl = bits, needs it for shifting */
        addb    $16, bits_r             /* bits += 16 */
        shll    %cl, %eax
        orl     %eax, hold_r            /* hold |= *((ushort *)in)++ << bits */

.L_get_distance_code:
        movl    dmask(%esp), %edx       /* edx = dmask */
        movl    dcode(%esp), %ecx       /* ecx = dcode */
        andl    hold_r, %edx            /* edx &= hold */
        movl    (%ecx,%edx,4), %eax     /* eax = dcode[hold & dmask] */

#define dist_r %edx
.L_dodist:
        movl    %eax, dist_r            /* dist = this */
        shrl    $16, dist_r             /* dist = this.val */
        movb    %ah, %cl
        subb    %ah, bits_r             /* bits -= this.bits */
        shrl    %cl, hold_r             /* hold >>= this.bits */

        /* if (op & 16) {
         *   dist = this.val
         *   op &= 15
         *   if (op > bits) {
         *     hold |= *((unsigned short *)in)++ << bits;
         *     bits += 16
         *   }
         *   dist += hold & mask[op];
         *   bits -= op;
         *   hold >>= op;
         */
        movb    %al, %cl                /* cl = this.op */

        testb   $16, %al                /* if ((op & 16) == 0) */
        jz      .L_test_for_second_level_dist
        andb    $15, %cl                /* op &= 15 */
        jz      .L_check_dist_one
        cmpb    %cl, bits_r
        jae     .L_add_bits_to_dist     /* if (op <= bits) 97.6% */

        movb    %cl, %ch                /* stash op in ch, freeing cl */
        xorl    %eax, %eax
        lodsw                           /* ax = *(ushort *)in++ */
        movb    bits_r, %cl             /* cl = bits, needs it for shifting */
        addb    $16, bits_r             /* bits += 16 */
        shll    %cl, %eax
        orl     %eax, hold_r            /* hold |= *((ushort *)in)++ << bits */
        movb    %ch, %cl                /* move op back to cl */

.L_add_bits_to_dist:
        movl    $1, %eax
        shll    %cl, %eax
        decl    %eax                    /* (1 << op) - 1 */
        subb    %cl, bits_r
        andl    hold_r, %eax            /* eax &= hold */
        shrl    %cl, hold_r
        addl    %eax, dist_r            /* dist += hold & ((1 << op) - 1) */
        jmp     .L_check_window

.L_check_window:
        /* regs: %esi = from, %ebp = hold, %bl = bits, %edi = out, %edx = dist
         *       %ecx = nbytes
         *
         * nbytes = out - beg;
         * if (dist <= nbytes) {
         *   from = out - dist;
         *   do {
         *     PUP(out) = PUP(from);
         *   } while (--len > 0);
         * }
         */

        movl    in_r, in(%esp)          /* save in so from can use its reg */
        movl    out_r, %eax
        subl    beg(%esp), %eax         /* nbytes = out - beg */

        cmpl    dist_r, %eax
        jb      .L_clip_window          /* if (dist > nbytes) 4.2% */

        movl    len(%esp), %ecx
        movl    out_r, from_r
        subl    dist_r, from_r          /* from = out - dist */

        subl    $3, %ecx
        movb    (from_r), %al
        movb    %al, (out_r)
        movb    1(from_r), %al
        movb    2(from_r), %dl
        addl    $3, from_r
        movb    %al, 1(out_r)
        movb    %dl, 2(out_r)
        addl    $3, out_r
        rep     movsb

        movl    in(%esp), in_r          /* move in back to %esi, toss from */
        jmp     .L_while_test

.align 16,0x90
.L_check_dist_one:
        cmpl    $1, dist_r
        jne     .L_check_window
        cmpl    out_r, beg(%esp)
        je      .L_check_window

        decl    out_r
        movl    len(%esp), %ecx
        movb    (out_r), %al
        subl    $3, %ecx

        movb    %al, 1(out_r)
        movb    %al, 2(out_r)
        movb    %al, 3(out_r)
        addl    $4, out_r
        rep     stosb

        jmp     .L_while_test

.align 16,0x90
.L_test_for_second_level_length:
        /* else if ((op & 64) == 0) {
         *   this = lcode[this.val + (hold & mask[op])];
         * }
         */
        testb   $64, %al
        jnz     .L_test_for_end_of_block  /* if ((op & 64) != 0) */

        movl    $1, %eax
        shll    %cl, %eax
        decl    %eax
        andl    hold_r, %eax            /* eax &= hold */
        addl    %edx, %eax              /* eax += this.val */
        movl    lcode(%esp), %edx       /* edx = lcode */
        movl    (%edx,%eax,4), %eax     /* eax = lcode[val + (hold&mask[op])] */
        jmp     .L_dolen

.align 16,0x90
.L_test_for_second_level_dist:
        /* else if ((op & 64) == 0) {
         *   this = dcode[this.val + (hold & mask[op])];
         * }
         */
        testb   $64, %al
        jnz     .L_invalid_distance_code  /* if ((op & 64) != 0) */

        movl    $1, %eax
        shll    %cl, %eax
        decl    %eax
        andl    hold_r, %eax            /* eax &= hold */
        addl    %edx, %eax              /* eax += this.val */
        movl    dcode(%esp), %edx       /* edx = dcode */
        movl    (%edx,%eax,4), %eax     /* eax = dcode[val + (hold&mask[op])] */
        jmp     .L_dodist

.align 16,0x90
.L_clip_window:
        /* regs: %esi = from, %ebp = hold, %bl = bits, %edi = out, %edx = dist
         *       %ecx = nbytes
         *
         * else {
         *   if (dist > wsize) {
         *     invalid distance
         *   }
         *   from = window;
         *   nbytes = dist - nbytes;
         *   if (write == 0) {
         *     from += wsize - nbytes;
         */
#define nbytes_r %ecx
        movl    %eax, nbytes_r
        movl    wsize(%esp), %eax       /* prepare for dist compare */
        negl    nbytes_r                /* nbytes = -nbytes */
        movl    window(%esp), from_r    /* from = window */

        cmpl    dist_r, %eax
        jb      .L_invalid_distance_too_far /* if (dist > wsize) */

        addl    dist_r, nbytes_r        /* nbytes = dist - nbytes */
        cmpl    $0, write(%esp)
        jne     .L_wrap_around_window   /* if (write != 0) */

        subl    nbytes_r, %eax
        addl    %eax, from_r            /* from += wsize - nbytes */

        /* regs: %esi = from, %ebp = hold, %bl = bits, %edi = out, %edx = dist
         *       %ecx = nbytes, %eax = len
         *
         *     if (nbytes < len) {
         *       len -= nbytes;
         *       do {
         *         PUP(out) = PUP(from);
         *       } while (--nbytes);
         *       from = out - dist;
         *     }
         *   }
         */
#define len_r %eax
        movl    len(%esp), len_r
        cmpl    nbytes_r, len_r
        jbe     .L_do_copy1             /* if (nbytes >= len) */

        subl    nbytes_r, len_r         /* len -= nbytes */
        rep     movsb
        movl    out_r, from_r
        subl    dist_r, from_r          /* from = out - dist */
        jmp     .L_do_copy1

        cmpl    nbytes_r, len_r
        jbe     .L_do_copy1             /* if (nbytes >= len) */

        subl    nbytes_r, len_r         /* len -= nbytes */
        rep     movsb
        movl    out_r, from_r
        subl    dist_r, from_r          /* from = out - dist */
        jmp     .L_do_copy1

.L_wrap_around_window:
        /* regs: %esi = from, %ebp = hold, %bl = bits, %edi = out, %edx = dist
         *       %ecx = nbytes, %eax = write, %eax = len
         *
         * else if (write < nbytes) {
         *   from += wsize + write - nbytes;
         *   nbytes -= write;
         *   if (nbytes < len) {
         *     len -= nbytes;
         *     do {
         *       PUP(out) = PUP(from);
         *     } while (--nbytes);
         *     from = window;
         *     nbytes = write;
         *     if (nbytes < len) {
         *       len -= nbytes;
         *       do {
         *         PUP(out) = PUP(from);
         *       } while(--nbytes);
         *       from = out - dist;
         *     }
         *   }
         * }
         */
#define write_r %eax
        movl    write(%esp), write_r
        cmpl    write_r, nbytes_r
        jbe     .L_contiguous_in_window /* if (write >= nbytes) */

        addl    wsize(%esp), from_r
        addl    write_r, from_r
        subl    nbytes_r, from_r        /* from += wsize + write - nbytes */
        subl    write_r, nbytes_r       /* nbytes -= write */
#undef write_r

        movl    len(%esp), len_r
        cmpl    nbytes_r, len_r
        jbe     .L_do_copy1             /* if (nbytes >= len) */

        subl    nbytes_r, len_r         /* len -= nbytes */
        rep     movsb
        movl    window(%esp), from_r    /* from = window */
        movl    write(%esp), nbytes_r   /* nbytes = write */
        cmpl    nbytes_r, len_r
        jbe     .L_do_copy1             /* if (nbytes >= len) */

        subl    nbytes_r, len_r         /* len -= nbytes */
        rep     movsb
        movl    out_r, from_r
        subl    dist_r, from_r          /* from = out - dist */
        jmp     .L_do_copy1

.L_contiguous_in_window:
        /* regs: %esi = from, %ebp = hold, %bl = bits, %edi = out, %edx = dist
         *       %ecx = nbytes, %eax = write, %eax = len
         *
         * else {
         *   from += write - nbytes;
         *   if (nbytes < len) {
         *     len -= nbytes;
         *     do {
         *       PUP(out) = PUP(from);
         *     } while (--nbytes);
         *     from = out - dist;
         *   }
         * }
         */
#define write_r %eax
        addl    write_r, from_r
        subl    nbytes_r, from_r        /* from += write - nbytes */
#undef write_r

        movl    len(%esp), len_r
        cmpl    nbytes_r, len_r
        jbe     .L_do_copy1             /* if (nbytes >= len) */

        subl    nbytes_r, len_r         /* len -= nbytes */
        rep     movsb
        movl    out_r, from_r
        subl    dist_r, from_r          /* from = out - dist */

.L_do_copy1:
        /* regs: %esi = from, %esi = in, %ebp = hold, %bl = bits, %edi = out
         *       %eax = len
         *
         *     while (len > 0) {
         *       PUP(out) = PUP(from);
         *       len--;
         *     }
         *   }
         * } while (in < last && out < end);
         */
#undef nbytes_r
#define in_r %esi
        movl    len_r, %ecx
        rep     movsb

        movl    in(%esp), in_r          /* move in back to %esi, toss from */
        jmp     .L_while_test

#undef len_r
#undef dist_r

#endif /* NO_MMX || RUN_TIME_MMX */


/*** MMX code ***/

#if defined( USE_MMX ) || defined( RUN_TIME_MMX )

.align 32,0x90
.L_init_mmx:
        emms

#undef  bits_r
#undef  bitslong_r
#define bitslong_r %ebp
#define hold_mm    %mm0
        movd    %ebp, hold_mm
        movl    %ebx, bitslong_r

#define used_mm   %mm1
#define dmask2_mm %mm2
#define lmask2_mm %mm3
#define lmask_mm  %mm4
#define dmask_mm  %mm5
#define tmp_mm    %mm6

        movd    lmask(%esp), lmask_mm
        movq    lmask_mm, lmask2_mm
        movd    dmask(%esp), dmask_mm
        movq    dmask_mm, dmask2_mm
        pxor    used_mm, used_mm
        movl    lcode(%esp), %ebx       /* ebx = lcode */
        jmp     .L_do_loop_mmx

.align 32,0x90
.L_while_test_mmx:
        /* while (in < last && out < end)
         */
        cmpl    out_r, end(%esp)
        jbe     .L_break_loop           /* if (out >= end) */

        cmpl    in_r, last(%esp)
        jbe     .L_break_loop

.L_do_loop_mmx:
        psrlq   used_mm, hold_mm        /* hold_mm >>= last bit length */

        cmpl    $32, bitslong_r
        ja      .L_get_length_code_mmx  /* if (32 < bits) */

        movd    bitslong_r, tmp_mm
        movd    (in_r), %mm7
        addl    $4, in_r
        psllq   tmp_mm, %mm7
        addl    $32, bitslong_r
        por     %mm7, hold_mm           /* hold_mm |= *((uint *)in)++ << bits */

.L_get_length_code_mmx:
        pand    hold_mm, lmask_mm
        movd    lmask_mm, %eax
        movq    lmask2_mm, lmask_mm
        movl    (%ebx,%eax,4), %eax     /* eax = lcode[hold & lmask] */
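
/* Illustrative sketch only (not part of the original source): the refill
 * and table lookup above, written in the C notation this file uses for the
 * non-MMX path.  Here "hold" stands for the 64-bit value kept in hold_mm
 * and "used" for the bit count kept in used_mm; the uint64_t cast is an
 * assumption added to make the 64-bit shift explicit.
 *
 *   hold >>= used;                 (drop the bits of the previous code)
 *   if (bits <= 32) {
 *     hold |= (uint64_t)*((uint *)in)++ << bits;
 *     bits += 32;
 *   }
 *   this = lcode[hold & lmask];
 */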

.L_dolen_mmx:
        movzbl  %ah, %ecx               /* ecx = this.bits */
        movd    %ecx, used_mm
        subl    %ecx, bitslong_r        /* bits -= this.bits */

        testb   %al, %al
        jnz     .L_test_for_length_base_mmx /* if (op != 0) 45.7% */

        shrl    $16, %eax               /* output this.val char */
        stosb
        jmp     .L_while_test_mmx

.L_test_for_length_base_mmx:
#define len_r  %edx
        movl    %eax, len_r             /* len = this */
        shrl    $16, len_r              /* len = this.val */

        testb   $16, %al
        jz      .L_test_for_second_level_length_mmx /* if ((op & 16) == 0) 8% */
        andl    $15, %eax               /* op &= 15 */
        jz      .L_decode_distance_mmx  /* if (!op) */

        psrlq   used_mm, hold_mm        /* hold_mm >>= last bit length */
        movd    %eax, used_mm
        movd    hold_mm, %ecx
        subl    %eax, bitslong_r
        andl    .L_mask(,%eax,4), %ecx
        addl    %ecx, len_r             /* len += hold & mask[op] */

.L_decode_distance_mmx:
        psrlq   used_mm, hold_mm        /* hold_mm >>= last bit length */

        cmpl    $32, bitslong_r
        ja      .L_get_dist_code_mmx    /* if (32 < bits) */

        movd    bitslong_r, tmp_mm
        movd    (in_r), %mm7
        addl    $4, in_r
        psllq   tmp_mm, %mm7
        addl    $32, bitslong_r
        por     %mm7, hold_mm           /* hold_mm |= *((uint *)in)++ << bits */

.L_get_dist_code_mmx:
        movl    dcode(%esp), %ebx       /* ebx = dcode */
        pand    hold_mm, dmask_mm
        movd    dmask_mm, %eax
        movq    dmask2_mm, dmask_mm
        movl    (%ebx,%eax,4), %eax     /* eax = dcode[hold & dmask] */

.L_dodist_mmx:
#define dist_r %ebx
        movzbl  %ah, %ecx               /* ecx = this.bits */
        movl    %eax, dist_r
        shrl    $16, dist_r             /* dist  = this.val */
        subl    %ecx, bitslong_r        /* bits -= this.bits */
        movd    %ecx, used_mm

        testb   $16, %al                /* if ((op & 16) == 0) */
        jz      .L_test_for_second_level_dist_mmx
        andl    $15, %eax               /* op &= 15 */
        jz      .L_check_dist_one_mmx

.L_add_bits_to_dist_mmx:
        psrlq   used_mm, hold_mm        /* hold_mm >>= last bit length */
        movd    %eax, used_mm           /* save bit length of current op */
        movd    hold_mm, %ecx           /* get the next bits on input stream */
        subl    %eax, bitslong_r        /* bits -= op bits */
        andl    .L_mask(,%eax,4), %ecx  /* ecx   = hold & mask[op] */
        addl    %ecx, dist_r            /* dist += hold & mask[op] */

.L_check_window_mmx:
        movl    in_r, in(%esp)          /* save in so from can use its reg */
        movl    out_r, %eax
        subl    beg(%esp), %eax         /* nbytes = out - beg */

        cmpl    dist_r, %eax
        jb      .L_clip_window_mmx      /* if (dist > nbytes) 4.2% */

        movl    len_r, %ecx
        movl    out_r, from_r
        subl    dist_r, from_r          /* from = out - dist */

        subl    $3, %ecx
        movb    (from_r), %al
        movb    %al, (out_r)
        movb    1(from_r), %al
        movb    2(from_r), %dl
        addl    $3, from_r
        movb    %al, 1(out_r)
        movb    %dl, 2(out_r)
        addl    $3, out_r
        rep     movsb

        movl    in(%esp), in_r          /* move in back to %esi, toss from */
        movl    lcode(%esp), %ebx       /* move lcode back to %ebx, toss dist */
        jmp     .L_while_test_mmx

.align 16,0x90
.L_check_dist_one_mmx:
        cmpl    $1, dist_r
        jne     .L_check_window_mmx
        cmpl    out_r, beg(%esp)
        je      .L_check_window_mmx

        decl    out_r
        movl    len_r, %ecx
        movb    (out_r), %al
        subl    $3, %ecx

        movb    %al, 1(out_r)
        movb    %al, 2(out_r)
        movb    %al, 3(out_r)
        addl    $4, out_r
        rep     stosb

        movl    lcode(%esp), %ebx       /* move lcode back to %ebx, toss dist */
        jmp     .L_while_test_mmx

.align 16,0x90
.L_test_for_second_level_length_mmx:
        testb   $64, %al
        jnz     .L_test_for_end_of_block /* if ((op & 64) != 0) */

        andl    $15, %eax
        psrlq   used_mm, hold_mm        /* hold_mm >>= last bit length */
        movd    hold_mm, %ecx
        andl    .L_mask(,%eax,4), %ecx
        addl    len_r, %ecx
        movl    (%ebx,%ecx,4), %eax     /* eax = lcode[this.val + (hold & mask[op])] */
        jmp     .L_dolen_mmx

.align 16,0x90
.L_test_for_second_level_dist_mmx:
        testb   $64, %al
        jnz     .L_invalid_distance_code /* if ((op & 64) != 0) */

        andl    $15, %eax
        psrlq   used_mm, hold_mm        /* hold_mm >>= last bit length */
        movd    hold_mm, %ecx
        andl    .L_mask(,%eax,4), %ecx
        movl    dcode(%esp), %eax       /* eax = dcode */
        addl    dist_r, %ecx
        movl    (%eax,%ecx,4), %eax     /* eax = dcode[this.val + (hold & mask[op])] */
        jmp     .L_dodist_mmx

.align 16,0x90
.L_clip_window_mmx:
#define nbytes_r %ecx
        movl    %eax, nbytes_r
        movl    wsize(%esp), %eax       /* prepare for dist compare */
        negl    nbytes_r                /* nbytes = -nbytes */
        movl    window(%esp), from_r    /* from = window */

        cmpl    dist_r, %eax
        jb      .L_invalid_distance_too_far /* if (dist > wsize) */

        addl    dist_r, nbytes_r        /* nbytes = dist - nbytes */
        cmpl    $0, write(%esp)
        jne     .L_wrap_around_window_mmx /* if (write != 0) */

        subl    nbytes_r, %eax
        addl    %eax, from_r            /* from += wsize - nbytes */

        cmpl    nbytes_r, len_r
        jbe     .L_do_copy1_mmx         /* if (nbytes >= len) */

        subl    nbytes_r, len_r         /* len -= nbytes */
        rep     movsb
        movl    out_r, from_r
        subl    dist_r, from_r          /* from = out - dist */
        jmp     .L_do_copy1_mmx

        cmpl    nbytes_r, len_r
        jbe     .L_do_copy1_mmx         /* if (nbytes >= len) */

        subl    nbytes_r, len_r         /* len -= nbytes */
        rep     movsb
        movl    out_r, from_r
        subl    dist_r, from_r          /* from = out - dist */
        jmp     .L_do_copy1_mmx

.L_wrap_around_window_mmx:
#define write_r %eax
        movl    write(%esp), write_r
        cmpl    write_r, nbytes_r
        jbe     .L_contiguous_in_window_mmx /* if (write >= nbytes) */

        addl    wsize(%esp), from_r
        addl    write_r, from_r
        subl    nbytes_r, from_r        /* from += wsize + write - nbytes */
        subl    write_r, nbytes_r       /* nbytes -= write */
#undef write_r

        cmpl    nbytes_r, len_r
        jbe     .L_do_copy1_mmx         /* if (nbytes >= len) */

        subl    nbytes_r, len_r         /* len -= nbytes */
        rep     movsb
        movl    window(%esp), from_r    /* from = window */
        movl    write(%esp), nbytes_r   /* nbytes = write */
        cmpl    nbytes_r, len_r
        jbe     .L_do_copy1_mmx         /* if (nbytes >= len) */

        subl    nbytes_r, len_r         /* len -= nbytes */
        rep     movsb
        movl    out_r, from_r
        subl    dist_r, from_r          /* from = out - dist */
        jmp     .L_do_copy1_mmx

.L_contiguous_in_window_mmx:
#define write_r %eax
        addl    write_r, from_r
        subl    nbytes_r, from_r        /* from += write - nbytes */
#undef write_r

        cmpl    nbytes_r, len_r
        jbe     .L_do_copy1_mmx         /* if (nbytes >= len) */

        subl    nbytes_r, len_r         /* len -= nbytes */
        rep     movsb
        movl    out_r, from_r
        subl    dist_r, from_r          /* from = out - dist */

.L_do_copy1_mmx:
#undef nbytes_r
#define in_r %esi
        movl    len_r, %ecx
        rep     movsb

        movl    in(%esp), in_r          /* move in back to %esi, toss from */
        movl    lcode(%esp), %ebx       /* move lcode back to %ebx, toss dist */
        jmp     .L_while_test_mmx

#undef hold_r
#undef bitslong_r

#endif /* USE_MMX || RUN_TIME_MMX */


/*** USE_MMX, NO_MMX, and RUN_TIME_MMX from here on ***/

.L_invalid_distance_code:
        /* else {
         *   strm->msg = "invalid distance code";
         *   state->mode = BAD;
         * }
         */
        movl    $.L_invalid_distance_code_msg, %ecx
        movl    $INFLATE_MODE_BAD, %edx
        jmp     .L_update_stream_state

.L_test_for_end_of_block:
        /* else if (op & 32) {
         *   state->mode = TYPE;
         *   break;
         * }
         */
        testb   $32, %al
        jz      .L_invalid_literal_length_code /* if ((op & 32) == 0) */

        movl    $0, %ecx
        movl    $INFLATE_MODE_TYPE, %edx
        jmp     .L_update_stream_state

.L_invalid_literal_length_code:
        /* else {
         *   strm->msg = "invalid literal/length code";
         *   state->mode = BAD;
         * }
         */
        movl    $.L_invalid_literal_length_code_msg, %ecx
        movl    $INFLATE_MODE_BAD, %edx
        jmp     .L_update_stream_state

.L_invalid_distance_too_far:
        /* strm->msg = "invalid distance too far back";
         * state->mode = BAD;
         */
        movl    in(%esp), in_r          /* from_r has in's reg, put in back */
        movl    $.L_invalid_distance_too_far_msg, %ecx
        movl    $INFLATE_MODE_BAD, %edx
        jmp     .L_update_stream_state

.L_update_stream_state:
        /* set strm->msg = %ecx, strm->state->mode = %edx */
        movl    strm_sp(%esp), %eax
        testl   %ecx, %ecx              /* if (msg != NULL) */
        jz      .L_skip_msg
        movl    %ecx, msg_strm(%eax)    /* strm->msg = msg */
.L_skip_msg:
        movl    state_strm(%eax), %eax  /* state = strm->state */
        movl    %edx, mode_state(%eax)  /* state->mode = edx (BAD or TYPE) */
        jmp     .L_break_loop

.align 32,0x90
.L_break_loop:

/*
 * Regs:
 *
 * bits = %ebp when mmx, and in %ebx when non-mmx
 * hold = hold_mm (%mm0) when mmx, and in %ebp when non-mmx
 * in   = %esi
 * out  = %edi
 */

#if defined( USE_MMX ) || defined( RUN_TIME_MMX )

#if defined( RUN_TIME_MMX )

        cmpl    $DO_USE_MMX, inflate_fast_use_mmx
        jne     .L_update_next_in

#endif /* RUN_TIME_MMX */

        movl    %ebp, %ebx

.L_update_next_in:

#endif

#define strm_r  %eax
#define state_r %edx

        /* len = bits >> 3;
         * in -= len;
         * bits -= len << 3;
         * hold &= (1U << bits) - 1;
         * state->hold = hold;
         * state->bits = bits;
         * strm->next_in = in;
         * strm->next_out = out;
         */
        movl    strm_sp(%esp), strm_r
        movl    %ebx, %ecx
        movl    state_strm(strm_r), state_r
        shrl    $3, %ecx
        subl    %ecx, in_r
        shll    $3, %ecx
        subl    %ecx, %ebx
        movl    out_r, next_out_strm(strm_r)
        movl    %ebx, bits_state(state_r)
        movl    %ebx, %ecx

        leal    buf(%esp), %ebx
        cmpl    %ebx, last(%esp)
        jne     .L_buf_not_used         /* if buf != last */

        subl    %ebx, in_r              /* in -= buf */
        movl    next_in_strm(strm_r), %ebx
        movl    %ebx, last(%esp)        /* last = strm->next_in */
        addl    %ebx, in_r              /* in += strm->next_in */
        movl    avail_in_strm(strm_r), %ebx
        subl    $11, %ebx
        addl    %ebx, last(%esp)        /* last = &strm->next_in[ avail_in - 11 ] */

.L_buf_not_used:
        movl    in_r, next_in_strm(strm_r)

        movl    $1, %ebx
        shll    %cl, %ebx
        decl    %ebx

#if defined( USE_MMX ) || defined( RUN_TIME_MMX )

#if defined( RUN_TIME_MMX )

        cmpl    $DO_USE_MMX, inflate_fast_use_mmx
        jne     .L_update_hold

#endif /* RUN_TIME_MMX */

        psrlq   used_mm, hold_mm        /* hold_mm >>= last bit length */
        movd    hold_mm, %ebp

        emms

.L_update_hold:

#endif /* USE_MMX || RUN_TIME_MMX */

        andl    %ebx, %ebp
        movl    %ebp, hold_state(state_r)

#define last_r %ebx

        /* strm->avail_in = in < last ? 11 + (last - in) : 11 - (in - last) */
        movl    last(%esp), last_r
        cmpl    in_r, last_r
        jbe     .L_last_is_smaller      /* if (in >= last) */

        subl    in_r, last_r            /* last -= in */
        addl    $11, last_r             /* last += 11 */
        movl    last_r, avail_in_strm(strm_r)
        jmp     .L_fixup_out
.L_last_is_smaller:
        subl    last_r, in_r            /* in -= last */
        negl    in_r                    /* in = -in */
        addl    $11, in_r               /* in += 11 */
        movl    in_r, avail_in_strm(strm_r)

#undef last_r
#define end_r %ebx

.L_fixup_out:
        /* strm->avail_out = out < end ? 257 + (end - out) : 257 - (out - end) */
        movl    end(%esp), end_r
        cmpl    out_r, end_r
        jbe     .L_end_is_smaller       /* if (out >= end) */

        subl    out_r, end_r            /* end -= out */
        addl    $257, end_r             /* end += 257 */
        movl    end_r, avail_out_strm(strm_r)
        jmp     .L_done
.L_end_is_smaller:
        subl    end_r, out_r            /* out -= end */
        negl    out_r                   /* out = -out */
        addl    $257, out_r             /* out += 257 */
        movl    out_r, avail_out_strm(strm_r)

#undef end_r
#undef strm_r
#undef state_r

.L_done:
        addl    $local_var_size, %esp
        popf
        popl    %ebx
        popl    %ebp
        popl    %esi
        popl    %edi
        ret

#if defined( GAS_ELF )
/* elf info */
.type inflate_fast,@function
.size inflate_fast,.-inflate_fast
#endif