inffast.S revision 1.1

1 1.1 christos /*
2 1.1 christos * inffast.S is a hand tuned assembler version of:
3 1.1 christos *
4 1.1 christos * inffast.c -- fast decoding
5 1.1 christos * Copyright (C) 1995-2003 Mark Adler
6 1.1 christos * For conditions of distribution and use, see copyright notice in zlib.h
7 1.1 christos *
8 1.1 christos * Copyright (C) 2003 Chris Anderson <christop (at) charm.net>
9 1.1 christos * Please use the copyright conditions above.
10 1.1 christos *
11 1.1 christos * This version (Jan-23-2003) of inflate_fast was coded and tested under
12 1.1 christos * GNU/Linux on a pentium 3, using the gcc-3.2 compiler distribution. On that
13 1.1 christos * machine, I found that gzip style archives decompressed about 20% faster than
14 1.1 christos * the gcc-3.2 -O3 -fomit-frame-pointer compiled version. Your results will
15 1.1 christos * depend on how large a buffer is used for z_stream.next_in & next_out
16 1.1 christos * (8K-32K worked best for my 256K cpu cache) and how much overhead there is in
17 1.1 christos * stream processing I/O and crc32/adler32. In my case, this routine used
18 1.1 christos * 70% of the cpu time and crc32 used 20%.
19 1.1 christos *
20 1.1 christos * I am confident that this version will work in the general case, but I have
21 1.1 christos * not tested a wide variety of datasets or a wide variety of platforms.
22 1.1 christos *
23 1.1 christos * Jan-24-2003 -- Added -DUSE_MMX define for slightly faster inflating.
24 1.1 christos * It should be a runtime flag instead of compile time flag...
25 1.1 christos *
26 1.1 christos * Jan-26-2003 -- Added runtime check for MMX support with cpuid instruction.
27 1.1 christos * With -DUSE_MMX, only MMX code is compiled. With -DNO_MMX, only non-MMX code
28 1.1 christos * is compiled. Without either option, runtime detection is enabled. Runtime
29 1.1 christos * detection should work on all modern cpus and the recommended algorithm (flip
30 1.1 christos * ID bit on eflags and then use the cpuid instruction) is used in many
31 1.1 christos * multimedia applications. Tested under win2k with gcc-2.95 and gas-2.12
32 1.1 christos * distributed with cygwin3. Compiling with gcc-2.95 -c inffast.S -o
33 1.1 christos * inffast.obj generates a COFF object which can then be linked with MSVC++
34 1.1 christos * compiled code. Tested under FreeBSD 4.7 with gcc-2.95.
35 1.1 christos *
36 1.1 christos * Jan-28-2003 -- Tested Athlon XP... MMX mode is slower than no MMX (and
37 1.1 christos * slower than compiler generated code). Adjusted cpuid check to use the MMX
38 1.1 christos * code only for Pentiums < P4 until I have more data on the P4. Speed
39 1.1 christos * improvement is only about 15% on the Athlon when compared with code generated
40 1.1 christos * with MSVC++. Not sure yet, but I think the P4 will also be slower using the
41 1.1 christos * MMX mode because many of its x86 ALU instructions execute in .5 cycles and
42 1.1 christos * have less latency than MMX ops. Added code to buffer the last 11 bytes of
43 1.1 christos * the input stream since the MMX code grabs bits in chunks of 32, which
44 1.1 christos * differs from the inffast.c algorithm. I don't think there would have been
45 1.1 christos * read overruns where a page boundary was crossed (a segfault), but there
46 1.1 christos * could have been overruns when next_in ends on unaligned memory (uninitialized
47 1.1 christos * memory read).
48 1.1 christos *
49 1.1 christos * Mar-13-2003 -- P4 MMX is slightly slower than P4 NO_MMX. I created a C
50 1.1 christos * version of the non-MMX code so that it doesn't depend on zstrm and zstate
51 1.1 christos * structure offsets which are hard coded in this file. This was last tested
52 1.1 christos * with zlib-1.2.0 which is currently in beta testing. Newer versions of this
53 1.1 christos * and inffas86.c can be found at http://www.eetbeetee.com/zlib/ and
54 1.1 christos * http://www.charm.net/~christop/zlib/
55 1.1 christos */
56 1.1 christos
57 1.1 christos
58 1.1 christos /*
59 1.1 christos * if you have underscore linking problems (_inflate_fast undefined), try
60 1.1 christos * using -DGAS_COFF
61 1.1 christos */
62 1.1 christos #if ! defined( GAS_COFF ) && ! defined( GAS_ELF )
63 1.1 christos
64 1.1 christos #if defined( WIN32 ) || defined( __CYGWIN__ )
65 1.1 christos #define GAS_COFF /* windows object format */
66 1.1 christos #else
67 1.1 christos #define GAS_ELF
68 1.1 christos #endif
69 1.1 christos
70 1.1 christos #endif /* ! GAS_COFF && ! GAS_ELF */
71 1.1 christos
72 1.1 christos
73 1.1 christos #if defined( GAS_COFF )
74 1.1 christos
75 1.1 christos /* coff externals have underscores */
76 1.1 christos #define inflate_fast _inflate_fast
77 1.1 christos #define inflate_fast_use_mmx _inflate_fast_use_mmx
78 1.1 christos
79 1.1 christos #endif /* GAS_COFF */
80 1.1 christos
81 1.1 christos
82 1.1 christos .file "inffast.S"
83 1.1 christos
84 1.1 christos .globl inflate_fast
85 1.1 christos
86 1.1 christos .text
87 1.1 christos .align 4,0
88 1.1 christos .L_invalid_literal_length_code_msg:
89 1.1 christos .string "invalid literal/length code"
90 1.1 christos
91 1.1 christos .align 4,0
92 1.1 christos .L_invalid_distance_code_msg:
93 1.1 christos .string "invalid distance code"
94 1.1 christos
95 1.1 christos .align 4,0
96 1.1 christos .L_invalid_distance_too_far_msg:
97 1.1 christos .string "invalid distance too far back"
98 1.1 christos
99 1.1 christos #if ! defined( NO_MMX )
100 1.1 christos .align 4,0
101 1.1 christos .L_mask: /* mask[N] = ( 1 << N ) - 1 */
102 1.1 christos .long 0
103 1.1 christos .long 1
104 1.1 christos .long 3
105 1.1 christos .long 7
106 1.1 christos .long 15
107 1.1 christos .long 31
108 1.1 christos .long 63
109 1.1 christos .long 127
110 1.1 christos .long 255
111 1.1 christos .long 511
112 1.1 christos .long 1023
113 1.1 christos .long 2047
114 1.1 christos .long 4095
115 1.1 christos .long 8191
116 1.1 christos .long 16383
117 1.1 christos .long 32767
118 1.1 christos .long 65535
119 1.1 christos .long 131071
120 1.1 christos .long 262143
121 1.1 christos .long 524287
122 1.1 christos .long 1048575
123 1.1 christos .long 2097151
124 1.1 christos .long 4194303
125 1.1 christos .long 8388607
126 1.1 christos .long 16777215
127 1.1 christos .long 33554431
128 1.1 christos .long 67108863
129 1.1 christos .long 134217727
130 1.1 christos .long 268435455
131 1.1 christos .long 536870911
132 1.1 christos .long 1073741823
133 1.1 christos .long 2147483647
134 1.1 christos .long 4294967295
135 1.1 christos #endif /* ! NO_MMX */
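/*
 * Editor's sketch, not part of the original source: the .L_mask table above
 * can be cross-checked against its formula, mask[N] = (1 << N) - 1. The C
 * below is a minimal illustration only. low_bits_mask is a hypothetical
 * helper name, and the shift is widened to 64 bits because 1 << 32 is not
 * defined for a 32-bit integer. The block is kept inside this comment so the
 * file still assembles.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper (not in inffast.S): mask selecting the low n bits.
// The shift is done in 64 bits so that n == 32 is well defined.
static uint32_t low_bits_mask(unsigned n)
{
    assert(n <= 32);
    return (uint32_t)((UINT64_C(1) << n) - 1);
}
```

 * For example, low_bits_mask(15) gives 32767 and low_bits_mask(32) gives
 * 4294967295, matching the corresponding .long entries.
 */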
136 1.1 christos
137 1.1 christos .text
138 1.1 christos
139 1.1 christos /*
140 1.1 christos * struct z_stream offsets, in zlib.h
141 1.1 christos */
142 1.1 christos #define next_in_strm 0 /* strm->next_in */
143 1.1 christos #define avail_in_strm 4 /* strm->avail_in */
144 1.1 christos #define next_out_strm 12 /* strm->next_out */
145 1.1 christos #define avail_out_strm 16 /* strm->avail_out */
146 1.1 christos #define msg_strm 24 /* strm->msg */
147 1.1 christos #define state_strm 28 /* strm->state */
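/*
 * Editor's sketch, not part of the original source: the z_stream offsets
 * above assume a 32-bit target where pointers, uInt and uLong are all four
 * bytes. A minimal way to see where the numbers come from is to model the
 * leading z_stream fields with 32-bit integers and ask the compiler for the
 * offsets. struct z_stream_ilp32 and check_z_stream_offsets are hypothetical
 * names for this illustration; total_in and total_out are not used by the
 * assembly but occupy offsets 8 and 20.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical 32-bit model of the first eight z_stream fields.
struct z_stream_ilp32 {
    uint32_t next_in;   // pointer on the real target
    uint32_t avail_in;
    uint32_t total_in;
    uint32_t next_out;  // pointer on the real target
    uint32_t avail_out;
    uint32_t total_out;
    uint32_t msg;       // pointer on the real target
    uint32_t state;     // pointer on the real target
};

// Compare the modelled layout with the constants defined above.
static void check_z_stream_offsets(void)
{
    assert(offsetof(struct z_stream_ilp32, next_in) == 0);
    assert(offsetof(struct z_stream_ilp32, avail_in) == 4);
    assert(offsetof(struct z_stream_ilp32, next_out) == 12);
    assert(offsetof(struct z_stream_ilp32, avail_out) == 16);
    assert(offsetof(struct z_stream_ilp32, msg) == 24);
    assert(offsetof(struct z_stream_ilp32, state) == 28);
}
```

 * On a real build the same check could be written against zlib.h itself, and
 * it would fail on any target whose pointers or longs are not 32 bits wide.
 */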
148 1.1 christos
149 1.1 christos /*
150 1.1 christos * struct inflate_state offsets, in inflate.h
151 1.1 christos */
152 1.1 christos #define mode_state 0 /* state->mode */
153 1.1 christos #define wsize_state 32 /* state->wsize */
154 1.1 christos #define write_state 40 /* state->write */
155 1.1 christos #define window_state 44 /* state->window */
156 1.1 christos #define hold_state 48 /* state->hold */
157 1.1 christos #define bits_state 52 /* state->bits */
158 1.1 christos #define lencode_state 68 /* state->lencode */
159 1.1 christos #define distcode_state 72 /* state->distcode */
160 1.1 christos #define lenbits_state 76 /* state->lenbits */
161 1.1 christos #define distbits_state 80 /* state->distbits */
162 1.1 christos
163 1.1 christos /*
164 1.1 christos * inflate_fast's activation record
165 1.1 christos */
166 1.1 christos #define local_var_size 64 /* how much local space for vars */
167 1.1 christos #define strm_sp 88 /* first arg: z_stream * (local_var_size + 24) */
168 1.1 christos #define start_sp 92 /* second arg: unsigned int (local_var_size + 28) */
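/*
 * Editor's sketch, not part of the original source: the "+ 24" in strm_sp
 * and start_sp is the stack space between the local variables and the first
 * argument: five saved 32-bit words (edi, esi, ebp, ebx, eflags) plus the
 * return address. The C below only reproduces that arithmetic; arg_offset
 * and the enum names are hypothetical.

```c
#include <assert.h>

enum {
    LOCAL_VAR_SIZE = 64, // bytes reserved by "subl $local_var_size, %esp"
    SAVED_WORDS    = 5,  // edi, esi, ebp, ebx and eflags pushed on entry
    WORD_SIZE      = 4   // every stack slot is 32 bits wide
};

// Hypothetical helper: offset from %esp of the index-th stack argument.
static int arg_offset(int index)
{
    assert(index >= 0);
    // Locals, then the saved registers, then the return address.
    int below_args = LOCAL_VAR_SIZE + SAVED_WORDS * WORD_SIZE + WORD_SIZE;
    return below_args + index * WORD_SIZE;
}
```

 * arg_offset(0) is 88 (strm_sp) and arg_offset(1) is 92 (start_sp). Both
 * constants must change if another register is pushed in the prologue.
 */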
169 1.1 christos
170 1.1 christos /*
171 1.1 christos * offsets for local vars on stack
172 1.1 christos */
173 1.1 christos #define out 60 /* unsigned char* */
174 1.1 christos #define window 56 /* unsigned char* */
175 1.1 christos #define wsize 52 /* unsigned int */
176 1.1 christos #define write 48 /* unsigned int */
177 1.1 christos #define in 44 /* unsigned char* */
178 1.1 christos #define beg 40 /* unsigned char* */
179 1.1 christos #define buf 28 /* char[ 12 ] */
180 1.1 christos #define len 24 /* unsigned int */
181 1.1 christos #define last 20 /* unsigned char* */
182 1.1 christos #define end 16 /* unsigned char* */
183 1.1 christos #define dcode 12 /* code* */
184 1.1 christos #define lcode 8 /* code* */
185 1.1 christos #define dmask 4 /* unsigned int */
186 1.1 christos #define lmask 0 /* unsigned int */
187 1.1 christos
188 1.1 christos /*
189 1.1 christos * typedef enum inflate_mode consts, in inflate.h
190 1.1 christos */
191 1.1 christos #define INFLATE_MODE_TYPE 11 /* state->mode flags enum-ed in inflate.h */
192 1.1 christos #define INFLATE_MODE_BAD 26
193 1.1 christos
194 1.1 christos
195 1.1 christos #if ! defined( USE_MMX ) && ! defined( NO_MMX )
196 1.1 christos
197 1.1 christos #define RUN_TIME_MMX
198 1.1 christos
199 1.1 christos #define CHECK_MMX 1
200 1.1 christos #define DO_USE_MMX 2
201 1.1 christos #define DONT_USE_MMX 3
202 1.1 christos
203 1.1 christos .globl inflate_fast_use_mmx
204 1.1 christos
205 1.1 christos .data
206 1.1 christos
207 1.1 christos .align 4,0
208 1.1 christos inflate_fast_use_mmx: /* integer flag for run time control 1=check,2=mmx,3=no */
209 1.1 christos .long CHECK_MMX
210 1.1 christos
211 1.1 christos #if defined( GAS_ELF )
212 1.1 christos /* elf info */
213 1.1 christos .type inflate_fast_use_mmx,@object
214 1.1 christos .size inflate_fast_use_mmx,4
215 1.1 christos #endif
216 1.1 christos
217 1.1 christos #endif /* RUN_TIME_MMX */
218 1.1 christos
219 1.1 christos #if defined( GAS_COFF )
220 1.1 christos /* coff info: scl 2 = extern, type 32 = function */
221 1.1 christos .def inflate_fast; .scl 2; .type 32; .endef
222 1.1 christos #endif
223 1.1 christos
224 1.1 christos .text
225 1.1 christos
226 1.1 christos .align 32,0x90
227 1.1 christos inflate_fast:
228 1.1 christos pushl %edi
229 1.1 christos pushl %esi
230 1.1 christos pushl %ebp
231 1.1 christos pushl %ebx
232 1.1 christos pushf /* save eflags (strm_sp and start_sp assume this is 32 bits) */
233 1.1 christos subl $local_var_size, %esp
234 1.1 christos cld
235 1.1 christos
236 1.1 christos #define strm_r %esi
237 1.1 christos #define state_r %edi
238 1.1 christos
239 1.1 christos movl strm_sp(%esp), strm_r
240 1.1 christos movl state_strm(strm_r), state_r
241 1.1 christos
242 1.1 christos /* in = strm->next_in;
243 1.1 christos * out = strm->next_out;
244 1.1 christos * last = in + strm->avail_in - 11;
245 1.1 christos * beg = out - (start - strm->avail_out);
246 1.1 christos * end = out + (strm->avail_out - 257);
247 1.1 christos */
248 1.1 christos movl avail_in_strm(strm_r), %edx
249 1.1 christos movl next_in_strm(strm_r), %eax
250 1.1 christos
251 1.1 christos addl %eax, %edx /* avail_in += next_in */
252 1.1 christos subl $11, %edx /* avail_in -= 11 */
253 1.1 christos
254 1.1 christos movl %eax, in(%esp)
255 1.1 christos movl %edx, last(%esp)
256 1.1 christos
257 1.1 christos movl start_sp(%esp), %ebp
258 1.1 christos movl avail_out_strm(strm_r), %ecx
259 1.1 christos movl next_out_strm(strm_r), %ebx
260 1.1 christos
261 1.1 christos subl %ecx, %ebp /* start -= avail_out */
262 1.1 christos negl %ebp /* start = -start */
263 1.1 christos addl %ebx, %ebp /* start += next_out */
264 1.1 christos
265 1.1 christos subl $257, %ecx /* avail_out -= 257 */
266 1.1 christos addl %ebx, %ecx /* avail_out += out */
267 1.1 christos
268 1.1 christos movl %ebx, out(%esp)
269 1.1 christos movl %ebp, beg(%esp)
270 1.1 christos movl %ecx, end(%esp)
271 1.1 christos
272 1.1 christos /* wsize = state->wsize;
273 1.1 christos * write = state->write;
274 1.1 christos * window = state->window;
275 1.1 christos * hold = state->hold;
276 1.1 christos * bits = state->bits;
277 1.1 christos * lcode = state->lencode;
278 1.1 christos * dcode = state->distcode;
279 1.1 christos * lmask = ( 1 << state->lenbits ) - 1;
280 1.1 christos * dmask = ( 1 << state->distbits ) - 1;
281 1.1 christos */
282 1.1 christos
283 1.1 christos movl lencode_state(state_r), %eax
284 1.1 christos movl distcode_state(state_r), %ecx
285 1.1 christos
286 1.1 christos movl %eax, lcode(%esp)
287 1.1 christos movl %ecx, dcode(%esp)
288 1.1 christos
289 1.1 christos movl $1, %eax
290 1.1 christos movl lenbits_state(state_r), %ecx
291 1.1 christos shll %cl, %eax
292 1.1 christos decl %eax
293 1.1 christos movl %eax, lmask(%esp)
294 1.1 christos
295 1.1 christos movl $1, %eax
296 1.1 christos movl distbits_state(state_r), %ecx
297 1.1 christos shll %cl, %eax
298 1.1 christos decl %eax
299 1.1 christos movl %eax, dmask(%esp)
300 1.1 christos
301 1.1 christos movl wsize_state(state_r), %eax
302 1.1 christos movl write_state(state_r), %ecx
303 1.1 christos movl window_state(state_r), %edx
304 1.1 christos
305 1.1 christos movl %eax, wsize(%esp)
306 1.1 christos movl %ecx, write(%esp)
307 1.1 christos movl %edx, window(%esp)
308 1.1 christos
309 1.1 christos movl hold_state(state_r), %ebp
310 1.1 christos movl bits_state(state_r), %ebx
311 1.1 christos
312 1.1 christos #undef strm_r
313 1.1 christos #undef state_r
314 1.1 christos
315 1.1 christos #define in_r %esi
316 1.1 christos #define from_r %esi
317 1.1 christos #define out_r %edi
318 1.1 christos
319 1.1 christos movl in(%esp), in_r
320 1.1 christos movl last(%esp), %ecx
321 1.1 christos cmpl in_r, %ecx
322 1.1 christos ja .L_align_long /* if in < last */
323 1.1 christos
324 1.1 christos addl $11, %ecx /* ecx = &in[ avail_in ] */
325 1.1 christos subl in_r, %ecx /* ecx = avail_in */
326 1.1 christos movl $12, %eax
327 1.1 christos subl %ecx, %eax /* eax = 12 - avail_in */
328 1.1 christos leal buf(%esp), %edi
329 1.1 christos rep movsb /* memcpy( buf, in, avail_in ) */
330 1.1 christos movl %eax, %ecx
331 1.1 christos xorl %eax, %eax
332 1.1 christos rep stosb /* memset( &buf[ avail_in ], 0, 12 - avail_in ) */
333 1.1 christos leal buf(%esp), in_r /* in = buf */
334 1.1 christos movl in_r, last(%esp) /* last = in, do just one iteration */
335 1.1 christos jmp .L_is_aligned
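/*
 * Editor's sketch, not part of the original source: when 11 or fewer input
 * bytes remain, the code above copies them into the 12-byte stack buffer and
 * zero-fills the rest, so later word-sized reads stay inside memory the
 * routine owns. A minimal C equivalent of the rep movsb / rep stosb pair,
 * with the hypothetical names buffer_tail and TAIL_BUF_SIZE:

```c
#include <assert.h>
#include <string.h>

enum { TAIL_BUF_SIZE = 12 }; // matches the char[ 12 ] slot named buf

// Copy the remaining avail_in bytes into buf and zero the unused tail.
static void buffer_tail(unsigned char buf[TAIL_BUF_SIZE],
                        const unsigned char *in, size_t avail_in)
{
    assert(avail_in <= TAIL_BUF_SIZE);
    memcpy(buf, in, avail_in);                           // rep movsb
    memset(buf + avail_in, 0, TAIL_BUF_SIZE - avail_in); // rep stosb
}
```

 * After the copy the assembly points in at buf and sets last = in, so the
 * main loop runs for a single iteration on the buffered bytes.
 */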
336 1.1 christos
337 1.1 christos /* align in_r on long boundary */
338 1.1 christos .L_align_long:
339 1.1 christos testl $3, in_r
340 1.1 christos jz .L_is_aligned
341 1.1 christos xorl %eax, %eax
342 1.1 christos movb (in_r), %al
343 1.1 christos incl in_r
344 1.1 christos movl %ebx, %ecx
345 1.1 christos addl $8, %ebx
346 1.1 christos shll %cl, %eax
347 1.1 christos orl %eax, %ebp
348 1.1 christos jmp .L_align_long
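/*
 * Editor's sketch, not part of the original source: the .L_align_long loop
 * feeds single bytes into the bit accumulator until the input pointer sits
 * on a 4-byte boundary. The C below is a minimal rendering of that loop.
 * struct bitbuf and align_input are hypothetical names, and the sketch
 * assumes fewer than 8 bits are held on entry so that at most three extra
 * bytes still fit in the 32-bit accumulator.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical bit accumulator mirroring hold (%ebp) and bits (%ebx).
struct bitbuf {
    uint32_t hold;
    unsigned bits;
};

// Consume bytes until in is 4-byte aligned; return the aligned pointer.
static const unsigned char *align_input(const unsigned char *in,
                                        struct bitbuf *bb)
{
    while (((uintptr_t)in & 3) != 0) {
        assert(bb->bits <= 24);                  // room for one more byte
        bb->hold |= (uint32_t)*in++ << bb->bits; // new byte goes on top
        bb->bits += 8;
    }
    return in;
}
```

 * Starting one byte past an aligned address, the loop consumes three bytes
 * and leaves bits increased by 24.
 */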
349 1.1 christos
350 1.1 christos .L_is_aligned:
351 1.1 christos movl out(%esp), out_r
352 1.1 christos
353 1.1 christos #if defined( NO_MMX )
354 1.1 christos jmp .L_do_loop
355 1.1 christos #endif
356 1.1 christos
357 1.1 christos #if defined( USE_MMX )
358 1.1 christos jmp .L_init_mmx
359 1.1 christos #endif
360 1.1 christos
361 1.1 christos /*** Runtime MMX check ***/
362 1.1 christos
363 1.1 christos #if defined( RUN_TIME_MMX )
364 1.1 christos .L_check_mmx:
365 1.1 christos cmpl $DO_USE_MMX, inflate_fast_use_mmx
366 1.1 christos je .L_init_mmx
367 1.1 christos ja .L_do_loop /* > 2 */
368 1.1 christos
369 1.1 christos pushl %eax
370 1.1 christos pushl %ebx
371 1.1 christos pushl %ecx
372 1.1 christos pushl %edx
373 1.1 christos pushf
374 1.1 christos movl (%esp), %eax /* copy eflags to eax */
375 1.1 christos xorl $0x200000, (%esp) /* try toggling ID bit of eflags (bit 21)
376 1.1 christos * to see if cpu supports cpuid...
377 1.1 christos * ID bit method not supported by NexGen but
378 1.1 christos * bios may load a cpuid instruction and
379 1.1 christos * cpuid may be disabled on Cyrix 5-6x86 */
380 1.1 christos popf
381 1.1 christos pushf
382 1.1 christos popl %edx /* copy new eflags to edx */
383 1.1 christos xorl %eax, %edx /* test if ID bit is flipped */
384 1.1 christos jz .L_dont_use_mmx /* not flipped if zero */
385 1.1 christos xorl %eax, %eax
386 1.1 christos cpuid
387 1.1 christos cmpl $0x756e6547, %ebx /* check for GenuineIntel in ebx,ecx,edx */
388 1.1 christos jne .L_dont_use_mmx
389 1.1 christos cmpl $0x6c65746e, %ecx
390 1.1 christos jne .L_dont_use_mmx
391 1.1 christos cmpl $0x49656e69, %edx
392 1.1 christos jne .L_dont_use_mmx
393 1.1 christos movl $1, %eax
394 1.1 christos cpuid /* get cpu features */
395 1.1 christos shrl $8, %eax
396 1.1 christos andl $15, %eax
397 1.1 christos cmpl $6, %eax /* check for Pentium family, is 0xf for P4 */
398 1.1 christos jne .L_dont_use_mmx
399 1.1 christos testl $0x800000, %edx /* test if MMX feature is set (bit 23) */
400 1.1 christos jnz .L_use_mmx
401 1.1 christos jmp .L_dont_use_mmx
402 1.1 christos .L_use_mmx:
403 1.1 christos movl $DO_USE_MMX, inflate_fast_use_mmx
404 1.1 christos jmp .L_check_mmx_pop
405 1.1 christos .L_dont_use_mmx:
406 1.1 christos movl $DONT_USE_MMX, inflate_fast_use_mmx
407 1.1 christos .L_check_mmx_pop:
408 1.1 christos popl %edx
409 1.1 christos popl %ecx
410 1.1 christos popl %ebx
411 1.1 christos popl %eax
412 1.1 christos jmp .L_check_mmx
413 1.1 christos #endif
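/*
 * Editor's sketch, not part of the original source: the three constants
 * compared after "cpuid" with eax = 0 are the ASCII vendor string packed
 * four characters per register, low byte first, in the order ebx, edx, ecx.
 * The C below unpacks them and also shows the family and MMX bit tests used
 * above. All function names are hypothetical, and no cpuid instruction is
 * executed; the register values are passed in as plain integers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Store the four bytes of a register value, lowest byte first.
static void unpack_reg(uint32_t reg, char *out)
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((reg >> (8 * i)) & 0xff);
}

// Hypothetical helper: rebuild the vendor string that cpuid leaf 0 returns
// in ebx, edx, ecx (in that order). out must hold at least 13 bytes.
static void vendor_string(uint32_t ebx, uint32_t ecx, uint32_t edx, char *out)
{
    assert(out != NULL);
    unpack_reg(ebx, out);
    unpack_reg(edx, out + 4);
    unpack_reg(ecx, out + 8);
    out[12] = '\0';
}

// Family field of the cpuid leaf 1 signature: bits 8..11 of eax.
static unsigned cpu_family(uint32_t eax)
{
    return (eax >> 8) & 15;
}

// MMX feature flag: bit 23 of edx from cpuid leaf 1.
static int has_mmx(uint32_t edx)
{
    return (edx & 0x800000u) != 0;
}
```

 * With ebx = 0x756e6547, ecx = 0x6c65746e and edx = 0x49656e69 the helper
 * produces "GenuineIntel". The assembly enables MMX only when the vendor
 * matches, the family field equals 6 and the MMX bit is set.
 */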
414 1.1 christos
415 1.1 christos
416 1.1 christos /*** Non-MMX code ***/
417 1.1 christos
418 1.1 christos #if defined ( NO_MMX ) || defined( RUN_TIME_MMX )
419 1.1 christos
420 1.1 christos #define hold_r %ebp
421 1.1 christos #define bits_r %bl
422 1.1 christos #define bitslong_r %ebx
423 1.1 christos
424 1.1 christos .align 32,0x90
425 1.1 christos .L_while_test:
426 1.1 christos /* while (in < last && out < end)
427 1.1 christos */
428 1.1 christos cmpl out_r, end(%esp)
429 1.1 christos jbe .L_break_loop /* if (out >= end) */
430 1.1 christos
431 1.1 christos cmpl in_r, last(%esp)
432 1.1 christos jbe .L_break_loop
433 1.1 christos
434 1.1 christos .L_do_loop:
435 1.1 christos /* regs: %esi = in, %ebp = hold, %bl = bits, %edi = out
436 1.1 christos *
437 1.1 christos * do {
438 1.1 christos * if (bits <= 15) {
439 1.1 christos * hold |= *((unsigned short *)in)++ << bits;
440 1.1 christos * bits += 16
441 1.1 christos * }
442 1.1 christos * this = lcode[hold & lmask]
443 1.1 christos */
444 1.1 christos cmpb $15, bits_r
445 1.1 christos ja .L_get_length_code /* if (15 < bits) */
446 1.1 christos
447 1.1 christos xorl %eax, %eax
448 1.1 christos lodsw /* ax = *(ushort *)in++ */
449 1.1 christos movb bits_r, %cl /* cl = bits, needs it for shifting */
450 1.1 christos addb $16, bits_r /* bits += 16 */
451 1.1 christos shll %cl, %eax
452 1.1 christos orl %eax, hold_r /* hold |= *((ushort *)in)++ << bits */
453 1.1 christos
454 1.1 christos .L_get_length_code:
455 1.1 christos movl lmask(%esp), %edx /* edx = lmask */
456 1.1 christos movl lcode(%esp), %ecx /* ecx = lcode */
457 1.1 christos andl hold_r, %edx /* edx &= hold */
458 1.1 christos movl (%ecx,%edx,4), %eax /* eax = lcode[hold & lmask] */
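/*
 * Editor's sketch, not part of the original source: the refill and lookup
 * above can be read as two small steps. When 15 or fewer bits are held, 16
 * more are loaded on top of them, and the low lenbits bits of hold then
 * index the length table. The C below is a minimal illustration with
 * hypothetical names (struct refill_state, refill16, table_index). It
 * assembles the 16-bit word from two bytes instead of using lodsw.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical bit accumulator mirroring hold (%ebp) and bits (%bl).
struct refill_state {
    uint32_t hold;
    unsigned bits;
};

// Top up the accumulator with 16 input bits when 15 or fewer remain,
// and return the (possibly advanced) input pointer.
static const unsigned char *refill16(const unsigned char *in,
                                     struct refill_state *st)
{
    assert(st->bits < 32);
    if (st->bits <= 15) {
        // Assemble the word low byte first, as lodsw does on x86.
        uint32_t word = (uint32_t)in[0] | ((uint32_t)in[1] << 8);
        st->hold |= word << st->bits;
        st->bits += 16;
        in += 2;
    }
    return in;
}

// Index into a decoding table whose root uses root_bits index bits.
static uint32_t table_index(const struct refill_state *st, unsigned root_bits)
{
    assert(root_bits < 32);
    return st->hold & ((UINT32_C(1) << root_bits) - 1);
}
```

 * The same refill pattern is repeated before the distance lookup, with dmask
 * and dcode in place of lmask and lcode.
 */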
459 1.1 christos
460 1.1 christos .L_dolen:
461 1.1 christos /* regs: %esi = in, %ebp = hold, %bl = bits, %edi = out
462 1.1 christos *
463 1.1 christos * dolen:
464 1.1 christos * bits -= this.bits;
465 1.1 christos * hold >>= this.bits
466 1.1 christos */
467 1.1 christos movb %ah, %cl /* cl = this.bits */
468 1.1 christos subb %ah, bits_r /* bits -= this.bits */
469 1.1 christos shrl %cl, hold_r /* hold >>= this.bits */
470 1.1 christos
471 1.1 christos /* check if op is a literal
472 1.1 christos * if (op == 0) {
473 1.1 christos * PUP(out) = this.val;
474 1.1 christos * }
475 1.1 christos */
476 1.1 christos testb %al, %al
477 1.1 christos jnz .L_test_for_length_base /* if (op != 0) 45.7% */
478 1.1 christos
479 1.1 christos shrl $16, %eax /* output this.val char */
480 1.1 christos stosb
481 1.1 christos jmp .L_while_test
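/*
 * Editor's sketch, not part of the original source: each table entry is
 * loaded as one 32-bit word, and the code above picks its fields out of
 * %eax: %al is op, %ah is bits, and the upper 16 bits are val. The C below
 * shows the same unpacking. struct code_fields, unpack_code and
 * literal_byte are hypothetical names, and the layout assumes a
 * little-endian target, as on x86.

```c
#include <assert.h>
#include <stdint.h>

// Fields of one 32-bit table entry as the assembly reads them from %eax.
struct code_fields {
    unsigned op;   // low byte (%al): operation and extra-bit count
    unsigned bits; // second byte (%ah): bits consumed by this code
    unsigned val;  // upper 16 bits: literal, base value or table offset
};

static struct code_fields unpack_code(uint32_t entry)
{
    struct code_fields f;
    f.op   = entry & 0xff;
    f.bits = (entry >> 8) & 0xff;
    f.val  = entry >> 16;
    return f;
}

// Hypothetical helper: the byte to emit for a literal entry (op == 0).
static unsigned char literal_byte(uint32_t entry)
{
    struct code_fields f = unpack_code(entry);
    assert(f.op == 0);
    return (unsigned char)f.val;
}
```

 * A made-up entry 0x00410800 unpacks to op = 0, bits = 8, val = 0x41, so
 * the literal path would consume 8 bits and store the byte 'A'.
 */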
482 1.1 christos
483 1.1 christos .L_test_for_length_base:
484 1.1 christos /* regs: %esi = in, %ebp = hold, %bl = bits, %edi = out, %edx = len
485 1.1 christos *
486 1.1 christos * else if (op & 16) {
487 1.1 christos * len = this.val
488 1.1 christos * op &= 15
489 1.1 christos * if (op) {
490 1.1 christos * if (op > bits) {
491 1.1 christos * hold |= *((unsigned short *)in)++ << bits;
492 1.1 christos * bits += 16
493 1.1 christos * }
494 1.1 christos * len += hold & mask[op];
495 1.1 christos * bits -= op;
496 1.1 christos * hold >>= op;
497 1.1 christos * }
498 1.1 christos */
499 1.1 christos #define len_r %edx
500 1.1 christos movl %eax, len_r /* len = this */
501 1.1 christos shrl $16, len_r /* len = this.val */
502 1.1 christos movb %al, %cl
503 1.1 christos
504 1.1 christos testb $16, %al
505 1.1 christos jz .L_test_for_second_level_length /* if ((op & 16) == 0) 8% */
506 1.1 christos andb $15, %cl /* op &= 15 */
507 1.1 christos jz .L_save_len /* if (!op) */
508 1.1 christos cmpb %cl, bits_r
509 1.1 christos jae .L_add_bits_to_len /* if (op <= bits) */
510 1.1 christos
511 1.1 christos movb %cl, %ch /* stash op in ch, freeing cl */
512 1.1 christos xorl %eax, %eax
513 1.1 christos lodsw /* ax = *(ushort *)in++ */
514 1.1 christos movb bits_r, %cl /* cl = bits, needs it for shifting */
515 1.1 christos addb $16, bits_r /* bits += 16 */
516 1.1 christos shll %cl, %eax
517 1.1 christos orl %eax, hold_r /* hold |= *((ushort *)in)++ << bits */
518 1.1 christos movb %ch, %cl /* move op back to cl */
519 1.1 christos
520 1.1 christos .L_add_bits_to_len:
521 1.1 christos movl $1, %eax
522 1.1 christos shll %cl, %eax
523 1.1 christos decl %eax
524 1.1 christos subb %cl, bits_r
525 1.1 christos andl hold_r, %eax /* eax &= hold */
526 1.1 christos shrl %cl, hold_r
527 1.1 christos addl %eax, len_r /* len += hold & mask[op] */
528 1.1 christos
529 1.1 christos .L_save_len:
530 1.1 christos movl len_r, len(%esp) /* save len */
531 1.1 christos #undef len_r
532 1.1 christos
533 1.1 christos .L_decode_distance:
534 1.1 christos /* regs: %esi = in, %ebp = hold, %bl = bits, %edi = out, %edx = dist
535 1.1 christos *
536 1.1 christos * if (bits <= 15) {
537 1.1 christos * hold |= *((unsigned short *)in)++ << bits;
538 1.1 christos * bits += 16
539 1.1 christos * }
540 1.1 christos * this = dcode[hold & dmask];
541 1.1 christos * dodist:
542 1.1 christos * bits -= this.bits;
543 1.1 christos * hold >>= this.bits;
544 1.1 christos * op = this.op;
545 1.1 christos */
546 1.1 christos
547 1.1 christos cmpb $15, bits_r
548 1.1 christos ja .L_get_distance_code /* if (15 < bits) */
549 1.1 christos
550 1.1 christos xorl %eax, %eax
551 1.1 christos lodsw /* ax = *(ushort *)in++ */
552 1.1 christos movb bits_r, %cl /* cl = bits, needs it for shifting */
553 1.1 christos addb $16, bits_r /* bits += 16 */
554 1.1 christos shll %cl, %eax
555 1.1 christos orl %eax, hold_r /* hold |= *((ushort *)in)++ << bits */
556 1.1 christos
557 1.1 christos .L_get_distance_code:
558 1.1 christos movl dmask(%esp), %edx /* edx = dmask */
559 1.1 christos movl dcode(%esp), %ecx /* ecx = dcode */
560 1.1 christos andl hold_r, %edx /* edx &= hold */
561 1.1 christos movl (%ecx,%edx,4), %eax /* eax = dcode[hold & dmask] */
562 1.1 christos
563 1.1 christos #define dist_r %edx
564 1.1 christos .L_dodist:
565 1.1 christos movl %eax, dist_r /* dist = this */
566 1.1 christos shrl $16, dist_r /* dist = this.val */
567 1.1 christos movb %ah, %cl
568 1.1 christos subb %ah, bits_r /* bits -= this.bits */
569 1.1 christos shrl %cl, hold_r /* hold >>= this.bits */
570 1.1 christos
571 1.1 christos /* if (op & 16) {
572 1.1 christos * dist = this.val
573 1.1 christos * op &= 15
574 1.1 christos * if (op > bits) {
575 1.1 christos * hold |= *((unsigned short *)in)++ << bits;
576 1.1 christos * bits += 16
577 1.1 christos * }
578 1.1 christos * dist += hold & mask[op];
579 1.1 christos * bits -= op;
580 1.1 christos * hold >>= op;
581 1.1 christos */
582 1.1 christos movb %al, %cl /* cl = this.op */
583 1.1 christos
584 1.1 christos testb $16, %al /* if ((op & 16) == 0) */
585 1.1 christos jz .L_test_for_second_level_dist
586 1.1 christos andb $15, %cl /* op &= 15 */
587 1.1 christos jz .L_check_dist_one
588 1.1 christos cmpb %cl, bits_r
589 1.1 christos jae .L_add_bits_to_dist /* if (op <= bits) 97.6% */
590 1.1 christos
591 1.1 christos movb %cl, %ch /* stash op in ch, freeing cl */
592 1.1 christos xorl %eax, %eax
593 1.1 christos lodsw /* ax = *(ushort *)in++ */
594 1.1 christos movb bits_r, %cl /* cl = bits, needs it for shifting */
595 1.1 christos addb $16, bits_r /* bits += 16 */
596 1.1 christos shll %cl, %eax
597 1.1 christos orl %eax, hold_r /* hold |= *((ushort *)in)++ << bits */
598 1.1 christos movb %ch, %cl /* move op back to cl */
599 1.1 christos
600 1.1 christos .L_add_bits_to_dist:
601 1.1 christos movl $1, %eax
602 1.1 christos shll %cl, %eax
603 1.1 christos decl %eax /* (1 << op) - 1 */
604 1.1 christos subb %cl, bits_r
605 1.1 christos andl hold_r, %eax /* eax &= hold */
606 1.1 christos shrl %cl, hold_r
607 1.1 christos addl %eax, dist_r /* dist += hold & ((1 << op) - 1) */
608 1.1 christos jmp .L_check_window
609 1.1 christos
610 1.1 christos .L_check_window:
611 1.1 christos /* regs: %esi = from, %ebp = hold, %bl = bits, %edi = out, %edx = dist
612 1.1 christos * %ecx = nbytes
613 1.1 christos *
614 1.1 christos * nbytes = out - beg;
615 1.1 christos * if (dist <= nbytes) {
616 1.1 christos * from = out - dist;
617 1.1 christos * do {
618 1.1 christos * PUP(out) = PUP(from);
619 1.1 christos * } while (--len > 0);
620 1.1 christos * }
621 1.1 christos */
622 1.1 christos
623 1.1 christos movl in_r, in(%esp) /* save in so from can use its reg */
624 1.1 christos movl out_r, %eax
625 1.1 christos subl beg(%esp), %eax /* nbytes = out - beg */
626 1.1 christos
627 1.1 christos cmpl dist_r, %eax
628 1.1 christos jb .L_clip_window /* if (dist > nbytes) 4.2% */
629 1.1 christos
630 1.1 christos movl len(%esp), %ecx
631 1.1 christos movl out_r, from_r
632 1.1 christos subl dist_r, from_r /* from = out - dist */
633 1.1 christos
634 1.1 christos subl $3, %ecx
635 1.1 christos movb (from_r), %al
636 1.1 christos movb %al, (out_r)
637 1.1 christos movb 1(from_r), %al
638 1.1 christos movb 2(from_r), %dl
639 1.1 christos addl $3, from_r
640 1.1 christos movb %al, 1(out_r)
641 1.1 christos movb %dl, 2(out_r)
642 1.1 christos addl $3, out_r
643 1.1 christos rep movsb
644 1.1 christos
645 1.1 christos movl in(%esp), in_r /* move in back to %esi, toss from */
646 1.1 christos jmp .L_while_test
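/*
 * Editor's sketch, not part of the original source: the match copy above
 * must run forward one byte at a time because source and destination overlap
 * whenever dist is smaller than len, and the bytes just written are meant to
 * be read again. The assembly copies the first three bytes explicitly (a
 * deflate match is at least three bytes long) and leaves the rest to
 * rep movsb. A minimal C equivalent, with the hypothetical name copy_match:

```c
#include <assert.h>
#include <stddef.h>

// Copy len bytes from dist bytes behind out, front to back, so that an
// overlapping match repeats the most recent output. Returns the new out.
static unsigned char *copy_match(unsigned char *out, size_t dist, size_t len)
{
    assert(dist >= 1);
    const unsigned char *from = out - dist;
    while (len-- > 0)
        *out++ = *from++;
    return out;
}
```

 * With "ab" already in the output, a match of dist = 2 and len = 5 appends
 * "ababa". memcpy would not be a safe substitute here because the two
 * regions overlap.
 */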
647 1.1 christos
648 1.1 christos .align 16,0x90
649 1.1 christos .L_check_dist_one:
650 1.1 christos cmpl $1, dist_r
651 1.1 christos jne .L_check_window
652 1.1 christos cmpl out_r, beg(%esp)
653 1.1 christos je .L_check_window
654 1.1 christos
655 1.1 christos decl out_r
656 1.1 christos movl len(%esp), %ecx
657 1.1 christos movb (out_r), %al
658 1.1 christos subl $3, %ecx
659 1.1 christos
660 1.1 christos movb %al, 1(out_r)
661 1.1 christos movb %al, 2(out_r)
662 1.1 christos movb %al, 3(out_r)
663 1.1 christos addl $4, out_r
664 1.1 christos rep stosb
665 1.1 christos
666 1.1 christos jmp .L_while_test
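/*
 * Editor's sketch, not part of the original source: when the distance is 1
 * and at least one byte has already been written, the match is a run of the
 * previous output byte, so the code above stores that byte len times with
 * rep stosb instead of copying. A minimal C equivalent, with the
 * hypothetical name fill_run:

```c
#include <assert.h>
#include <string.h>

// Repeat the last output byte len times; beg marks the start of the output
// produced so far. Returns the advanced output pointer.
static unsigned char *fill_run(unsigned char *out, const unsigned char *beg,
                               size_t len)
{
    assert(out > beg); // there must be a previous byte to repeat
    memset(out, out[-1], len);
    return out + len;
}
```

 * When out equals beg the assembly falls back to .L_check_window, because
 * the byte to repeat lives in the sliding window rather than in the output
 * buffer.
 */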
667 1.1 christos
668 1.1 christos .align 16,0x90
669 1.1 christos .L_test_for_second_level_length:
670 1.1 christos /* else if ((op & 64) == 0) {
671 1.1 christos * this = lcode[this.val + (hold & mask[op])];
672 1.1 christos * }
673 1.1 christos */
674 1.1 christos testb $64, %al
675 1.1 christos jnz .L_test_for_end_of_block /* if ((op & 64) != 0) */
676 1.1 christos
677 1.1 christos movl $1, %eax
678 1.1 christos shll %cl, %eax
679 1.1 christos decl %eax
680 1.1 christos andl hold_r, %eax /* eax &= hold */
681 1.1 christos addl %edx, %eax /* eax += this.val */
682 1.1 christos movl lcode(%esp), %edx /* edx = lcode */
683 1.1 christos movl (%edx,%eax,4), %eax /* eax = lcode[val + (hold&mask[op])] */
684 1.1 christos jmp .L_dolen
685 1.1 christos
686 1.1 christos .align 16,0x90
687 1.1 christos .L_test_for_second_level_dist:
688 1.1 christos /* else if ((op & 64) == 0) {
689 1.1 christos * this = dcode[this.val + (hold & mask[op])];
690 1.1 christos * }
691 1.1 christos */
692 1.1 christos testb $64, %al
693 1.1 christos jnz .L_invalid_distance_code /* if ((op & 64) != 0) */
694 1.1 christos
695 1.1 christos movl $1, %eax
696 1.1 christos shll %cl, %eax
697 1.1 christos decl %eax
698 1.1 christos andl hold_r, %eax /* eax &= hold */
699 1.1 christos addl %edx, %eax /* eax += this.val */
700 1.1 christos movl dcode(%esp), %edx /* edx = dcode */
701 1.1 christos movl (%edx,%eax,4), %eax /* eax = dcode[val + (hold&mask[op])] */
702 1.1 christos jmp .L_dodist
703 1.1 christos
704 1.1 christos .align 16,0x90
705 1.1 christos .L_clip_window:
706 1.1 christos /* regs: %esi = from, %ebp = hold, %bl = bits, %edi = out, %edx = dist
707 1.1 christos * %ecx = nbytes
708 1.1 christos *
709 1.1 christos * else {
710 1.1 christos * if (dist > wsize) {
711 1.1 christos * invalid distance
712 1.1 christos * }
713 1.1 christos * from = window;
714 1.1 christos * nbytes = dist - nbytes;
715 1.1 christos * if (write == 0) {
716 1.1 christos * from += wsize - nbytes;
717 1.1 christos */
718 1.1 christos #define nbytes_r %ecx
719 1.1 christos movl %eax, nbytes_r
720 1.1 christos movl wsize(%esp), %eax /* prepare for dist compare */
721 1.1 christos negl nbytes_r /* nbytes = -nbytes */
722 1.1 christos movl window(%esp), from_r /* from = window */
723 1.1 christos
724 1.1 christos cmpl dist_r, %eax
725 1.1 christos jb .L_invalid_distance_too_far /* if (dist > wsize) */
726 1.1 christos
727 1.1 christos addl dist_r, nbytes_r /* nbytes = dist - nbytes */
728 1.1 christos cmpl $0, write(%esp)
729 1.1 christos jne .L_wrap_around_window /* if (write != 0) */
730 1.1 christos
731 1.1 christos subl nbytes_r, %eax
732 1.1 christos addl %eax, from_r /* from += wsize - nbytes */
733 1.1 christos
734 1.1 christos /* regs: %esi = from, %ebp = hold, %bl = bits, %edi = out, %edx = dist
735 1.1 christos * %ecx = nbytes, %eax = len
736 1.1 christos *
737 1.1 christos * if (nbytes < len) {
738 1.1 christos * len -= nbytes;
739 1.1 christos * do {
740 1.1 christos * PUP(out) = PUP(from);
741 1.1 christos * } while (--nbytes);
742 1.1 christos * from = out - dist;
743 1.1 christos * }
744 1.1 christos * }
745 1.1 christos */
746 1.1 christos #define len_r %eax
747 1.1 christos movl len(%esp), len_r
748 1.1 christos cmpl nbytes_r, len_r
749 1.1 christos jbe .L_do_copy1 /* if (nbytes >= len) */
750 1.1 christos
751 1.1 christos subl nbytes_r, len_r /* len -= nbytes */
752 1.1 christos rep movsb
753 1.1 christos movl out_r, from_r
754 1.1 christos subl dist_r, from_r /* from = out - dist */
755 1.1 christos jmp .L_do_copy1
756 1.1 christos
766 1.1 christos .L_wrap_around_window:
767 1.1 christos /* regs: %esi = from, %ebp = hold, %bl = bits, %edi = out, %edx = dist
768 1.1 christos * %ecx = nbytes, %eax = write, %eax = len
769 1.1 christos *
770 1.1 christos * else if (write < nbytes) {
771 1.1 christos * from += wsize + write - nbytes;
772 1.1 christos * nbytes -= write;
773 1.1 christos * if (nbytes < len) {
774 1.1 christos * len -= nbytes;
775 1.1 christos * do {
776 1.1 christos * PUP(out) = PUP(from);
777 1.1 christos * } while (--nbytes);
778 1.1 christos * from = window;
779 1.1 christos * nbytes = write;
780 1.1 christos * if (nbytes < len) {
781 1.1 christos * len -= nbytes;
782 1.1 christos * do {
783 1.1 christos * PUP(out) = PUP(from);
784 1.1 christos * } while(--nbytes);
785 1.1 christos * from = out - dist;
786 1.1 christos * }
787 1.1 christos * }
788 1.1 christos * }
789 1.1 christos */
790 1.1 christos #define write_r %eax
791 1.1 christos movl write(%esp), write_r
792 1.1 christos cmpl write_r, nbytes_r
793 1.1 christos jbe .L_contiguous_in_window /* if (write >= nbytes) */
794 1.1 christos
795 1.1 christos addl wsize(%esp), from_r
796 1.1 christos addl write_r, from_r
797 1.1 christos subl nbytes_r, from_r /* from += wsize + write - nbytes */
798 1.1 christos subl write_r, nbytes_r /* nbytes -= write */
799 1.1 christos #undef write_r
800 1.1 christos
801 1.1 christos movl len(%esp), len_r
802 1.1 christos cmpl nbytes_r, len_r
803 1.1 christos jbe .L_do_copy1 /* if (nbytes >= len) */
804 1.1 christos
805 1.1 christos subl nbytes_r, len_r /* len -= nbytes */
806 1.1 christos rep movsb
807 1.1 christos movl window(%esp), from_r /* from = window */
808 1.1 christos movl write(%esp), nbytes_r /* nbytes = write */
809 1.1 christos cmpl nbytes_r, len_r
810 1.1 christos jbe .L_do_copy1 /* if (nbytes >= len) */
811 1.1 christos
812 1.1 christos subl nbytes_r, len_r /* len -= nbytes */
813 1.1 christos rep movsb
814 1.1 christos movl out_r, from_r
815 1.1 christos subl dist_r, from_r /* from = out - dist */
816 1.1 christos jmp .L_do_copy1
817 1.1 christos
818 1.1 christos .L_contiguous_in_window:
819 1.1 christos /* regs: %esi = from, %ebp = hold, %bl = bits, %edi = out, %edx = dist
820 1.1 christos * %ecx = nbytes, %eax = write, %eax = len
821 1.1 christos *
822 1.1 christos * else {
823 1.1 christos * from += write - nbytes;
824 1.1 christos * if (nbytes < len) {
825 1.1 christos * len -= nbytes;
826 1.1 christos * do {
827 1.1 christos * PUP(out) = PUP(from);
828 1.1 christos * } while (--nbytes);
829 1.1 christos * from = out - dist;
830 1.1 christos * }
831 1.1 christos * }
832 1.1 christos */
833 1.1 christos #define write_r %eax
834 1.1 christos addl write_r, from_r
835 1.1 christos subl nbytes_r, from_r /* from += write - nbytes */
836 1.1 christos #undef write_r
837 1.1 christos
838 1.1 christos movl len(%esp), len_r
839 1.1 christos cmpl nbytes_r, len_r
840 1.1 christos jbe .L_do_copy1 /* if (nbytes >= len) */
841 1.1 christos
842 1.1 christos subl nbytes_r, len_r /* len -= nbytes */
843 1.1 christos rep movsb
844 1.1 christos movl out_r, from_r
845 1.1 christos subl dist_r, from_r /* from = out - dist */
846 1.1 christos
847 1.1 christos .L_do_copy1:
848 1.1 christos /* regs: %esi = from, %esi = in, %ebp = hold, %bl = bits, %edi = out
849 1.1 christos * %eax = len
850 1.1 christos *
851 1.1 christos * while (len > 0) {
852 1.1 christos * PUP(out) = PUP(from);
853 1.1 christos * len--;
854 1.1 christos * }
855 1.1 christos * }
856 1.1 christos * } while (in < last && out < end);
857 1.1 christos */
858 1.1 christos #undef nbytes_r
859 1.1 christos #define in_r %esi
860 1.1 christos movl len_r, %ecx
861 1.1 christos rep movsb
862 1.1 christos
863 1.1 christos movl in(%esp), in_r /* move in back to %esi, toss from */
864 1.1 christos jmp .L_while_test
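/* Editorial sketch, not part of the original source: the three window cases
 * above (write == 0, wrap around, contiguous) restated as one portable C
 * function. The names copy_fwd and copy_match are hypothetical, and the
 * sketch assumes a forward byte-by-byte loop is an acceptable stand-in for
 * rep movsb. It mirrors the assembly's dist > wsize rejection as an assert.
 *
```c
#include <assert.h>
#include <stddef.h>

// Forward byte-at-a-time copy, like `rep movsb`. The source may trail the
// destination by only a few bytes, which is how a short pattern is replicated.
static unsigned char *copy_fwd(unsigned char *out, const unsigned char *from,
                               size_t n)
{
    while (n--)
        *out++ = *from++;
    return out;
}

// Hypothetical helper: copy a match of `len` bytes at distance `dist`.
// `beg` is the start of the output, `window` holds `wsize` bytes of earlier
// output, and `write` is the window write index (0 means not wrapped).
unsigned char *copy_match(unsigned char *out, const unsigned char *beg,
                          const unsigned char *window, size_t wsize,
                          size_t write, size_t dist, size_t len)
{
    size_t nbytes = (size_t)(out - beg);   // bytes already in the output
    const unsigned char *from;

    assert(dist > 0);
    if (dist <= nbytes)                    // match lies entirely in the output
        return copy_fwd(out, out - dist, len);

    assert(dist <= wsize);                 // the assembly rejects dist > wsize
    nbytes = dist - nbytes;                // bytes to take from the window
    if (write == 0) {                      // window not wrapped
        from = window + wsize - nbytes;
    } else if (write < nbytes) {           // wraps around the end of the window
        from = window + wsize + write - nbytes;
        nbytes -= write;
        if (nbytes < len) {                // tail of the window first
            len -= nbytes;
            out = copy_fwd(out, from, nbytes);
            from = window;                 // then the start of the window
            nbytes = write;
        }
    } else {                               // contiguous in the window
        from = window + write - nbytes;
    }
    if (nbytes < len) {                    // the remainder comes from the output
        len -= nbytes;
        out = copy_fwd(out, from, nbytes);
        from = out - dist;
    }
    return copy_fwd(out, from, len);
}
```
 *
 * For example, with an 8-byte window holding "67812345", write = 3, and an
 * empty output, copy_match(out, out, window, 8, 3, 5, 5) writes "45678".
 */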
865 1.1 christos
866 1.1 christos #undef len_r
867 1.1 christos #undef dist_r
868 1.1 christos
869 1.1 christos #endif /* NO_MMX || RUN_TIME_MMX */
870 1.1 christos
871 1.1 christos
872 1.1 christos /*** MMX code ***/
873 1.1 christos
874 1.1 christos #if defined( USE_MMX ) || defined( RUN_TIME_MMX )
875 1.1 christos
876 1.1 christos .align 32,0x90
877 1.1 christos .L_init_mmx:
878 1.1 christos emms
879 1.1 christos
880 1.1 christos #undef bits_r
881 1.1 christos #undef bitslong_r
882 1.1 christos #define bitslong_r %ebp
883 1.1 christos #define hold_mm %mm0
884 1.1 christos movd %ebp, hold_mm
885 1.1 christos movl %ebx, bitslong_r
886 1.1 christos
887 1.1 christos #define used_mm %mm1
888 1.1 christos #define dmask2_mm %mm2
889 1.1 christos #define lmask2_mm %mm3
890 1.1 christos #define lmask_mm %mm4
891 1.1 christos #define dmask_mm %mm5
892 1.1 christos #define tmp_mm %mm6
893 1.1 christos
894 1.1 christos movd lmask(%esp), lmask_mm
895 1.1 christos movq lmask_mm, lmask2_mm
896 1.1 christos movd dmask(%esp), dmask_mm
897 1.1 christos movq dmask_mm, dmask2_mm
898 1.1 christos pxor used_mm, used_mm
899 1.1 christos movl lcode(%esp), %ebx /* ebx = lcode */
900 1.1 christos jmp .L_do_loop_mmx
901 1.1 christos
902 1.1 christos .align 32,0x90
903 1.1 christos .L_while_test_mmx:
904 1.1 christos /* while (in < last && out < end)
905 1.1 christos */
906 1.1 christos cmpl out_r, end(%esp)
907 1.1 christos jbe .L_break_loop /* if (out >= end) */
908 1.1 christos
909 1.1 christos cmpl in_r, last(%esp)
910 1.1 christos jbe .L_break_loop /* if (in >= last) */
911 1.1 christos
912 1.1 christos .L_do_loop_mmx:
913 1.1 christos psrlq used_mm, hold_mm /* hold_mm >>= last bit length */
914 1.1 christos
915 1.1 christos cmpl $32, bitslong_r
916 1.1 christos ja .L_get_length_code_mmx /* if (32 < bits) */
917 1.1 christos
918 1.1 christos movd bitslong_r, tmp_mm
919 1.1 christos movd (in_r), %mm7
920 1.1 christos addl $4, in_r
921 1.1 christos psllq tmp_mm, %mm7
922 1.1 christos addl $32, bitslong_r
923 1.1 christos por %mm7, hold_mm /* hold_mm |= *((uint *)in)++ << bits */
924 1.1 christos
925 1.1 christos .L_get_length_code_mmx:
926 1.1 christos pand hold_mm, lmask_mm
927 1.1 christos movd lmask_mm, %eax
928 1.1 christos movq lmask2_mm, lmask_mm
929 1.1 christos movl (%ebx,%eax,4), %eax /* eax = lcode[hold & lmask] */
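/* Editorial sketch, not part of the original source: the 64-bit bit buffer
 * that the MMX path keeps in hold_mm, restated in C. The type bitbuf and the
 * functions load_le32, refill, peek and drop are hypothetical names. One
 * deliberate difference: the assembly defers the right shift (it parks the
 * count in used_mm and shifts just before the next use), while this sketch
 * shifts immediately in drop().
 *
```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t hold;            // bit accumulator (hold_mm in the assembly)
    unsigned bits;            // number of valid bits in hold (bitslong_r)
    const unsigned char *in;  // next input byte (in_r)
} bitbuf;

// Read 4 bytes as a little-endian word, as `movd (in_r), %mm7` does on x86.
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

// Top up the accumulator when 32 or fewer bits remain.
void refill(bitbuf *b)
{
    if (b->bits <= 32) {
        b->hold |= (uint64_t)load_le32(b->in) << b->bits;
        b->in += 4;
        b->bits += 32;
    }
}

// Look at the low n bits without consuming them (hold & mask).
uint32_t peek(const bitbuf *b, unsigned n)
{
    assert(n <= 32);
    return (uint32_t)(b->hold & ((1ull << n) - 1));
}

// Consume n bits.
void drop(bitbuf *b, unsigned n)
{
    assert(n <= b->bits);
    b->hold >>= n;
    b->bits -= n;
}
```
 *
 * A table lookup such as lcode[hold & lmask] then corresponds to
 * lcode[peek(&b, n)], where n is the root table width in bits.
 */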
930 1.1 christos
931 1.1 christos .L_dolen_mmx:
932 1.1 christos movzbl %ah, %ecx /* ecx = this.bits */
933 1.1 christos movd %ecx, used_mm
934 1.1 christos subl %ecx, bitslong_r /* bits -= this.bits */
935 1.1 christos
936 1.1 christos testb %al, %al
937 1.1 christos jnz .L_test_for_length_base_mmx /* if (op != 0) 45.7% */
938 1.1 christos
939 1.1 christos shrl $16, %eax /* output this.val char */
940 1.1 christos stosb
941 1.1 christos jmp .L_while_test_mmx
942 1.1 christos
943 1.1 christos .L_test_for_length_base_mmx:
944 1.1 christos #define len_r %edx
945 1.1 christos movl %eax, len_r /* len = this */
946 1.1 christos shrl $16, len_r /* len = this.val */
947 1.1 christos
948 1.1 christos testb $16, %al
949 1.1 christos jz .L_test_for_second_level_length_mmx /* if ((op & 16) == 0) 8% */
950 1.1 christos andl $15, %eax /* op &= 15 */
951 1.1 christos jz .L_decode_distance_mmx /* if (!op) */
952 1.1 christos
953 1.1 christos psrlq used_mm, hold_mm /* hold_mm >>= last bit length */
954 1.1 christos movd %eax, used_mm
955 1.1 christos movd hold_mm, %ecx
956 1.1 christos subl %eax, bitslong_r
957 1.1 christos andl .L_mask(,%eax,4), %ecx
958 1.1 christos addl %ecx, len_r /* len += hold & mask[op] */
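/* Editorial sketch, not part of the original source: how the 32-bit table
 * entry in %eax is taken apart (op in %al, bits in %ah, val in the upper
 * 16 bits) and how the op tests above classify a literal/length entry. The
 * names code_fields, unpack_code, code_kind, classify_length_code and
 * extra_bits are hypothetical, and the byte layout is only what this
 * assembly implies on little-endian x86.
 *
```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned op;    // operation and extra-bit count (%al)
    unsigned bits;  // bits consumed by this code (%ah)
    unsigned val;   // literal, base length or table offset (entry >> 16)
} code_fields;

code_fields unpack_code(uint32_t entry)
{
    code_fields c;
    c.op = entry & 0xff;
    c.bits = (entry >> 8) & 0xff;
    c.val = entry >> 16;
    return c;
}

typedef enum {
    CODE_LITERAL,       // op == 0: val is the byte to output
    CODE_BASE,          // op & 16: val is a base, low 4 bits count extra bits
    CODE_SECOND_LEVEL,  // (op & 64) == 0: val indexes a second-level table
    CODE_END_OF_BLOCK,  // op & 32
    CODE_INVALID
} code_kind;

// Same test order as the assembly: zero, bit 16, bit 64, then bit 32.
code_kind classify_length_code(unsigned op)
{
    if (op == 0)
        return CODE_LITERAL;
    if (op & 16)
        return CODE_BASE;
    if ((op & 64) == 0)
        return CODE_SECOND_LEVEL;
    if (op & 32)
        return CODE_END_OF_BLOCK;
    return CODE_INVALID;
}

// Number of extra bits to read for a base entry (op &= 15).
unsigned extra_bits(unsigned op)
{
    assert(op & 16);
    return op & 15;
}
```
 *
 * Distance entries follow the same layout, except that op == 0 and the
 * end-of-block bit are not meaningful there.
 */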
959 1.1 christos
960 1.1 christos .L_decode_distance_mmx:
961 1.1 christos psrlq used_mm, hold_mm /* hold_mm >>= last bit length */
962 1.1 christos
963 1.1 christos cmpl $32, bitslong_r
964 1.1 christos ja .L_get_dist_code_mmx /* if (32 < bits) */
965 1.1 christos
966 1.1 christos movd bitslong_r, tmp_mm
967 1.1 christos movd (in_r), %mm7
968 1.1 christos addl $4, in_r
969 1.1 christos psllq tmp_mm, %mm7
970 1.1 christos addl $32, bitslong_r
971 1.1 christos por %mm7, hold_mm /* hold_mm |= *((uint *)in)++ << bits */
972 1.1 christos
973 1.1 christos .L_get_dist_code_mmx:
974 1.1 christos movl dcode(%esp), %ebx /* ebx = dcode */
975 1.1 christos pand hold_mm, dmask_mm
976 1.1 christos movd dmask_mm, %eax
977 1.1 christos movq dmask2_mm, dmask_mm
978 1.1 christos movl (%ebx,%eax,4), %eax /* eax = dcode[hold & dmask] */
979 1.1 christos
980 1.1 christos .L_dodist_mmx:
981 1.1 christos #define dist_r %ebx
982 1.1 christos movzbl %ah, %ecx /* ecx = this.bits */
983 1.1 christos movl %eax, dist_r
984 1.1 christos shrl $16, dist_r /* dist = this.val */
985 1.1 christos subl %ecx, bitslong_r /* bits -= this.bits */
986 1.1 christos movd %ecx, used_mm
987 1.1 christos
988 1.1 christos testb $16, %al /* if ((op & 16) == 0) */
989 1.1 christos jz .L_test_for_second_level_dist_mmx
990 1.1 christos andl $15, %eax /* op &= 15 */
991 1.1 christos jz .L_check_dist_one_mmx
992 1.1 christos
993 1.1 christos .L_add_bits_to_dist_mmx:
994 1.1 christos psrlq used_mm, hold_mm /* hold_mm >>= last bit length */
995 1.1 christos movd %eax, used_mm /* save bit length of current op */
996 1.1 christos movd hold_mm, %ecx /* get the next bits on input stream */
997 1.1 christos subl %eax, bitslong_r /* bits -= op bits */
998 1.1 christos andl .L_mask(,%eax,4), %ecx /* ecx = hold & mask[op] */
999 1.1 christos addl %ecx, dist_r /* dist += hold & mask[op] */
1000 1.1 christos
1001 1.1 christos .L_check_window_mmx:
1002 1.1 christos movl in_r, in(%esp) /* save in so from can use its reg */
1003 1.1 christos movl out_r, %eax
1004 1.1 christos subl beg(%esp), %eax /* nbytes = out - beg */
1005 1.1 christos
1006 1.1 christos cmpl dist_r, %eax
1007 1.1 christos jb .L_clip_window_mmx /* if (dist > nbytes) 4.2% */
1008 1.1 christos
1009 1.1 christos movl len_r, %ecx
1010 1.1 christos movl out_r, from_r
1011 1.1 christos subl dist_r, from_r /* from = out - dist */
1012 1.1 christos
1013 1.1 christos subl $3, %ecx
1014 1.1 christos movb (from_r), %al
1015 1.1 christos movb %al, (out_r)
1016 1.1 christos movb 1(from_r), %al
1017 1.1 christos movb 2(from_r), %dl
1018 1.1 christos addl $3, from_r
1019 1.1 christos movb %al, 1(out_r)
1020 1.1 christos movb %dl, 2(out_r)
1021 1.1 christos addl $3, out_r
1022 1.1 christos rep movsb
1023 1.1 christos
1024 1.1 christos movl in(%esp), in_r /* move in back to %esi, toss from */
1025 1.1 christos movl lcode(%esp), %ebx /* move lcode back to %ebx, toss dist */
1026 1.1 christos jmp .L_while_test_mmx
1027 1.1 christos
1028 1.1 christos .align 16,0x90
1029 1.1 christos .L_check_dist_one_mmx:
1030 1.1 christos cmpl $1, dist_r
1031 1.1 christos jne .L_check_window_mmx /* if (dist != 1) */
1032 1.1 christos cmpl out_r, beg(%esp)
1033 1.1 christos je .L_check_window_mmx /* if (out == beg) */
1034 1.1 christos
1035 1.1 christos decl out_r
1036 1.1 christos movl len_r, %ecx
1037 1.1 christos movb (out_r), %al
1038 1.1 christos subl $3, %ecx
1039 1.1 christos
1040 1.1 christos movb %al, 1(out_r)
1041 1.1 christos movb %al, 2(out_r)
1042 1.1 christos movb %al, 3(out_r)
1043 1.1 christos addl $4, out_r
1044 1.1 christos rep stosb
1045 1.1 christos
1046 1.1 christos movl lcode(%esp), %ebx /* move lcode back to %ebx, toss dist */
1047 1.1 christos jmp .L_while_test_mmx
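/* Editorial sketch, not part of the original source: the dist == 1 special
 * case above is a run-length fill, since every copied byte equals the byte
 * just before out. The name fill_run is hypothetical. The assembly stores
 * the first three bytes explicitly and lets rep stosb write the remaining
 * len - 3; the plain loop here is assumed equivalent for len >= 3.
 *
```c
#include <assert.h>
#include <stddef.h>

// Repeat the previous output byte `len` times and return the new out pointer.
unsigned char *fill_run(unsigned char *out, const unsigned char *beg,
                        size_t len)
{
    unsigned char c;

    assert(out > beg);   // there must be a previous byte (out != beg)
    assert(len >= 3);    // the assembly subtracts 3 from len unconditionally
    c = out[-1];
    while (len--)
        *out++ = c;
    return out;
}
```
 */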
1048 1.1 christos
1049 1.1 christos .align 16,0x90
1050 1.1 christos .L_test_for_second_level_length_mmx:
1051 1.1 christos testb $64, %al
1052 1.1 christos jnz .L_test_for_end_of_block /* if ((op & 64) != 0) */
1053 1.1 christos
1054 1.1 christos andl $15, %eax
1055 1.1 christos psrlq used_mm, hold_mm /* hold_mm >>= last bit length */
1056 1.1 christos movd hold_mm, %ecx
1057 1.1 christos andl .L_mask(,%eax,4), %ecx
1058 1.1 christos addl len_r, %ecx
1059 1.1 christos movl (%ebx,%ecx,4), %eax /* eax = lcode[val + (hold & mask[op])] */
1060 1.1 christos jmp .L_dolen_mmx
1061 1.1 christos
1062 1.1 christos .align 16,0x90
1063 1.1 christos .L_test_for_second_level_dist_mmx:
1064 1.1 christos testb $64, %al
1065 1.1 christos jnz .L_invalid_distance_code /* if ((op & 64) != 0) */
1066 1.1 christos
1067 1.1 christos andl $15, %eax
1068 1.1 christos psrlq used_mm, hold_mm /* hold_mm >>= last bit length */
1069 1.1 christos movd hold_mm, %ecx
1070 1.1 christos andl .L_mask(,%eax,4), %ecx
1071 1.1 christos movl dcode(%esp), %eax /* eax = dcode */
1072 1.1 christos addl dist_r, %ecx
1073 1.1 christos movl (%eax,%ecx,4), %eax /* eax = dcode[val + (hold & mask[op])] */
1074 1.1 christos jmp .L_dodist_mmx
1075 1.1 christos
1076 1.1 christos .align 16,0x90
1077 1.1 christos .L_clip_window_mmx:
1078 1.1 christos #define nbytes_r %ecx
1079 1.1 christos movl %eax, nbytes_r
1080 1.1 christos movl wsize(%esp), %eax /* prepare for dist compare */
1081 1.1 christos negl nbytes_r /* nbytes = -nbytes */
1082 1.1 christos movl window(%esp), from_r /* from = window */
1083 1.1 christos
1084 1.1 christos cmpl dist_r, %eax
1085 1.1 christos jb .L_invalid_distance_too_far /* if (dist > wsize) */
1086 1.1 christos
1087 1.1 christos addl dist_r, nbytes_r /* nbytes = dist - nbytes */
1088 1.1 christos cmpl $0, write(%esp)
1089 1.1 christos jne .L_wrap_around_window_mmx /* if (write != 0) */
1090 1.1 christos
1091 1.1 christos subl nbytes_r, %eax
1092 1.1 christos addl %eax, from_r /* from += wsize - nbytes */
1093 1.1 christos
1094 1.1 christos cmpl nbytes_r, len_r
1095 1.1 christos jbe .L_do_copy1_mmx /* if (nbytes >= len) */
1096 1.1 christos
1097 1.1 christos subl nbytes_r, len_r /* len -= nbytes */
1098 1.1 christos rep movsb
1099 1.1 christos movl out_r, from_r
1100 1.1 christos subl dist_r, from_r /* from = out - dist */
1101 1.1 christos jmp .L_do_copy1_mmx
1102 1.1 christos
1112 1.1 christos .L_wrap_around_window_mmx:
1113 1.1 christos #define write_r %eax
1114 1.1 christos movl write(%esp), write_r
1115 1.1 christos cmpl write_r, nbytes_r
1116 1.1 christos jbe .L_contiguous_in_window_mmx /* if (write >= nbytes) */
1117 1.1 christos
1118 1.1 christos addl wsize(%esp), from_r
1119 1.1 christos addl write_r, from_r
1120 1.1 christos subl nbytes_r, from_r /* from += wsize + write - nbytes */
1121 1.1 christos subl write_r, nbytes_r /* nbytes -= write */
1122 1.1 christos #undef write_r
1123 1.1 christos
1124 1.1 christos cmpl nbytes_r, len_r
1125 1.1 christos jbe .L_do_copy1_mmx /* if (nbytes >= len) */
1126 1.1 christos
1127 1.1 christos subl nbytes_r, len_r /* len -= nbytes */
1128 1.1 christos rep movsb
1129 1.1 christos movl window(%esp), from_r /* from = window */
1130 1.1 christos movl write(%esp), nbytes_r /* nbytes = write */
1131 1.1 christos cmpl nbytes_r, len_r
1132 1.1 christos jbe .L_do_copy1_mmx /* if (nbytes >= len) */
1133 1.1 christos
1134 1.1 christos subl nbytes_r, len_r /* len -= nbytes */
1135 1.1 christos rep movsb
1136 1.1 christos movl out_r, from_r
1137 1.1 christos subl dist_r, from_r /* from = out - dist */
1138 1.1 christos jmp .L_do_copy1_mmx
1139 1.1 christos
1140 1.1 christos .L_contiguous_in_window_mmx:
1141 1.1 christos #define write_r %eax
1142 1.1 christos addl write_r, from_r
1143 1.1 christos subl nbytes_r, from_r /* from += write - nbytes */
1144 1.1 christos #undef write_r
1145 1.1 christos
1146 1.1 christos cmpl nbytes_r, len_r
1147 1.1 christos jbe .L_do_copy1_mmx /* if (nbytes >= len) */
1148 1.1 christos
1149 1.1 christos subl nbytes_r, len_r /* len -= nbytes */
1150 1.1 christos rep movsb
1151 1.1 christos movl out_r, from_r
1152 1.1 christos subl dist_r, from_r /* from = out - dist */
1153 1.1 christos
1154 1.1 christos .L_do_copy1_mmx:
1155 1.1 christos #undef nbytes_r
1156 1.1 christos #define in_r %esi
1157 1.1 christos movl len_r, %ecx
1158 1.1 christos rep movsb
1159 1.1 christos
1160 1.1 christos movl in(%esp), in_r /* move in back to %esi, toss from */
1161 1.1 christos movl lcode(%esp), %ebx /* move lcode back to %ebx, toss dist */
1162 1.1 christos jmp .L_while_test_mmx
1163 1.1 christos
1164 1.1 christos #undef hold_r
1165 1.1 christos #undef bitslong_r
1166 1.1 christos
1167 1.1 christos #endif /* USE_MMX || RUN_TIME_MMX */
1168 1.1 christos
1169 1.1 christos
1170 1.1 christos /*** USE_MMX, NO_MMX, and RUN_TIME_MMX from here on ***/
1171 1.1 christos
1172 1.1 christos .L_invalid_distance_code:
1173 1.1 christos /* else {
1174 1.1 christos * strm->msg = "invalid distance code";
1175 1.1 christos * state->mode = BAD;
1176 1.1 christos * }
1177 1.1 christos */
1178 1.1 christos movl $.L_invalid_distance_code_msg, %ecx
1179 1.1 christos movl $INFLATE_MODE_BAD, %edx
1180 1.1 christos jmp .L_update_stream_state
1181 1.1 christos
1182 1.1 christos .L_test_for_end_of_block:
1183 1.1 christos /* else if (op & 32) {
1184 1.1 christos * state->mode = TYPE;
1185 1.1 christos * break;
1186 1.1 christos * }
1187 1.1 christos */
1188 1.1 christos testb $32, %al
1189 1.1 christos jz .L_invalid_literal_length_code /* if ((op & 32) == 0) */
1190 1.1 christos
1191 1.1 christos movl $0, %ecx /* msg = NULL, so strm->msg is left unchanged */
1192 1.1 christos movl $INFLATE_MODE_TYPE, %edx
1193 1.1 christos jmp .L_update_stream_state
1194 1.1 christos
1195 1.1 christos .L_invalid_literal_length_code:
1196 1.1 christos /* else {
1197 1.1 christos * strm->msg = "invalid literal/length code";
1198 1.1 christos * state->mode = BAD;
1199 1.1 christos * }
1200 1.1 christos */
1201 1.1 christos movl $.L_invalid_literal_length_code_msg, %ecx
1202 1.1 christos movl $INFLATE_MODE_BAD, %edx
1203 1.1 christos jmp .L_update_stream_state
1204 1.1 christos
1205 1.1 christos .L_invalid_distance_too_far:
1206 1.1 christos /* strm->msg = "invalid distance too far back";
1207 1.1 christos * state->mode = BAD;
1208 1.1 christos */
1209 1.1 christos movl in(%esp), in_r /* from_r has in's reg, put in back */
1210 1.1 christos movl $.L_invalid_distance_too_far_msg, %ecx
1211 1.1 christos movl $INFLATE_MODE_BAD, %edx
1212 1.1 christos jmp .L_update_stream_state
1213 1.1 christos
1214 1.1 christos .L_update_stream_state:
1215 1.1 christos /* set strm->msg = %ecx, strm->state->mode = %edx */
1216 1.1 christos movl strm_sp(%esp), %eax
1217 1.1 christos testl %ecx, %ecx /* if (msg != NULL) */
1218 1.1 christos jz .L_skip_msg
1219 1.1 christos movl %ecx, msg_strm(%eax) /* strm->msg = msg */
1220 1.1 christos .L_skip_msg:
1221 1.1 christos movl state_strm(%eax), %eax /* state = strm->state */
1222 1.1 christos movl %edx, mode_state(%eax) /* state->mode = edx (BAD | TYPE) */
1223 1.1 christos jmp .L_break_loop
1224 1.1 christos
1225 1.1 christos .align 32,0x90
1226 1.1 christos .L_break_loop:
1227 1.1 christos
1228 1.1 christos /*
1229 1.1 christos * Regs:
1230 1.1 christos *
1231 1.1 christos * bits = %ebp when mmx, %ebx when non-mmx
1232 1.1 christos * hold = hold_mm (%mm0) when mmx, %ebp when non-mmx
1233 1.1 christos * in = %esi
1234 1.1 christos * out = %edi
1235 1.1 christos */
1236 1.1 christos
1237 1.1 christos #if defined( USE_MMX ) || defined( RUN_TIME_MMX )
1238 1.1 christos
1239 1.1 christos #if defined( RUN_TIME_MMX )
1240 1.1 christos
1241 1.1 christos cmpl $DO_USE_MMX, inflate_fast_use_mmx
1242 1.1 christos jne .L_update_next_in
1243 1.1 christos
1244 1.1 christos #endif /* RUN_TIME_MMX */
1245 1.1 christos
1246 1.1 christos movl %ebp, %ebx
1247 1.1 christos
1248 1.1 christos .L_update_next_in:
1249 1.1 christos
1250 1.1 christos #endif
1251 1.1 christos
1252 1.1 christos #define strm_r %eax
1253 1.1 christos #define state_r %edx
1254 1.1 christos
1255 1.1 christos /* len = bits >> 3;
1256 1.1 christos * in -= len;
1257 1.1 christos * bits -= len << 3;
1258 1.1 christos * hold &= (1U << bits) - 1;
1259 1.1 christos * state->hold = hold;
1260 1.1 christos * state->bits = bits;
1261 1.1 christos * strm->next_in = in;
1262 1.1 christos * strm->next_out = out;
1263 1.1 christos */
1264 1.1 christos movl strm_sp(%esp), strm_r
1265 1.1 christos movl %ebx, %ecx
1266 1.1 christos movl state_strm(strm_r), state_r
1267 1.1 christos shrl $3, %ecx
1268 1.1 christos subl %ecx, in_r
1269 1.1 christos shll $3, %ecx
1270 1.1 christos subl %ecx, %ebx
1271 1.1 christos movl out_r, next_out_strm(strm_r)
1272 1.1 christos movl %ebx, bits_state(state_r)
1273 1.1 christos movl %ebx, %ecx
1274 1.1 christos
1275 1.1 christos leal buf(%esp), %ebx
1276 1.1 christos cmpl %ebx, last(%esp)
1277 1.1 christos jne .L_buf_not_used /* if buf != last */
1278 1.1 christos
1279 1.1 christos subl %ebx, in_r /* in -= buf */
1280 1.1 christos movl next_in_strm(strm_r), %ebx
1281 1.1 christos movl %ebx, last(%esp) /* last = strm->next_in */
1282 1.1 christos addl %ebx, in_r /* in += strm->next_in */
1283 1.1 christos movl avail_in_strm(strm_r), %ebx
1284 1.1 christos subl $11, %ebx
1285 1.1 christos addl %ebx, last(%esp) /* last = &strm->next_in[ avail_in - 11 ] */
1286 1.1 christos
1287 1.1 christos .L_buf_not_used:
1288 1.1 christos movl in_r, next_in_strm(strm_r)
1289 1.1 christos
1290 1.1 christos movl $1, %ebx
1291 1.1 christos shll %cl, %ebx
1292 1.1 christos decl %ebx
1293 1.1 christos
1294 1.1 christos #if defined( USE_MMX ) || defined( RUN_TIME_MMX )
1295 1.1 christos
1296 1.1 christos #if defined( RUN_TIME_MMX )
1297 1.1 christos
1298 1.1 christos cmpl $DO_USE_MMX, inflate_fast_use_mmx
1299 1.1 christos jne .L_update_hold
1300 1.1 christos
1301 1.1 christos #endif /* RUN_TIME_MMX */
1302 1.1 christos
1303 1.1 christos psrlq used_mm, hold_mm /* hold_mm >>= last bit length */
1304 1.1 christos movd hold_mm, %ebp
1305 1.1 christos
1306 1.1 christos emms
1307 1.1 christos
1308 1.1 christos .L_update_hold:
1309 1.1 christos
1310 1.1 christos #endif /* USE_MMX || RUN_TIME_MMX */
1311 1.1 christos
1312 1.1 christos andl %ebx, %ebp
1313 1.1 christos movl %ebp, hold_state(state_r)
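/* Editorial sketch, not part of the original source: the "return unused
 * bytes" step performed above, restated in C. The type bit_state and the
 * function return_unused_bytes are hypothetical names. hold is shown as 32
 * bits wide because both paths end with the value in %ebp.
 *
```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    const unsigned char *in;  // next input byte
    uint32_t hold;            // bit accumulator
    unsigned bits;            // number of valid bits in hold
} bit_state;

// Give whole unused bytes back to the input and keep only the leftover bits.
void return_unused_bytes(bit_state *s)
{
    unsigned len = s->bits >> 3;     // len = bits >> 3
    s->in -= len;                    // in -= len
    s->bits -= len << 3;             // bits -= len << 3
    assert(s->bits < 8);             // fewer than 8 bits can remain
    s->hold &= (1U << s->bits) - 1;  // hold &= (1U << bits) - 1
}
```
 *
 * For example, with 21 bits still buffered, two whole bytes go back to the
 * input and the low 5 bits stay in hold.
 */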
1314 1.1 christos
1315 1.1 christos #define last_r %ebx
1316 1.1 christos
1317 1.1 christos /* strm->avail_in = in < last ? 11 + (last - in) : 11 - (in - last) */
1318 1.1 christos movl last(%esp), last_r
1319 1.1 christos cmpl in_r, last_r
1320 1.1 christos jbe .L_last_is_smaller /* if (in >= last) */
1321 1.1 christos
1322 1.1 christos subl in_r, last_r /* last -= in */
1323 1.1 christos addl $11, last_r /* last += 11 */
1324 1.1 christos movl last_r, avail_in_strm(strm_r)
1325 1.1 christos jmp .L_fixup_out
1326 1.1 christos .L_last_is_smaller:
1327 1.1 christos subl last_r, in_r /* in -= last */
1328 1.1 christos negl in_r /* in = -in */
1329 1.1 christos addl $11, in_r /* in += 11 */
1330 1.1 christos movl in_r, avail_in_strm(strm_r)
1331 1.1 christos
1332 1.1 christos #undef last_r
1333 1.1 christos #define end_r %ebx
1334 1.1 christos
1335 1.1 christos .L_fixup_out:
1336 1.1 christos /* strm->avail_out = out < end ? 257 + (end - out) : 257 - (out - end) */
1337 1.1 christos movl end(%esp), end_r
1338 1.1 christos cmpl out_r, end_r
1339 1.1 christos jbe .L_end_is_smaller /* if (out >= end) */
1340 1.1 christos
1341 1.1 christos subl out_r, end_r /* end -= out */
1342 1.1 christos addl $257, end_r /* end += 257 */
1343 1.1 christos movl end_r, avail_out_strm(strm_r)
1344 1.1 christos jmp .L_done
1345 1.1 christos .L_end_is_smaller:
1346 1.1 christos subl end_r, out_r /* out -= end */
1347 1.1 christos negl out_r /* out = -out */
1348 1.1 christos addl $257, out_r /* out += 257 */
1349 1.1 christos movl out_r, avail_out_strm(strm_r)
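/* Editorial sketch, not part of the original source: the avail_in and
 * avail_out fix-ups above are the same calculation with different slack
 * values (11 for input, 257 for output). The name avail_after is
 * hypothetical. limit stands for last or end, which the routine keeps
 * `slack` bytes short of the true end of the buffer.
 *
```c
#include <assert.h>
#include <stddef.h>

// Bytes remaining when `limit` sits `slack` bytes before the true buffer end.
// `pos` may have run past `limit`, but never past the true end.
unsigned avail_after(const unsigned char *pos, const unsigned char *limit,
                     unsigned slack)
{
    if (pos < limit)
        return slack + (unsigned)(limit - pos);
    assert((size_t)(pos - limit) <= slack);
    return slack - (unsigned)(pos - limit);
}
```
 *
 * With these assumptions, strm->avail_in corresponds to
 * avail_after(in, last, 11) and strm->avail_out to avail_after(out, end, 257).
 */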
1350 1.1 christos
1351 1.1 christos #undef end_r
1352 1.1 christos #undef strm_r
1353 1.1 christos #undef state_r
1354 1.1 christos
1355 1.1 christos .L_done:
1356 1.1 christos addl $local_var_size, %esp
1357 1.1 christos popf
1358 1.1 christos popl %ebx
1359 1.1 christos popl %ebp
1360 1.1 christos popl %esi
1361 1.1 christos popl %edi
1362 1.1 christos ret
1363 1.1 christos
1364 1.1 christos #if defined( GAS_ELF )
1365 1.1 christos /* elf info */
1366 1.1 christos .type inflate_fast,@function
1367 1.1 christos .size inflate_fast,.-inflate_fast
1368 1.1 christos #endif
1369