strlen.S revision 1.5
11.1Schristos/* 21.1Schristos * Written by J.T. Conklin <jtc@acorntoolworks.com> 31.1Schristos * Public domain. 41.1Schristos */ 51.1Schristos 61.1Schristos#include <machine/asm.h> 71.1Schristos 81.1Schristos#if defined(LIBC_SCCS) 91.5Sandvar RCSID("$NetBSD: strlen.S,v 1.5 2024/03/30 22:03:39 andvar Exp $") 101.1Schristos#endif 111.1Schristos 121.1SchristosENTRY(strlen) 131.1Schristos movl 4(%esp),%eax 141.1Schristos 151.1Schristos.Lalign: 161.1Schristos /* Consider unrolling loop? */ 171.1Schristos testb $3,%al 181.1Schristos je .Lword_aligned 191.1Schristos cmpb $0,(%eax) 201.1Schristos je .Ldone 211.1Schristos incl %eax 221.1Schristos jmp .Lalign 231.1Schristos 241.1Schristos /* 251.1Schristos * There are many well known branch-free sequences which are used 261.1Schristos * for determining whether a zero-byte is contained within a word. 271.4Sandvar * These sequences are generally much more efficient than loading 281.1Schristos * and comparing each byte individually. 291.1Schristos * 301.1Schristos * The expression [1,2]: 311.1Schristos * 321.1Schristos * (1) ~(((x & 0x7f7f7f7f) + 0x7f7f7f7f) | (x | 0x7f7f7f7f)) 331.1Schristos * 341.1Schristos * evaluates to a non-zero value if any of the bytes in the 351.1Schristos * original word is zero. 361.1Schristos * 371.1Schristos * It also has the useful property that bytes in the result word 381.1Schristos * that correspond to non-zero bytes in the original word have 391.1Schristos * the value 0x00, while bytes corresponding to zero bytes have 401.1Schristos * the value 0x80. This allows calculation of the first (and 411.1Schristos * last) occurrence of a zero byte within the word (useful for C's 421.1Schristos * str* primitives) by counting the number of leading (or 431.1Schristos * trailing) zeros and dividing the result by 8. On machines 441.1Schristos * without (or with slow) clz() / ctz() instructions, testing 451.1Schristos * each byte in the result word for zero is necessary. 461.1Schristos * 471.1Schristos * This typically takes 4 instructions (5 on machines without 481.1Schristos * "not-or") not including those needed to load the constant. 491.1Schristos * 501.1Schristos * 511.1Schristos * The expression: 521.1Schristos * 531.1Schristos * (2) ((x - 0x01010101) & ~x & 0x80808080) 541.1Schristos * 551.1Schristos * evaluates to a non-zero value if any of the bytes in the 561.1Schristos * original word is zero. 571.1Schristos * 581.1Schristos * On little endian machines, the first byte in the result word 591.1Schristos * that corresponds to a zero byte in the original byte is 0x80, 601.1Schristos * so clz() can be used as above. On big endian machines, and 611.1Schristos * little endian machines without (or with a slow) clz() insn, 621.1Schristos * testing each byte in the original for zero is necessary. 631.1Schristos * 641.1Schristos * This typically takes 3 instructions (4 on machines without 651.1Schristos * "and with complement") not including those needed to load 661.1Schristos * constants. 671.1Schristos * 681.1Schristos * 691.1Schristos * The expression: 701.1Schristos * 711.1Schristos * (3) ((x - 0x01010101) & 0x80808080) 721.1Schristos * 731.1Schristos * always evaluates to a non-zero value if any of the bytes in 741.1Schristos * the original word is zero. However, in rare cases, it also 751.1Schristos * evaluates to a non-zero value when none of the bytes in the 761.1Schristos * original word is zero. 771.1Schristos * 781.1Schristos * To account for possible false positives, each byte of the 791.1Schristos * original word must be checked when the expression evaluates to 801.1Schristos * a non-zero value. However, because it is simpler than those 811.1Schristos * presented above, code that uses it will be faster as long as 821.1Schristos * the rate of false positives is low. 831.1Schristos * 841.3Sandvar * This is likely, because the false positive can only occur 851.1Schristos * if the most siginificant bit of a byte within the word is set. 861.1Schristos * The expression will never fail for typical 7-bit ASCII strings. 871.1Schristos * 881.1Schristos * This typically takes 2 instructions not including those needed 891.1Schristos * to load constants. 901.1Schristos * 911.1Schristos * 921.5Sandvar * [1] Henry S. Warren Jr., "Hacker's Delight", Addison-Wesley 2003 931.1Schristos * 941.1Schristos * [2] International Business Machines, "The PowerPC Compiler Writer's 951.1Schristos * Guide", Warthman Associates, 1996 961.1Schristos */ 971.1Schristos 981.1Schristos _ALIGN_TEXT 991.1Schristos.Lword_aligned: 1001.1Schristos.Lloop: 1011.1Schristos movl (%eax),%ecx 1021.1Schristos addl $4,%eax 1031.1Schristos leal -0x01010101(%ecx),%edx 1041.1Schristos testl $0x80808080,%edx 1051.1Schristos je .Lloop 1061.1Schristos 1071.1Schristos /* 1081.1Schristos * In rare cases, the above loop may exit prematurely. We must 1091.1Schristos * return to the loop if none of the bytes in the word equal 0. 1101.1Schristos */ 1111.1Schristos 1121.1Schristos /* 1131.1Schristos * The optimal code for determining whether each byte is zero 1141.1Schristos * differs by processor. This space-optimized code should be 1151.1Schristos * acceptable on all, especially since we don't expect it to 1161.1Schristos * be run frequently, 1171.1Schristos */ 1181.1Schristos 1191.1Schristos testb %cl,%cl /* 1st byte == 0? */ 1201.1Schristos jne 1f 1211.1Schristos subl $4,%eax 1221.1Schristos jmp .Ldone 1231.1Schristos 1241.1Schristos1: testb %ch,%ch /* 2nd byte == 0? */ 1251.1Schristos jne 1f 1261.1Schristos subl $3,%eax 1271.1Schristos jmp .Ldone 1281.1Schristos 1291.1Schristos1: shrl $16,%ecx 1301.1Schristos testb %cl,%cl /* 3rd byte == 0? */ 1311.1Schristos jne 1f 1321.1Schristos subl $2,%eax 1331.1Schristos jmp .Ldone 1341.1Schristos 1351.1Schristos1: testb %ch,%ch /* 4th byte == 0? */ 1361.1Schristos jne .Lloop 1371.1Schristos decl %eax 1381.1Schristos 1391.1Schristos.Ldone: 1401.1Schristos subl 4(%esp),%eax 1411.1Schristos ret 1421.2SjakllschEND(strlen) 143