<html>

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>SLJIT tutorial</title>

<style type="text/css">
body {
background-color: #707070;
color: #000000;
font-family: "garamond"
}
td.main {
background-color: #ffffff;
color: #000000;
font-family: "garamond"
}
</style>
</head>

<body>

<center>
<table width="760" cellspacing=0 cellpadding=0>
<tr height=20><td width=20 class="main"></td><td width=720 class="main"></td><td width=20 class="main"></td></tr>
<tr><td width=20 class="main"></td><td width=720 class="main">

<center>
<a href="http://sourceforge.net"><img src="http://sflogo.sourceforge.net/sflogo.php?group_id=248047&type=2" width="125" height="37" border="0" alt="SourceForge.net Logo" /></a>
</center>
<h1><center>SLJIT tutorial</center></h1>

<h2>Before you start</h2>

<a href="">Download the tutorial sources</a><br>
<br>
SLJIT is a lightweight, platform-independent JIT compiler that is easy to
embed in your own project. Because of its 'stack-less' design, SLJIT places
some limits on register usage.<br>
<br>
Here are some other JIT compilers I have looked into recently, listed here in case you are interested:<br>

<ul>
<b>Libjit/liblightning:</b> - the backend of GNU.net<br>
<b>Libgccjit:</b> - introduced in GCC 5. It is different from the other JIT libraries: using
it feels like constructing C code, and it uses the GCC backend.<br>
<b>AsmJIT:</b> - branched from the famous V8 project (the JavaScript engine in Chrome),
supports only X86/X86_64.<br>
<b>DynASM:</b> - used in LuaJIT.<br>
</ul>

<br>
AsmJIT and DynASM work at the instruction level, so using them feels like writing assembly.
SLJIT also looks like assembly, but it hides the details of the specific CPU, which makes it
more general and portable. Libjit works at a higher level, and with libgccjit, as mentioned
above, you are really constructing C code.<br>

<h2>First program</h2>

Usage of SLJIT:
<ul>
1. #include "sljitLir.h" at the top of your C/C++ program<br>
2. Compile together with sljit_src/sljitLir.c<br>
</ul>

All examples can be compiled like this:
<ul>
gcc -Wall -Ipath/to/sljit_src -DSLJIT_CONFIG_AUTO=1 \<br>
<ul><b>xxx.c</b> path/to/sljit_src/sljitLir.c -o program</ul>
</ul>

OK, let's take a look at the first program. In this program we create a function that
returns the sum of its 3 arguments.<br>
<br>
<div style='font-family:Courier New;font-size:11px'>
<ul>
#include "sljitLir.h"<br>
<br>
#include &lt;stdio.h&gt;<br>
#include &lt;stdlib.h&gt;<br>
<br>
typedef sljit_sw (*func3_t)(sljit_sw a, sljit_sw b, sljit_sw c);<br>
<br>
static int add3(sljit_sw a, sljit_sw b, sljit_sw c)<br>
{<br>
<ul>
void *code;<br>
sljit_uw len;<br>
func3_t func;<br>
<br>
/* Create a SLJIT compiler */<br>
struct sljit_compiler *C = sljit_create_compiler();<br>
<br>
/* Start a context (function entry) with 3 arguments, discussed later */<br>
sljit_emit_enter(C, 0, 3, 1, 3, 0, 0, 0);<br>
<br>
/* The first argument of the function is in register SLJIT_S0, the second in SLJIT_S1, etc. 
*/<br>
/* R0 = first */<br>
sljit_emit_op1(C, SLJIT_MOV, SLJIT_R0, 0, SLJIT_S0, 0);<br>
<br>
/* R0 = R0 + second */<br>
sljit_emit_op2(C, SLJIT_ADD, SLJIT_R0, 0, SLJIT_R0, 0, SLJIT_S1, 0);<br>
<br>
/* R0 = R0 + third */<br>
sljit_emit_op2(C, SLJIT_ADD, SLJIT_R0, 0, SLJIT_R0, 0, SLJIT_S2, 0);<br>
<br>
/* This statement moves R0 to the RETURN REG and returns; */<br>
/* in fact, R0 is the RETURN REG itself */<br>
sljit_emit_return(C, SLJIT_MOV, SLJIT_R0, 0);<br>
<br>
/* Generate machine code */<br>
code = sljit_generate_code(C);<br>
len = sljit_get_generated_code_size(C);<br>
<br>
/* Execute code; the cast keeps %ld correct where sljit_sw is not long */<br>
func = (func3_t)code;<br>
printf("func return %ld\n", (long)func(a, b, c));<br>
<br>
/* dump_code(code, len); */<br>
<br>
/* Clean up */<br>
sljit_free_compiler(C);<br>
sljit_free_code(code);<br>
return 0;<br>
</ul>
}<br>
<br>
int main()<br>
{<br>
<ul>
return add3(4, 5, 6);<br>
</ul>
}<br>
</ul>
</div>

<br>
The function sljit_emit_enter creates a context: it saves some registers to the stack
and creates a call frame. sljit_emit_return restores the saved registers and cleans up
the frame. SLJIT is designed to be embedded into other applications, so the code it generates
has to follow some basic rules.<br>
<br>
This standard is called the Application Binary Interface, or ABI for short. Here is a
document for X86_64 CPUs (<a href="http://www.x86-64.org/documentation/abi.pdf">ABI.pdf</a>);
almost all Linux/Unix systems follow this standard. MS Windows has its own; read this for more:
<a href="http://en.wikipedia.org/wiki/X86_calling_conventions">X86_calling_conventions</a><br>
<br>
When I read the documentation of sljit_emit_enter, the parameters 'saveds' and 'scratches'
confused me. 
The fact is that CPU registers have different roles in the ABI spec:
some of them are used to pass arguments, some of them are 'callee-saved', and some of them are
'temporary'. Take X86_64 for example: RAX, R10 and R11 are temporary, which means
they may be changed by a call instruction. RBX and R12-R15 are callee-saved, so they
keep their values across the call. The rule is that every function must save
those registers before using them.<br>
<br>
Fortunately, SLJIT does most of the work for us: SLJIT_S[0-9] represent those 'safe'
registers, while SLJIT_R[0-9] are only for temporary use.<br>
<br>
When a function starts, SLJIT moves the function arguments to the S0, S1 and S2 registers, which
means function arguments are always 'safe' in the context. Because SLJIT does not use the stack
for passing arguments, it supports at most 3 arguments.<br>
<br>
sljit_emit_opX is easy to understand. In SLJIT a data value is represented by 2
parameters; it can be a register, in-memory data, or an immediate number.<br>
<br>

<table align="center" cellspacing="0">
<tr><td>First parameter</td> <td>Second parameter</td> <td>Meaning</td></tr>
<tr><td>SLJIT_R*, SLJIT_S*</td> <td>0</td> <td>Temp/saved registers</td></tr>
<tr><td>SLJIT_IMM</td> <td>Number</td> <td>Immediate number</td></tr>
<tr><td>SLJIT_MEM0()</td> <td>Address</td> <td>In-memory data at an absolute address</td></tr>
<tr><td>SLJIT_MEM1(r)</td> <td>Offset</td> <td>In-memory data at [R + offset]</td></tr>
<tr><td>SLJIT_MEM2(r1, r2)</td> <td>Shift (size)</td> <td>In-memory array, R1 as base address, R2 as index, <br>
Shift as element size (0 for bytes, 1 for 2 bytes, 2 for <br>
4 bytes, 3 for 8 bytes)</td></tr>
</table>

<h2>Branch</h2>
<div style='font-family:Courier New;font-size:11px'>
<ul>
#include "sljitLir.h"<br>
<br>
#include &lt;stdio.h&gt;<br>
#include &lt;stdlib.h&gt;<br>
<br>
typedef sljit_sw 
(*func3_t)(sljit_sw a, sljit_sw b, sljit_sw c);<br>
<br>
/*<br>
In this example, we generate a function like this:<br>
<br>
sljit_sw func(sljit_sw a, sljit_sw b, sljit_sw c)<br>
{<br>
<ul>
if ((a &amp; 1) == 0)<br>
<ul>
return c;<br>
</ul>
return b;<br>
</ul>
}<br>
<br>
*/<br>
static int branch(sljit_sw a, sljit_sw b, sljit_sw c)<br>
{<br>
<ul>
void *code;<br>
sljit_uw len;<br>
func3_t func;<br>
<br>
struct sljit_jump *ret_c;<br>
struct sljit_jump *out;<br>
<br>
/* Create a SLJIT compiler */<br>
struct sljit_compiler *C = sljit_create_compiler();<br>
<br>
/* 3 args, 1 temp reg, 3 saved regs */<br>
sljit_emit_enter(C, 0, 3, 1, 3, 0, 0, 0);<br>
<br>
/* R0 = a &amp; 1, S0 is argument a */<br>
sljit_emit_op2(C, SLJIT_AND, SLJIT_R0, 0, SLJIT_S0, 0, SLJIT_IMM, 1);<br>
<br>
/* if R0 == 0 then jump to ret_c; where is ret_c? we assign it later */<br>
ret_c = sljit_emit_cmp(C, SLJIT_EQUAL, SLJIT_R0, 0, SLJIT_IMM, 0);<br>
<br>
/* R0 = b, S1 is argument b */<br>
sljit_emit_op1(C, SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_S1, 0);<br>
<br>
/* jump to out */<br>
out = sljit_emit_jump(C, SLJIT_JUMP);<br>
<br>
/* this is where 'ret_c' should jump to; we emit a label and attach it to ret_c */<br>
sljit_set_label(ret_c, sljit_emit_label(C));<br>
<br>
/* R0 = c, S2 is argument c */<br>
sljit_emit_op1(C, SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_S2, 0);<br>
<br>
/* this is where 'out' should jump to */<br>
sljit_set_label(out, sljit_emit_label(C));<br>
<br>
/* end of function */<br>
sljit_emit_return(C, SLJIT_MOV, SLJIT_RETURN_REG, 0);<br>
<br>
/* Generate machine code */<br>
code = sljit_generate_code(C);<br>
len = sljit_get_generated_code_size(C);<br>
<br>
/* Execute code; the cast keeps %ld correct where sljit_sw is not long */<br>
func = (func3_t)code;<br>
printf("func return %ld\n", (long)func(a, b, c));<br>
<br>
/* 
dump_code(code, len); */<br>
<br>
/* Clean up */<br>
sljit_free_compiler(C);<br>
sljit_free_code(code);<br>
return 0;<br>
</ul>
}<br>
<br>
int main()<br>
{<br>
<ul>
return branch(4, 5, 6);<br>
</ul>
}<br>
</ul>
</div>

The keys to implementing a branch are 'struct sljit_jump' and 'struct sljit_label'.
A 'jump' holds a jump instruction, but it does not know where to jump until
you attach a label to it. A 'label' is a code address, just like a label in assembly
language.<br>
<br>
sljit_emit_cmp and sljit_emit_jump generate a conditional and an unconditional jump, respectively.
Take this statement as an example:<br>
<ul>
ret_c = sljit_emit_cmp(C, SLJIT_EQUAL, SLJIT_R0, 0, SLJIT_IMM, 0);<br>
</ul>
It creates a jump instruction whose condition is R0 equals 0; the
jump target is assigned later with the sljit_set_label statement.<br>
<br>
In this example, it creates a branch like this:<br>
<ul>
<ul>
R0 = a &amp; 1;<br>
if R0 == 0 then goto ret_c;<br>
R0 = b;<br>
goto out;<br>
</ul>
ret_c:<br>
<ul>
R0 = c;<br>
</ul>
out:<br>
<ul>
return R0;<br>
</ul>
</ul>
<br>
This is how compilers for high-level languages handle branches.<br>
<br>

<h2>Loop</h2>

The loop example is similar to the branch example.
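<br>
<br>
Before the loop code, here is a rough sketch of the jump/label patching idea in plain C.
It does not use SLJIT at all: the toy instruction set and the names emit, emit_label,
set_label, build and run are hypothetical, invented for this illustration, and only loosely
mirror sljit_emit_jump, sljit_emit_label and sljit_set_label. A jump is emitted with an
unknown target and patched once the label position is known.<br>

```c
#include <assert.h>

/* Hypothetical toy instruction set for illustration; not part of SLJIT. */
enum { OP_LOAD_B, OP_LOAD_C, OP_JUMP_IF_EVEN, OP_JUMP, OP_RET };

struct insn { int op; int target; };

static struct insn prog[16];
static int count;

/* Emit one instruction and return its index; the jump target starts unset (-1). */
static int emit(int op)
{
    prog[count].op = op;
    prog[count].target = -1;
    return count++;
}

/* A label is simply the index of the next instruction to be emitted. */
static int emit_label(void) { return count; }

/* Patch a previously emitted jump so that it lands on the given label. */
static void set_label(int jump, int label) { prog[jump].target = label; }

/* Build: if ((a & 1) == 0) return c; return b; */
static void build(void)
{
    int ret_c, out;

    count = 0;
    ret_c = emit(OP_JUMP_IF_EVEN);  /* target not known yet */
    emit(OP_LOAD_B);                /* R0 = b */
    out = emit(OP_JUMP);            /* target not known yet */
    set_label(ret_c, emit_label()); /* ret_c: */
    emit(OP_LOAD_C);                /* R0 = c */
    set_label(out, emit_label());   /* out: */
    emit(OP_RET);                   /* return R0 */
}

/* Interpret the toy program; a real JIT would execute machine code instead. */
static long run(long a, long b, long c)
{
    long r0 = 0;
    int pc = 0;

    for (;;) {
        struct insn in = prog[pc++];
        switch (in.op) {
        case OP_LOAD_B: r0 = b; break;
        case OP_LOAD_C: r0 = c; break;
        case OP_JUMP_IF_EVEN:
            assert(in.target >= 0); /* every jump must have a label */
            if ((a & 1) == 0) pc = in.target;
            break;
        case OP_JUMP:
            assert(in.target >= 0);
            pc = in.target;
            break;
        case OP_RET: return r0;
        }
    }
}
```

Calling build() once and then run(4, 5, 6) follows the ret_c path, while run(3, 5, 6) falls
through to the unconditional jump. A backward jump, as used by a loop, works the same way
except that the label already exists when the jump is emitted.<br>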

<div style='font-family:Courier New;font-size:11px'>
<ul>
/*<br>
In this example, we generate a function like this:<br>
<br>
sljit_sw func(sljit_sw a, sljit_sw b)<br>
{<br>
<ul>
sljit_sw i;<br>
sljit_sw ret = 0;<br>
for (i = 0; i &lt; a; ++i) {<br>
<ul>
ret += b;<br>
</ul>
}<br>
return ret;<br>
</ul>
}<br>
*/<br>
<br>
<ul>
struct sljit_label *loopstart;<br>
struct sljit_jump *out;<br>
<br>
/* 2 args, 2 temp regs, 2 saved regs */<br>
sljit_emit_enter(C, 0, 2, 2, 2, 0, 0, 0);<br>
<br>
/* R1 = 0 */<br>
sljit_emit_op2(C, SLJIT_XOR, SLJIT_R1, 0, SLJIT_R1, 0, SLJIT_R1, 0);<br>
/* RET = 0 */<br>
sljit_emit_op1(C, SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, 0);<br>
/* loopstart: */<br>
loopstart = sljit_emit_label(C);<br>
/* R1 >= a --> jump out (signed compare, matching the sljit_sw loop above) */<br>
out = sljit_emit_cmp(C, SLJIT_SIG_GREATER_EQUAL, SLJIT_R1, 0, SLJIT_S0, 0);<br>
/* RET += b */<br>
sljit_emit_op2(C, SLJIT_ADD, SLJIT_RETURN_REG, 0, SLJIT_RETURN_REG, 0, SLJIT_S1, 0);<br>
/* R1 += 1 */<br>
sljit_emit_op2(C, SLJIT_ADD, SLJIT_R1, 0, SLJIT_R1, 0, SLJIT_IMM, 1);<br>
/* jump loopstart */<br>
sljit_set_label(sljit_emit_jump(C, SLJIT_JUMP), loopstart);<br>
/* out: */<br>
sljit_set_label(out, sljit_emit_label(C));<br>
<br>
/* return RET */<br>
sljit_emit_return(C, SLJIT_MOV, SLJIT_RETURN_REG, 0);<br>
</ul>
</ul>
</div>

After this example, you are ready to construct any program that contains complex branches
and loops.<br>
<br>
Here is an interesting fact: 'xor reg, reg' is better than 'mov reg, 0' because it has a shorter
encoding on X86 machines (for example, 'xor eax, eax' takes 2 bytes while 'mov eax, 0' takes 5).<br>
<br>
I will give only the key code in the rest of this tutorial; the full source of each
chapter can be found in the attachment.<br>


<h2>Call external function</h2>

It's easy to call an external function in SLJIT: we use sljit_emit_ijump with a SLJIT_CALL*
operation to do so.<br>
<br>
SLJIT_CALL[N] is used to call a function 
with N arguments. SLJIT has only SLJIT_CALL0,
CALL1, CALL2 and CALL3, which means you can call a function with at most 3 arguments (that
disappoints me: no chance to call fwrite from SLJIT). The arguments for the callee
are passed in SLJIT_R0, R1 and R2. Keep in mind that these are 'temp registers', so their
values may be changed by the call.<br>
<br>
Assume that we have an external function:<br>
<ul>
sljit_sw print_num(sljit_sw a);
</ul>

JIT code to call print_num(S1):

<div style='font-family:Courier New;font-size:11px'>
<ul>
/* R0 = S1; */<br>
sljit_emit_op1(C, SLJIT_MOV, SLJIT_R0, 0, SLJIT_S1, 0);<br>
/* print_num(R0) */<br>
sljit_emit_ijump(C, SLJIT_CALL1, SLJIT_IMM, SLJIT_FUNC_OFFSET(print_num));<br>
</ul>
</div>
<br>
This code calls an immediate value (the address of print_num), which is resolved properly when the
program is loaded. There is no problem when you compile and execute in one run, but if you plan
to save the generated code to a file and load/execute it later, that address may not be what you
expect: on platforms that support PIC, print_num may be relocated to another
address at run time. 
See:
<a href="http://en.wikipedia.org/wiki/Position-independent_code">PIC</a><br>
<br>

<h2>Structure access</h2>

SLJIT uses SLJIT_MEM1 to implement [Reg + offset] memory access.<br>
<div style='font-family:Courier New;font-size:11px'>
<ul>
struct point_st {<br>
<ul>
sljit_sw x;<br>
int y;<br>
short z;<br>
char d;<br>
char e;<br>
</ul>
};<br>
<br>
sljit_emit_op1(C, SLJIT_MOV_SI, SLJIT_R0, 0, SLJIT_MEM1(SLJIT_S0),<br>
<ul>
SLJIT_OFFSETOF(struct point_st, y));<br>
</ul>
</ul>
</div>

In this case, SLJIT_S0 holds the address of the point_st structure, and the offset of member 'y'
is determined at compile time. The MOV operation can carry a
'signedness/size' postfix; here _SI means 'signed 32-bit integer'. The postfix
list:<br>
<ul>
<b>UB</b> = unsigned byte (8 bit)<br>
<b>SB</b> = signed byte (8 bit)<br>
<b>UH</b> = unsigned half (16 bit)<br>
<b>SH</b> = signed half (16 bit)<br>
<b>UI</b> = unsigned int (32 bit)<br>
<b>SI</b> = signed int (32 bit)<br>
<b>P</b> = pointer (sljit_p) size<br>
</ul>

<h2>Array accessing</h2>

SLJIT uses SLJIT_MEM2 to access arrays, like this:<br>

<div style='font-family:Courier New;font-size:11px'>
<ul>
sljit_emit_op1(C, SLJIT_MOV, SLJIT_R0, 0, SLJIT_MEM2(SLJIT_S0, SLJIT_S2),<br>
<ul>
SLJIT_WORD_SHIFT);
</ul>
</ul>
</div>

This statement generates code like this:<br>
<ul>
WORD S0[];<br>
R0 = S0[S2]<br>
</ul>
<br>
The array S0 is declared as WORD, so each element is sizeof(sljit_sw) bytes long. 
SLJIT uses a 'shift' to represent the element size: (0 for a single byte, 1 for 2
bytes, 2 for 4 bytes, 3 for 8 bytes)<br>
<br>
The file array_access.c demonstrates an array-printing example, which should be easy
to understand.<br>

<h2>Local variables</h2>

SLJIT provides SLJIT_MEM1(SLJIT_SP) to access the space reserved by
sljit_emit_enter's last parameter.<br>
In this example we have to pass the address of an array to print_arr, and a local variable
is the only choice.<br>

<div style='font-family:Courier New;font-size:11px'>
<ul>
/* reserve space on the stack for sljit_sw arr[3] */<br>
sljit_emit_enter(C, 0, 3, 2, 3, 0, 0, 3 * sizeof(sljit_sw));<br>
/* opt arg R S FR FS local_size */<br>
<br>
/* arr[0] = S0, SLJIT_SP is the start address of the local variables */<br>
sljit_emit_op1(C, SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), 0, SLJIT_S0, 0);<br>
/* arr[1] = S1 */<br>
sljit_emit_op1(C, SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), 1 * sizeof(sljit_sw), SLJIT_S1, 0);<br>
/* arr[2] = S2 */<br>
sljit_emit_op1(C, SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), 2 * sizeof(sljit_sw), SLJIT_S2, 0);<br>
<br>
/* R0 = arr; SLJIT_SP holds the address of arr, but it cannot be read directly in SLJIT */<br>
sljit_get_local_base(C, SLJIT_R0, 0, 0); /* get the address of the local variables */<br>
sljit_emit_op1(C, SLJIT_MOV, SLJIT_R1, 0, SLJIT_IMM, 3); /* R1 = 3; */<br>
sljit_emit_ijump(C, SLJIT_CALL2, SLJIT_IMM, SLJIT_FUNC_OFFSET(print_arr));<br>
sljit_emit_return(C, SLJIT_MOV, SLJIT_R0, 0);<br>
</ul>
</div>
<br>
SLJIT_SP can only be used in SLJIT_MEM1(SLJIT_SP). 
In this case, SP is the
address of 'arr', but we cannot assign it to a register using the SLJIT_MOV operation;
instead, we use sljit_get_local_base, which loads the address of the local area plus an
offset into the target.<br>

<h2>Brainfuck compiler</h2>

OK, the basic usage of SLJIT ends here. For more detail, I suggest reading
sljitLir.h directly. Have fun hacking with SLJIT!<br>
<br>
An introduction to the brainfuck machine can be found here:
<a href="http://en.wikipedia.org/wiki/Brainfuck">Brainfuck</a><br>
<br>

<h2>Extra</h2>

1. Dump_code function<br>
SLJIT does not provide a disassembler; this is a simple function that does the job (X86 only)<br>
<br>

<div style='font-family:Courier New;font-size:11px'>
<ul>
static void dump_code(void *code, sljit_uw len)<br>
{<br>
<ul>
/* write the raw machine code to a file so that objdump can read it */<br>
FILE *fp = fopen("/tmp/slj_dump", "wb");<br>
if (!fp)<br>
<ul>
return;<br>
</ul>
fwrite(code, len, 1, fp);<br>
fclose(fp);<br>
</ul>
#if defined(SLJIT_CONFIG_X86_64)<br>
<ul>
system("objdump -b binary -m i386:x86-64 -D /tmp/slj_dump");<br>
</ul>
#elif defined(SLJIT_CONFIG_X86_32)<br>
<ul>
system("objdump -b binary -m i386 -D /tmp/slj_dump");<br>
</ul>
#endif<br>
}
</ul>
</div>

Disassembly of the branch example:<br>
<br>
0000000000000000 &lt;.data&gt;:<br>
<ul>
<table>
<tr><td>0:</td><td>53</td><td>push %rbx</td></tr>
<tr><td>1:</td><td>41 57</td><td>push %r15</td></tr>
<tr><td>3:</td><td>41 56</td><td>push %r14</td></tr>
<tr><td>5:</td><td>48 8b df</td><td>mov %rdi,%rbx</td></tr>
<tr><td>8:</td><td>4c 8b fe</td><td>mov %rsi,%r15</td></tr>
<tr><td>b:</td><td>4c 8b f2</td><td>mov %rdx,%r14</td></tr>
<tr><td>e:</td><td>48 83 ec 10</td><td>sub $0x10,%rsp</td></tr>
<tr><td>12:</td><td>48 89 d8</td><td>mov %rbx,%rax</td></tr>
<tr><td>15:</td><td>48 83 e0 01</td><td>and $0x1,%rax</td></tr>
<tr><td>19:</td><td>48 83 f8 00</td><td>cmp $0x0,%rax</td></tr>
<tr><td>1d:</td><td>74 05</td><td>je 0x24</td></tr>
<tr><td>1f:</td><td>4c 89 f8</td><td>mov %r15,%rax</td></tr>
<tr><td>22:</td><td>eb 03</td><td>jmp 0x27</td></tr>
<tr><td>24:</td><td>4c 89 f0</td><td>mov %r14,%rax</td></tr>
<tr><td>27:</td><td>48 83 c4 10</td><td>add $0x10,%rsp</td></tr>
<tr><td>2b:</td><td>41 5e</td><td>pop %r14</td></tr>
<tr><td>2d:</td><td>41 5f</td><td>pop %r15</td></tr>
<tr><td>2f:</td><td>5b</td><td>pop %rbx</td></tr>
<tr><td>30:</td><td>c3</td><td>retq</td></tr>
</table>
</ul>
<br>
The same function compiled with GCC -O2:<br>
0000000000000000 &lt;func&gt;:<br>
<ul>
<table>
<tr><td>0:</td><td>48 89 d0</td><td>mov %rdx,%rax</td></tr>
<tr><td>3:</td><td>83 e7 01</td><td>and $0x1,%edi</td></tr>
<tr><td>6:</td><td>48 0f 45 c6</td><td>cmovne %rsi,%rax</td></tr>
<tr><td>a:</td><td>c3</td><td>retq</td></tr>
</table>
</ul>
<br>
Err... OK, either the optimization here is weak, or the optimization there is crazy... :-)<br>

<table width="100%" cellspacing=0 cellpadding=0>
<tr><td align=right>By wenxichang#163.com, 2015.5.10</td></tr></table>

</td><td width=20 class="main"></td></tr>
<tr height=20><td width=20 class="main"></td><td width=720 class="main"></td><td width=20 class="main"></td></tr>
</table>
</center>

</body>
</html>