History log of /src/sys/crypto/aes/arch/x86/aes_sse2_4x32.c
Revision		Date	Author	Comments
# 1.1		23-Nov-2025	riastradh	aes(9): Rewrite x86 SSE2 implementation.

This computes eight AES_k instances simultaneously, using the bitsliced 32-bit aes_ct logic which computes two blocks at a time in uint32_t arithmetic, vectorized four ways.

Previously, the SSE2 code was a very naive adaptation of aes_ct64, which computes four blocks at a time in uint64_t arithmetic, without any 2x vectorization -- I did it at the time because:
(a) it was easier to get working,
(b) it only affects really old hardware with neither AES-NI nor SSSE3 which are both much much faster.
But it was bugging me that this was a kind of dumb use of SSE2.

Substantially reduces stack usage (from ~1200 bytes to ~800 bytes) and should approximately double throughput for CBC decryption and for XTS encryption/decryption.

I also tried a 2x64 version but cursory performance measurements didn't reveal much benefit over 4x32. (If anyone is interested in doing more serious performance measurements, on ancient hardware for which it might matter, I also have the 2x64 code around.)

Prompted by:
PR kern/59774: bearssl 32-bit AES is too slow, want 64-bit optimized version in kernel
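
The "two blocks per uint32_t word, vectorized four ways" arithmetic in the log entry above can be illustrated with a small sketch. This is not code from aes_sse2_4x32.c: the names gate32, gate4x32, BLOCKS_PER_WORD, LANES and BLOCKS_PER_CALL are hypothetical, and the gate is a toy AND/XOR standing in for one gate of a bitsliced circuit. It only shows, under those assumptions, how one 32-bit bitwise operation becomes the same operation on four 32-bit lanes of a 128-bit SSE2 register, with a portable fallback when SSE2 is not available at compile time.

```c
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>	/* SSE2 intrinsics */
#endif

/* All names below are hypothetical, for illustration only. */
enum {
	BLOCKS_PER_WORD = 2,	/* per the log: 32-bit aes_ct handles two blocks */
	LANES = 4,		/* four 32-bit lanes in one 128-bit SSE2 register */
	BLOCKS_PER_CALL = BLOCKS_PER_WORD * LANES	/* 2 * 4 = 8 blocks */
};

/* Toy bitsliced gate on one scalar 32-bit word: (a AND b) XOR c. */
static uint32_t
gate32(uint32_t a, uint32_t b, uint32_t c)
{
	return (a & b) ^ c;
}

/* The same gate applied to four independent 32-bit lanes at once. */
static void
gate4x32(uint32_t out[LANES], const uint32_t a[LANES],
    const uint32_t b[LANES], const uint32_t c[LANES])
{
#ifdef __SSE2__
	/* Unaligned loads, so callers need no special alignment. */
	__m128i va = _mm_loadu_si128((const __m128i *)a);
	__m128i vb = _mm_loadu_si128((const __m128i *)b);
	__m128i vc = _mm_loadu_si128((const __m128i *)c);

	/* One AND and one XOR instruction cover all four lanes. */
	_mm_storeu_si128((__m128i *)out,
	    _mm_xor_si128(_mm_and_si128(va, vb), vc));
#else
	/* Portable fallback: the scalar gate, lane by lane. */
	for (int i = 0; i < LANES; i++)
		out[i] = gate32(a[i], b[i], c[i]);
#endif
}
```

Because bitwise AND and XOR never carry between bit positions, the lane width does not affect the result; the 4x32 versus 2x64 choice mentioned in the log concerns which scalar bitsliced logic is being widened, not the correctness of the gates.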