1.1 | 29-Jun-2020 | riastradh

New SSE2-based bitsliced AES implementation.
This should work on essentially all x86 CPUs of the last two decades, and may improve throughput over BearSSL's portable C aes_ct implementation by (a) reducing the number of vector operations executed in sequence (shorter dependency chains), and (b) processing four blocks in parallel rather than two.
Derived from BearSSL's aes_ct64 implementation, adapted so that where aes_ct64 uses eight 64-bit words q[0], ..., q[7], aes_sse2 uses four 128-bit registers (q[0], q[4]), ..., (q[3], q[7]), each holding a pair of 64-bit quantities stacked in a single register. The translation was done very naively: it mostly reduces the cost of ShiftRows and data movement, and does nothing to speed up the S-box or (Inv)MixColumns, which still treat the 64-bit quantities as if they lived in separate registers and ignore the upper halves.
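To make the layout concrete, here is a minimal sketch (illustrative only, not the committed code; the helper names are made up) of packing a (q[i], q[i+4]) pair into one 128-bit register, after which a single 128-bit operation covers both 64-bit quantities:

	#include <emmintrin.h>	/* SSE2 intrinsics */
	#include <stdint.h>

	/* Pack q[i] into the low 64-bit lane, q[i+4] into the high lane. */
	static inline __m128i
	pack_pair(uint64_t q_lo, uint64_t q_hi)
	{
		return _mm_set_epi64x((long long)q_hi, (long long)q_lo);
	}

	/* One 128-bit XOR now does the work of two 64-bit XORs. */
	static inline __m128i
	xor_pair(__m128i a, __m128i b)
	{
		return _mm_xor_si128(a, b);
	}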
Unfortunately SSE2, which is all that is guaranteed on amd64 CPUs, lacks PSHUFB, which would help considerably more; vpaes, for example, relies on it. Perhaps there are enough CPUs out there with PSHUFB but not AES-NI to make it worthwhile to import or adapt vpaes too.
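For illustration (this is SSSE3, not part of this commit): PSHUFB turns a 16-entry byte-table lookup into a single instruction, which is the building block vpaes uses for its S-box. Plain SSE2 has no equivalent and must synthesize such permutations from shifts, masks, and unpacks.

	#include <tmmintrin.h>	/* SSSE3: _mm_shuffle_epi8 == PSHUFB */

	/*
	 * Look up 16 nibble values (0..15, one per byte of `nibbles`)
	 * in a 16-byte table, all in one instruction.
	 */
	static inline __m128i
	lookup16(__m128i table, __m128i nibbles)
	{
		return _mm_shuffle_epi8(table, nibbles);
	}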
Note: This includes local definitions of various Intel compiler intrinsics for gcc and clang, written in terms of their __builtin_* functions and vector extensions, because the necessary header files are not available during the kernel build. This is a kludge, expedient for now but not ideal; we should fix it properly.
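A hedged sketch of the shape of that kludge (the header actually committed differs in detail): with the GCC/Clang vector extensions, many SSE2 intrinsics reduce to ordinary C operators, so no vendor <emmintrin.h> is needed at all.

	/* 128-bit integer vector types, as the vendor headers define them. */
	typedef long long __m128i
	    __attribute__((__vector_size__(16), __may_alias__));
	typedef unsigned long long __v2du
	    __attribute__((__vector_size__(16)));

	static inline __m128i
	_mm_xor_si128(__m128i a, __m128i b)
	{
		return a ^ b;		/* lowers to PXOR */
	}

	static inline __m128i
	_mm_add_epi64(__m128i a, __m128i b)
	{
		return (__m128i)((__v2du)a + (__v2du)b);	/* lowers to PADDQ */
	}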