History log of /src/sys/crypto/aes/arch/x86/aes_sse2_impl.c
Revision  Date  Author  Comments
 1.5  25-Jul-2020  riastradh Implement AES-CCM with SSE2.
 1.4  25-Jul-2020  riastradh Split aes_impl declarations out into aes_impl.h.

This will make it less painful to add more operations to struct
aes_impl without having to recompile everything that just uses the
block cipher directly or similar.
 1.3  30-Jun-2020  riastradh New test sys/crypto/aes/t_aes.

Runs aes_selftest on all kernel AES implementations supported on the
current hardware, not just the preferred one.
 1.2  29-Jun-2020  riastradh Split SSE2 logic into separate units.

Ensure that there are no paths into files compiled with -msse -msse2
at all except via fpu_kern_enter.

I didn't run into a practical problem with this, but let's not leave
a ticking time bomb for subsequent toolchain changes in case the mere
declaration of local __m128i variables causes trouble.
 1.1  29-Jun-2020  riastradh New SSE2-based bitsliced AES implementation.

This should work on essentially all x86 CPUs of the last two decades,
and may improve throughput over the portable C aes_ct implementation
from BearSSL by

(a) reducing the number of vector operations in sequence, and
(b) batching four rather than two blocks in parallel.

Derived from BearSSL's aes_ct64 implementation, adjusted so that where
aes_ct64 uses 64-bit q[0],...,q[7], aes_sse2 uses (q[0], q[4]), ...,
(q[3], q[7]), each tuple representing a pair of 64-bit quantities
stacked in a single 128-bit register. This translation was done very
naively, and mostly reduces the cost of ShiftRows and data movement
without doing anything to address the S-box or (Inv)MixColumns, which
spread all 64-bit quantities across separate registers and ignore the
upper halves.
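
The register layout described above can be illustrated with a minimal
sketch. It is not the code from aes_sse2_impl.c: the type v2u64 and the
helpers pack_pairs, unpack_pairs and xor_pairs are hypothetical names,
the 128-bit type is built from gcc/clang vector extensions rather than
__m128i, and placing q[i] in the low lane and q[i+4] in the high lane
is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit type: two 64-bit lanes (gcc/clang extension). */
typedef uint64_t v2u64 __attribute__((vector_size(16)));

static_assert(sizeof(v2u64) == 16, "v2u64 must be 128 bits wide");

/*
 * Stack the eight 64-bit words of an aes_ct64-style state into four
 * 128-bit values, r[i] = (q[i], q[i+4]).  The lane order (q[i] low,
 * q[i+4] high) is an assumption made for this sketch.
 */
static void
pack_pairs(const uint64_t q[8], v2u64 r[4])
{
	for (int i = 0; i < 4; i++) {
		v2u64 v = { q[i], q[i + 4] };
		r[i] = v;
	}
}

/* Inverse of pack_pairs: recover q[0],...,q[7] from the four pairs. */
static void
unpack_pairs(const v2u64 r[4], uint64_t q[8])
{
	for (int i = 0; i < 4; i++) {
		q[i] = r[i][0];
		q[i + 4] = r[i][1];
	}
}

/*
 * A lane-wise operation such as XOR now handles q[i] and q[i+4] in one
 * vector operation, so four operations cover all eight words.
 */
static void
xor_pairs(v2u64 r[4], const v2u64 k[4])
{
	for (int i = 0; i < 4; i++)
		r[i] ^= k[i];
}
```

Operations that act on each 64-bit word independently benefit from this
pairing, whereas steps that need every word in its own register, as the
message says of the S-box and (Inv)MixColumns, do not.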

Unfortunately, SSE2 -- which is all that is guaranteed on all amd64
CPUs -- doesn't have PSHUFB, which would help out a lot more. For
example, vpaes relies on that. Perhaps there are enough CPUs out
there with PSHUFB but not AES-NI to make it worthwhile to import or
adapt vpaes too.

Note: This includes local definitions of various Intel compiler
intrinsics for gcc and clang in terms of their __builtin_* &c.,
because the necessary header files are not available during the
kernel build. This is a kludge -- we should fix it properly; the
present approach is expedient but not ideal.
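
As a rough illustration of what a local intrinsic definition can look
like, the sketch below defines a 128-bit integer type and two
operations without including any Intel header. The names sk_m128i,
sk_set_epi64x and sk_xor_si128 are hypothetical, and the sketch uses
only gcc/clang vector-extension operators; the definitions in the
actual source may be written differently, for example in terms of
compiler-specific __builtin_* functions.

```c
#include <assert.h>

/* Hypothetical stand-in for __m128i: two 64-bit lanes. */
typedef long long sk_m128i __attribute__((__vector_size__(16)));

static_assert(sizeof(sk_m128i) == 16, "sk_m128i must be 128 bits wide");

/*
 * Build a vector from two 64-bit values.  Mirrors the argument order
 * of Intel's _mm_set_epi64x (high half first, low half second).
 */
static inline sk_m128i
sk_set_epi64x(long long hi, long long lo)
{
	sk_m128i v = { lo, hi };
	return v;
}

/* Bitwise XOR of two 128-bit values, in the spirit of _mm_xor_si128. */
static inline sk_m128i
sk_xor_si128(sk_m128i a, sk_m128i b)
{
	return a ^ b;
}
```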
