|
Revision tags: perseant-exfatfs-base-20250801 netbsd-11-base netbsd-10-1-RELEASE perseant-exfatfs-base-20240630 perseant-exfatfs-base netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 netbsd-10-0-RC1 netbsd-10-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base
|
| 1.2 |
25-Jul-2020 |
riastradh |
Add kernel ChaCha test to exercise all available implementations.
|
| 1.1 |
30-Jun-2020 |
riastradh |
New test sys/crypto/aes/t_aes.
Runs aes_selftest on all kernel AES implementations supported on the current hardware, not just the preferred one.
|
| 1.11 |
24-Nov-2025 |
nia |
Needs the same compiler bug workaround as the kernel.
|
| 1.10 |
23-Nov-2025 |
riastradh |
aes(9): Rewrite x86 SSE2 implementation.
This computes eight AES_k instances simultaneously, using the bitsliced 32-bit aes_ct logic which computes two blocks at a time in uint32_t arithmetic, vectorized four ways.
Previously, the SSE2 code was a very naive adaptation of aes_ct64, which computes four blocks at a time in uint64_t arithmetic, without any 2x vectorization -- I did it at the time because:
(a) it was easier to get working,
(b) it only affects really old hardware with neither AES-NI nor SSSE3 which are both much much faster.
But it was bugging me that this was a kind of dumb use of SSE2.
Substantially reduces stack usage (from ~1200 bytes to ~800 bytes) and should approximately double throughput for CBC decryption and for XTS encryption/decryption.
I also tried a 2x64 version but cursory performance measurements didn't reveal much benefit over 4x32. (If anyone is interested in doing more serious performance measurements, on ancient hardware for which it might matter, I also have the 2x64 code around.)
Prompted by:
PR kern/59774: bearssl 32-bit AES is too slow, want 64-bit optimized version in kernel
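The term `bitsliced' in the message above can be illustrated with a minimal sketch. This is not the aes_ct code: it uses a one-bit full adder as a stand-in circuit, and the function names are hypothetical. Bit j of every word belongs to independent instance j, so a handful of word-wide logic operations evaluates 32 instances at once; the same idea is what lets a bitsliced AES process several blocks per pass.

```c
#include <stdint.h>

/*
 * Bitsliced full adder (illustrative stand-in, not AES): bit j of each
 * word belongs to independent instance j, so these few XOR/AND/OR
 * operations evaluate 32 one-bit adders at once.
 */
void
bitsliced_full_adder(uint32_t a, uint32_t b, uint32_t cin,
    uint32_t *sum, uint32_t *cout)
{
	uint32_t axb = a ^ b;

	*sum = axb ^ cin;
	*cout = (a & b) | (cin & axb);
}

/* Scalar reference for a single instance, used only for comparison. */
unsigned
scalar_full_adder(unsigned a, unsigned b, unsigned cin, unsigned *cout)
{
	unsigned t = a + b + cin;	/* each input is 0 or 1 */

	*cout = t >> 1;
	return t & 1;
}
```

Widening the word (uint64_t, or a 128-bit SSE2 register) adds lanes without adding instructions, which is roughly the trade-off the message describes between the 32-bit and 64-bit variants.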
|
| 1.9 |
23-Nov-2025 |
riastradh |
aes(9): New 64-bit bitsliced implementation.
Derived from BearSSL's aes_ct64 code. Compared to the aes_ct code, on machines with native 64-bit integer arithmetic, aes_ct64 should have approximately:
- the same throughput for:
  . CBC encryption,
  . CCM encryption/decryption, and
  . CBC-MAC;
- double the throughput for:
  . CBC decryption,
  . XTS encryption/decryption.
(aes_ct computes AES on two blocks at a time; aes_ct64 computes it on four blocks at a time, with roughly the same number of instructions. CBC encryption and CBC-MAC are inherently sequential; CCM, being a combination of CTR and CBC-MAC, can only really be parallelized two ways, so having four ways available doesn't help; and CBC decryption and XTS admit parallelism limited only by the size of the inputs.)
Enable with `options AES_BEAR64'. Should be a reasonable default on all platforms with 64-bit integer registers. Caveat: uses about 1200 bytes of stack space. (Could approximately halve that, like the BearSSL aes_ct code, at some speed cost which I haven't measured -- by moving the br_aes_ct64_skey_expand logic into add_round_key in aes_ct64_{enc,dec}.c.)
PR kern/59774: bearssl 32-bit AES is too slow, want 64-bit optimized version in kernel
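The parallelism argument above (CBC encryption is inherently sequential, CBC decryption is not) can be sketched as follows. This is an illustration under stated assumptions, not kernel code: toy_enc and toy_dec are a hypothetical, insecure stand-in for the block cipher, and all names are made up for the example. Encryption needs the previous ciphertext block before it can start the next one; decryption only reads ciphertext that is already available, so its blocks can be handled in any order, here deliberately in reverse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16

/* Toy stand-in for a block cipher: NOT AES and NOT secure. */
void
toy_enc(const uint8_t key[BLK], const uint8_t in[BLK], uint8_t out[BLK])
{
	for (size_t i = 0; i < BLK; i++)
		out[i] = (uint8_t)((in[i] ^ key[i]) + 1);
}

void
toy_dec(const uint8_t key[BLK], const uint8_t in[BLK], uint8_t out[BLK])
{
	for (size_t i = 0; i < BLK; i++)
		out[i] = (uint8_t)((in[i] - 1) ^ key[i]);
}

/* Sequential: block b cannot start until ciphertext block b-1 exists. */
void
cbc_encrypt(const uint8_t key[BLK], const uint8_t iv[BLK],
    const uint8_t *pt, uint8_t *ct, size_t nblk)
{
	uint8_t chain[BLK], x[BLK];

	memcpy(chain, iv, BLK);
	for (size_t b = 0; b < nblk; b++) {
		for (size_t i = 0; i < BLK; i++)
			x[i] = pt[b*BLK + i] ^ chain[i];
		toy_enc(key, x, &ct[b*BLK]);
		memcpy(chain, &ct[b*BLK], BLK);
	}
}

/*
 * Independent: block b needs only ct[b] and ct[b-1], both known up
 * front.  The loop runs in reverse to show that order does not matter.
 */
void
cbc_decrypt(const uint8_t key[BLK], const uint8_t iv[BLK],
    const uint8_t *ct, uint8_t *pt, size_t nblk)
{
	uint8_t x[BLK];

	assert(ct != pt);	/* this sketch is out-of-place only */
	for (size_t b = nblk; b-- > 0;) {
		const uint8_t *prev = (b == 0) ? iv : &ct[(b - 1)*BLK];

		toy_dec(key, &ct[b*BLK], x);
		for (size_t i = 0; i < BLK; i++)
			pt[b*BLK + i] = x[i] ^ prev[i];
	}
}
```

Because the decryption iterations share no state, an implementation that computes four blocks per pass can keep all four lanes busy, whereas the encryption loop can only ever offer it one block at a time.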
|
| 1.8 |
22-Nov-2025 |
riastradh |
aes(9): New aes_keysched_enc/dec.
These implement the standard key schedule. They are named independently of any particular AES implementation, so that:
(a) we can swap between the BearSSL aes_ct and aes_ct64 code without changing all the callers who don't care which one they get, and
(b) we could push it into the aes_impl abstraction if we wanted.
This eliminates all br_aes_* references outside aes_bear.c, aes_ct*.c, and the new aes_keysched.c wrappers.
Preparation for:
PR kern/59774: bearssl 32-bit AES is too slow, want 64-bit optimized version in kernel
|
|
Revision tags: perseant-exfatfs-base-20250801 netbsd-11-base perseant-exfatfs-base-20240630 perseant-exfatfs-base
|
| 1.7 |
08-Aug-2023 |
mrg |
introduce new GCC 12 warning disables and use them in a few places
this introduces 4 new warning disable flags:
CC_WNO_MISSING_TEMPLATE_KEYWORD
CC_WNO_REGISTER
CC_WNO_STRINGOP_OVERREAD
CC_WNO_ARRAY_BOUNDS
and documents them in README.warnings. of these, the string op and array bounds are both problematic (real bugs) and also spurious (not real bugs), and the other 2 are mostly temporary for older 3rd party code.
add some new uses of CC_WNO_STRINGOP_OVERFLOW.
fix m68k build for gallium and GCC 12.
|
|
Revision tags: netbsd-10-1-RELEASE netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 netbsd-10-0-RC1 netbsd-10-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base
|
| 1.6 |
08-Sep-2020 |
jakllsch |
Acknowledge clang warning for NEON cipher code on aarch64eb
We've already made the nonportable vector initializations portable; the code works on aarch64eb.
|
| 1.5 |
17-Aug-2020 |
riastradh |
Make the AES and ChaCha NEON tests work in softfloat userland.
(`Softfloat' here refers to the ABI, which of course may be running on a CPU with NEON.)
|
| 1.4 |
16-Aug-2020 |
martin |
Restrict the NEON code to v7hf - the softfloat toolchain does not like it (nor is it likely to work if there is no FPU present).
|
| 1.3 |
25-Jul-2020 |
riastradh |
Implement AES-CCM with ARMv8.5-AES.
|
| 1.2 |
01-Jul-2020 |
riastradh |
Pass the requisite -msse options for i386.
|
| 1.1 |
30-Jun-2020 |
riastradh |
New test sys/crypto/aes/t_aes.
Runs aes_selftest on all kernel AES implementations supported on the current hardware, not just the preferred one.
|
| 1.6 |
23-Nov-2025 |
riastradh |
aes(9): Rewrite x86 SSE2 implementation.
This computes eight AES_k instances simultaneously, using the bitsliced 32-bit aes_ct logic which computes two blocks at a time in uint32_t arithmetic, vectorized four ways.
Previously, the SSE2 code was a very naive adaptation of aes_ct64, which computes four blocks at a time in uint64_t arithmetic, without any 2x vectorization -- I did it at the time because:
(a) it was easier to get working,
(b) it only affects really old hardware with neither AES-NI nor SSSE3 which are both much much faster.
But it was bugging me that this was a kind of dumb use of SSE2.
Substantially reduces stack usage (from ~1200 bytes to ~800 bytes) and should approximately double throughput for CBC decryption and for XTS encryption/decryption.
I also tried a 2x64 version but cursory performance measurements didn't reveal much benefit over 4x32. (If anyone is interested in doing more serious performance measurements, on ancient hardware for which it might matter, I also have the 2x64 code around.)
Prompted by:
PR kern/59774: bearssl 32-bit AES is too slow, want 64-bit optimized version in kernel
|
| 1.5 |
23-Nov-2025 |
riastradh |
aes(9): New 64-bit bitsliced implementation.
Derived from BearSSL's aes_ct64 code. Compared to the aes_ct code, on machines with native 64-bit integer arithmetic, aes_ct64 should have approximately:
- the same throughput for:
  . CBC encryption,
  . CCM encryption/decryption, and
  . CBC-MAC;
- double the throughput for:
  . CBC decryption,
  . XTS encryption/decryption.
(aes_ct computes AES on two blocks at a time; aes_ct64 computes it on four blocks at a time, with roughly the same number of instructions. CBC encryption and CBC-MAC are inherently sequential; CCM, being a combination of CTR and CBC-MAC, can only really be parallelized two ways, so having four ways available doesn't help; and CBC decryption and XTS admit parallelism limited only by the size of the inputs.)
Enable with `options AES_BEAR64'. Should be a reasonable default on all platforms with 64-bit integer registers. Caveat: uses about 1200 bytes of stack space. (Could approximately halve that, like the BearSSL aes_ct code, at some speed cost which I haven't measured -- by moving the br_aes_ct64_skey_expand logic into add_round_key in aes_ct64_{enc,dec}.c.)
PR kern/59774: bearssl 32-bit AES is too slow, want 64-bit optimized version in kernel
|
|
Revision tags: perseant-exfatfs-base-20250801 netbsd-11-base netbsd-10-1-RELEASE perseant-exfatfs-base-20240630 perseant-exfatfs-base netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 netbsd-10-0-RC1 netbsd-10-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base
|
| 1.4 |
17-Aug-2020 |
riastradh |
Make the AES and ChaCha NEON tests work in softfloat userland.
(`Softfloat' here refers to the ABI, which of course may be running on a CPU with NEON.)
|
| 1.3 |
26-Jul-2020 |
riastradh |
Sort includes.
|
| 1.2 |
26-Jul-2020 |
martin |
Add missing include to fix the build on architectures w/o any special accelerated AES implementation.
|
| 1.1 |
30-Jun-2020 |
riastradh |
New test sys/crypto/aes/t_aes.
Runs aes_selftest on all kernel AES implementations supported on the current hardware, not just the preferred one.
|
|
Revision tags: perseant-exfatfs-base-20250801 netbsd-11-base perseant-exfatfs-base-20240630 perseant-exfatfs-base
|
| 1.8 |
05-Sep-2023 |
mrg |
apply previous to just GCC.
|
| 1.7 |
05-Sep-2023 |
mrg |
apply -Wno-maybe-uninitialized to chacha_sse2.c.
there's a clearly initialised memory region that is claimed as being maybe uninitialised, and this test-build version of it triggers it while the normal build doesn't.
|
|
Revision tags: netbsd-10-1-RELEASE netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 netbsd-10-0-RC1 netbsd-10-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base
|
| 1.6 |
08-Sep-2020 |
jakllsch |
Acknowledge clang warning for NEON cipher code on aarch64eb
We've already made the nonportable vector initializations portable; the code works on aarch64eb.
|
| 1.5 |
17-Aug-2020 |
riastradh |
Make the AES and ChaCha NEON tests work in softfloat userland.
(`Softfloat' here refers to the ABI, which of course may be running on a CPU with NEON.)
|
| 1.4 |
16-Aug-2020 |
martin |
Restrict the NEON code to v7hf - the softfloat toolchain does not like it (nor is it likely to work if there is no FPU present).
|
| 1.3 |
28-Jul-2020 |
riastradh |
Implement 4-way vectorization of ChaCha for armv7 NEON.
cgd performance is not as good as I was hoping (~4% improvement over chacha_ref.c) but it should improve substantially more if we let the cgd worker thread keep fpu state so we don't have to pay the cost of isb and zero-the-fpu on every 512-byte cgd block.
|
| 1.2 |
27-Jul-2020 |
riastradh |
Enable ChaCha NEON code on armv7 too.
The 4-blocks-at-a-time assembly helper is disabled for now; adapting it to armv7 is going to be a little annoying with only 16 128-bit vector registers.
(Should also do a fifth block in the integer registers for 320 bytes at a time.)
|
| 1.1 |
25-Jul-2020 |
riastradh |
Add kernel ChaCha test to exercise all available implementations.
|
|
Revision tags: perseant-exfatfs-base-20250801 netbsd-11-base netbsd-10-1-RELEASE perseant-exfatfs-base-20240630 perseant-exfatfs-base netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 netbsd-10-0-RC1 netbsd-10-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base
|
| 1.4 |
17-Aug-2020 |
riastradh |
Make the AES and ChaCha NEON tests work in softfloat userland.
(`Softfloat' here refers to the ABI, which of course may be running on a CPU with NEON.)
|
| 1.3 |
27-Jul-2020 |
riastradh |
It's __ARM_NEON, not __ARM_NEON__, sometimes, apparently.
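A minimal sketch of a check that tolerates both spellings mentioned above. HAVE_NEON_INTRINSICS and have_neon_intrinsics are hypothetical names for this example, not identifiers from the tree, and the actual change may have been written differently.

```c
#include <stdbool.h>

/* Accept either spelling of the compiler's NEON feature macro. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#define HAVE_NEON_INTRINSICS 1	/* hypothetical name */
#else
#define HAVE_NEON_INTRINSICS 0
#endif

/* Reports whether this translation unit was compiled with NEON enabled. */
bool
have_neon_intrinsics(void)
{
	return HAVE_NEON_INTRINSICS != 0;
}
```

This is a compile-time property of the toolchain and flags; whether the CPU actually has NEON still has to be probed at run time.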
|
| 1.2 |
27-Jul-2020 |
riastradh |
Enable ChaCha NEON code on armv7 too.
The 4-blocks-at-a-time assembly helper is disabled for now; adapting it to armv7 is going to be a little annoying with only 16 128-bit vector registers.
(Should also do a fifth block in the integer registers for 320 bytes at a time.)
|
| 1.1 |
25-Jul-2020 |
riastradh |
Add kernel ChaCha test to exercise all available implementations.
|