History log of /src/sys/crypto/aes/arch/arm/aes_neon_32.S
Revision | | Date | Author | Comments |
1.11 |
| 10-Sep-2020 |
riastradh | aes neon: Gather mc_forward/backward so we can load 256 bits at once.
|
1.10 |
| 10-Sep-2020 |
riastradh | aes neon: Hoist dsbd/dsbe address calculation out of loop.
|
1.9 |
| 10-Sep-2020 |
riastradh | aes neon: Tweak register usage.
- Call r12 by its usual name, ip.
- No need for r7 or r11=fp at the moment.
|
1.8 |
| 10-Sep-2020 |
riastradh | aes neon: Write vtbl with {qN} rather than {d(2N)-d(2N+1)}.
Cosmetic; no functional change.
|
1.7 |
| 10-Sep-2020 |
riastradh | aes neon: Issue 256-bit loads rather than pairs of 128-bit loads.
Not sure why I didn't realize you could do this before!
Saves some temporary registers that can now be allocated to shave off a few cycles.
|
1.6 |
| 16-Aug-2020 |
riastradh | Fix AES NEON code for big-endian softfp ARM.
...which is how the kernel runs.
Switch to using __SOFTFP__ for consistency with how it gets exposed to C, although I'm not sure how to get it defined automagically in the toolchain for .S files, so that's set manually in files.aesneon for now.
|
1.5 |
| 08-Aug-2020 |
riastradh | Fix ARM NEON implementations of AES and ChaCha on big-endian ARM.
New macros such as VQ_N_U32(a,b,c,d) for NEON vector initializers. Needed because GCC and Clang disagree on the ordering of lanes, depending on whether it's 64-bit big-endian, 32-bit big-endian, or little-endian -- and, bizarrely, both of them disagree with the architectural numbering of lanes.
Experimented with using
    static const uint8_t x8[16] = {...};
    uint8x16_t x = vld1q_u8(x8);
which doesn't require knowing anything about the ordering of lanes, but this generates considerably worse code and apparently confuses GCC into not recognizing the constant value of x8.
Fix some clang mistakes while here too.
|
1.4 |
| 27-Jul-2020 |
riastradh | Add RCSIDs to the AES and ChaCha .S sources.
|
1.3 |
| 27-Jul-2020 |
riastradh | Align critical-path loops in AES and ChaCha.
|
1.2 |
| 27-Jul-2020 |
riastradh | PIC for aes_neon_32.S.
Without this, tests/sys/crypto/aes/t_aes fails to start on armv7 because of R_ARM_ABS32 relocations in a nonwritable text segment for a PIE -- which atf quietly ignores in the final report! Yikes.
|
1.1 |
| 29-Jun-2020 |
riastradh | Provide hand-written AES NEON assembly for arm32.
gcc does a lousy job at compiling 128-bit NEON intrinsics on arm32; hand-writing it made it about 12x faster, by avoiding a zillion loads and stores to spill everything and the kitchen sink onto the stack. (But gcc does fine on aarch64, presumably because it has twice as many registers and doesn't have to deal with q2=d4/d5 overlapping.)
|