History log of /src/sys/crypto/aes/arch/arm/
Revision Date Author Comments
Revision tags: perseant-exfatfs-base-20250801 netbsd-11-base netbsd-10-1-RELEASE perseant-exfatfs-base-20240630 perseant-exfatfs-base netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 thorpej-ifq-base thorpej-altq-separation-base netbsd-10-0-RC1 netbsd-10-base bouyer-sunxi-drm-base thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
1.5 25-Jul-2020 riastradh

Implement AES-CCM with ARMv8.5-AES.


1.4 25-Jul-2020 riastradh

Split aes_impl declarations out into aes_impl.h.

This will make it less painful to add more operations to struct
aes_impl without having to recompile everything that just uses the
block cipher directly or similar.


1.3 30-Jun-2020 riastradh

New test sys/crypto/aes/t_aes.

Runs aes_selftest on all kernel AES implementations supported on the
current hardware, not just the preferred one.


1.2 29-Jun-2020 riastradh

Move aarch64/fpu.h to arm/fpu.h.


1.1 29-Jun-2020 riastradh

Implement AES in kernel using ARMv8.0-AES on aarch64.


Revision tags: perseant-exfatfs-base-20250801 netbsd-11-base netbsd-10-1-RELEASE perseant-exfatfs-base-20240630 perseant-exfatfs-base netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 thorpej-ifq-base thorpej-altq-separation-base netbsd-10-0-RC1 netbsd-10-base bouyer-sunxi-drm-base thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
1.3 25-Jul-2020 riastradh

Implement AES-CCM with ARMv8.5-AES.


1.2 25-Jul-2020 riastradh

Split aes_impl declarations out into aes_impl.h.

This will make it less painful to add more operations to struct
aes_impl without having to recompile everything that just uses the
block cipher directly or similar.


1.1 29-Jun-2020 riastradh

Implement AES in kernel using ARMv8.0-AES on aarch64.


1.16 27-Nov-2025 andvar

fix various typos in comments.


Revision tags: perseant-exfatfs-base-20250801 netbsd-11-base netbsd-10-1-RELEASE perseant-exfatfs-base-20240630 perseant-exfatfs-base netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 thorpej-ifq-base thorpej-altq-separation-base netbsd-10-0-RC1 netbsd-10-base bouyer-sunxi-drm-base thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
1.15 08-Sep-2020 riastradh

aesarmv8: Reallocate registers to shave off unnecessary MOV.


1.14 08-Sep-2020 riastradh

aesarmv8: Issue two 4-register ld/st, not four 2-register ld/st.


1.13 08-Sep-2020 riastradh

aesarmv8: Adapt aes_armv8_64.S to big-endian.

Patch mainly from (and tested by) jakllsch@ with minor tweaks by me.


1.12 08-Aug-2020 riastradh

Fix ARM NEON implementations of AES and ChaCha on big-endian ARM.

New macros such as VQ_N_U32(a,b,c,d) for NEON vector initializers.
Needed because GCC and Clang disagree on the ordering of lanes,
depending on whether it's 64-bit big-endian, 32-bit big-endian, or
little-endian -- and, bizarrely, both of them disagree with the
architectural numbering of lanes.

Experimented with using

static const uint8_t x8[16] = {...};

uint8x16_t x = vld1q_u8(x8);

which doesn't require knowing anything about the ordering of lanes,
but this generates considerably worse code and apparently confuses
GCC into not recognizing the constant value of x8.

Fix some clang mistakes while here too.


1.11 27-Jul-2020 riastradh

Add RCSIDs to the AES and ChaCha .S sources.


1.10 27-Jul-2020 riastradh

Issue aese/aesmc and aesd/aesimc in pairs.

Advised by the aarch64 optimization guide; increases cgd throughput
by about 10%.


1.9 27-Jul-2020 riastradh

Align critical-path loops in AES and ChaCha.


1.8 25-Jul-2020 riastradh

Implement AES-CCM with ARMv8.5-AES.


1.7 25-Jul-2020 riastradh

Invert some loops to save a branch instruction on every iteration.
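
Annotation (not part of the commit): a minimal C sketch of loop inversion in general, assuming a simple byte-summing loop as a stand-in for the real assembly. The function names are hypothetical. In the top-tested form, straightforward assembly has a conditional branch at the top plus an unconditional branch back at the bottom; the inverted form checks the empty case once and then needs only one conditional branch per iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Top-tested loop: the condition is checked before every iteration. */
static uint32_t
sum_top_tested(const uint8_t *p, size_t n)
{
	uint32_t s = 0;

	while (n != 0) {
		s += *p++;
		n--;
	}
	return s;
}

/* Inverted loop: one up-front guard, then a bottom-tested loop. */
static uint32_t
sum_inverted(const uint8_t *p, size_t n)
{
	uint32_t s = 0;

	if (n == 0)
		return s;
	do {
		s += *p++;
	} while (--n != 0);
	return s;
}
```

Both functions return the same result for every input; only the control-flow shape differs. A C compiler will often perform this transformation itself, so the saving matters mainly in hand-written assembly.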


1.6 22-Jul-2020 riastradh

Fix register name in comment.

Some time ago I reallocated the registers to avoid inadvertently
clobbering the callee-saves v9, but neglected to update the comment.


1.5 19-Jul-2020 ryo

fix build with clang/llvm.

The clang aarch64 assembler doesn't accept the optional number of lanes on a
vector register (although the ARM ARM says that an assembler must accept it).


1.4 30-Jun-2020 riastradh

Reallocate registers to avoid abusing callee-saves registers, v8-v15.

Forgot to consult the AAPCS before committing this earlier -- oops!

While here, take advantage of the 32 aarch64 simd registers to avoid
all stack spills.


1.3 30-Jun-2020 riastradh

Use `.arch_extension aes' for aese/aesmc/aesd/aesimc.

Unlike `.arch_extension crypto', this works with clang; both work
with gas, so we'll go with this.

Clang still can't handle aes_armv8_64.S yet -- it gets confused by
dup and mov on lanes, but this makes progress.


1.2 30-Jun-2020 riastradh

Use .p2align rather than .align.

Apparently on arm, .align is actually an alias for .p2align, taking a
power of two rather than a number of bytes, so aes_armv8_64.o was
bloated to 32KB with obscene alignment when it only needed to be
barely past 4KB.

Do the same for the x86 aes_ni_64.S -- even though .align takes a
number of bytes rather than a power of two on x86, let's just stay
away from the temptations of the evil .align directive.
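
Annotation (not part of the commit): a minimal sketch of the arithmetic behind the two readings of an alignment operand. The helper names are hypothetical. An operand n read as a power of two requests 2^n bytes, so a value intended as a byte count can request far more alignment than expected.

```c
#include <assert.h>
#include <stdint.h>

/* Alignment in bytes when the operand is read as a power of two. */
static uint64_t
align_bytes_pow2(unsigned n)
{
	assert(n < 64);
	return (uint64_t)1 << n;
}

/*
 * Padding bytes needed to round offset up to a multiple of align.
 * align must be a nonzero power of two.
 */
static uint64_t
padding_to(uint64_t offset, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (align - (offset & (align - 1))) & (align - 1);
}
```

For example, an operand of 5 read as a power of two means 32-byte alignment, whereas read as a byte count it would mean 5 bytes. `padding_to` shows how much filler each aligned label can cost.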


1.1 29-Jun-2020 riastradh

Implement AES in kernel using ARMv8.0-AES on aarch64.


Revision tags: perseant-exfatfs-base-20250801 netbsd-11-base netbsd-10-1-RELEASE perseant-exfatfs-base-20240630 perseant-exfatfs-base netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 thorpej-ifq-base thorpej-altq-separation-base netbsd-10-0-RC1 netbsd-10-base bouyer-sunxi-drm-base thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
1.6 21-Nov-2020 rin

Fix build with clang for earmv7hf; loadroundkey() is used only for __aarch64__.


1.5 08-Aug-2020 riastradh

branches: 1.5.2;
Fix ARM NEON implementations of AES and ChaCha on big-endian ARM.

New macros such as VQ_N_U32(a,b,c,d) for NEON vector initializers.
Needed because GCC and Clang disagree on the ordering of lanes,
depending on whether it's 64-bit big-endian, 32-bit big-endian, or
little-endian -- and, bizarrely, both of them disagree with the
architectural numbering of lanes.

Experimented with using

static const uint8_t x8[16] = {...};

uint8x16_t x = vld1q_u8(x8);

which doesn't require knowing anything about the ordering of lanes,
but this generates considerably worse code and apparently confuses
GCC into not recognizing the constant value of x8.

Fix some clang mistakes while here too.


1.4 28-Jul-2020 riastradh

Draft 2x vectorized neon vpaes for aarch64.

Gives a modest speed boost on rk3399 (Cortex-A53/A72), around 20% in
cgd tests, for parallelizable operations like CBC decryption; same
improvement should probably carry over to rpi4 CPU which lacks
ARMv8.0-AES.


1.3 30-Jun-2020 riastradh

New test sys/crypto/aes/t_aes.

Runs aes_selftest on all kernel AES implementations supported on the
current hardware, not just the preferred one.


1.2 29-Jun-2020 riastradh

Provide hand-written AES NEON assembly for arm32.

gcc does a lousy job at compiling 128-bit NEON intrinsics on arm32;
hand-writing it made it about 12x faster, by avoiding a zillion loads
and stores to spill everything and the kitchen sink onto the stack.
(But gcc does fine on aarch64, presumably because it has twice as
many registers and doesn't have to deal with q2=d4/d5 overlapping.)


1.1 29-Jun-2020 riastradh

New permutation-based AES implementation using ARM NEON.

Also derived from Mike Hamburg's public-domain vpaes code.


Revision tags: perseant-exfatfs-base-20250801 netbsd-11-base netbsd-10-1-RELEASE perseant-exfatfs-base-20240630 perseant-exfatfs-base netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 thorpej-ifq-base thorpej-altq-separation-base netbsd-10-0-RC1 netbsd-10-base bouyer-sunxi-drm-base thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
1.3 25-Jul-2020 riastradh

Implement AES-CCM with NEON.


1.2 25-Jul-2020 riastradh

Split aes_impl declarations out into aes_impl.h.

This will make it less painful to add more operations to struct
aes_impl without having to recompile everything that just uses the
block cipher directly or similar.


1.1 29-Jun-2020 riastradh

New permutation-based AES implementation using ARM NEON.

Also derived from Mike Hamburg's public-domain vpaes code.


Revision tags: perseant-exfatfs-base-20250801 netbsd-11-base netbsd-10-1-RELEASE perseant-exfatfs-base-20240630 perseant-exfatfs-base netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 thorpej-ifq-base thorpej-altq-separation-base netbsd-10-0-RC1 netbsd-10-base bouyer-sunxi-drm-base thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
1.11 10-Sep-2020 riastradh

aes neon: Gather mc_forward/backward so we can load 256 bits at once.


1.10 10-Sep-2020 riastradh

aes neon: Hoist dsbd/dsbe address calculation out of loop.


1.9 10-Sep-2020 riastradh

aes neon: Tweak register usage.

- Call r12 by its usual name, ip.
- No need for r7 or r11=fp at the moment.


1.8 10-Sep-2020 riastradh

aes neon: Write vtbl with {qN} rather than {d(2N)-d(2N+1)}.

Cosmetic; no functional change.


1.7 10-Sep-2020 riastradh

aes neon: Issue 256-bit loads rather than pairs of 128-bit loads.

Not sure why I didn't realize you could do this before!

Saves some temporary registers that can now be allocated to shave off
a few cycles.


1.6 16-Aug-2020 riastradh

Fix AES NEON code for big-endian softfp ARM.

...which is how the kernel runs. Switch to using __SOFTFP__ for
consistency with how it gets exposed to C, although I'm not sure how
to get it defined automagically in the toolchain for .S files so
that's set manually in files.aesneon for now.


1.5 08-Aug-2020 riastradh

Fix ARM NEON implementations of AES and ChaCha on big-endian ARM.

New macros such as VQ_N_U32(a,b,c,d) for NEON vector initializers.
Needed because GCC and Clang disagree on the ordering of lanes,
depending on whether it's 64-bit big-endian, 32-bit big-endian, or
little-endian -- and, bizarrely, both of them disagree with the
architectural numbering of lanes.

Experimented with using

static const uint8_t x8[16] = {...};

uint8x16_t x = vld1q_u8(x8);

which doesn't require knowing anything about the ordering of lanes,
but this generates considerably worse code and apparently confuses
GCC into not recognizing the constant value of x8.

Fix some clang mistakes while here too.


1.4 27-Jul-2020 riastradh

Add RCSIDs to the AES and ChaCha .S sources.


1.3 27-Jul-2020 riastradh

Align critical-path loops in AES and ChaCha.


1.2 27-Jul-2020 riastradh

PIC for aes_neon_32.S.

Without this, tests/sys/crypto/aes/t_aes fails to start on armv7
because of R_ARM_ABS32 relocations in a nonwritable text segment for
a PIE -- which atf quietly ignores in the final report! Yikes.


1.1 29-Jun-2020 riastradh

Provide hand-written AES NEON assembly for arm32.

gcc does a lousy job at compiling 128-bit NEON intrinsics on arm32;
hand-writing it made it about 12x faster, by avoiding a zillion loads
and stores to spill everything and the kitchen sink onto the stack.
(But gcc does fine on aarch64, presumably because it has twice as
many registers and doesn't have to deal with q2=d4/d5 overlapping.)


Revision tags: perseant-exfatfs-base-20250801 netbsd-11-base netbsd-10-1-RELEASE perseant-exfatfs-base-20240630 perseant-exfatfs-base netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 thorpej-ifq-base thorpej-altq-separation-base netbsd-10-0-RC1 netbsd-10-base bouyer-sunxi-drm-base thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
1.5 10-Oct-2020 jmcneill

Fix detection of NEON features. ID_AA64PFR0_EL1_ADV_SIMD_NONE means SIMD
is not available, and any other value means it is.
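
Annotation (not part of the commit): a minimal sketch of the check described above. It assumes the AdvSIMD field occupies bits [23:20] of ID_AA64PFR0_EL1 and that the value 0xf means "not implemented", as in the Arm architecture. The macro and function names are hypothetical, not the kernel's own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the AdvSIMD field of ID_AA64PFR0_EL1. */
#define PFR0_ADVSIMD_SHIFT	20
#define PFR0_ADVSIMD_MASK	UINT64_C(0xf)
#define PFR0_ADVSIMD_NONE	UINT64_C(0xf)

/*
 * Return true if the register value reports Advanced SIMD.
 * Only the NONE encoding means absent; every other value means present.
 */
static bool
has_advsimd(uint64_t pfr0)
{
	uint64_t field = (pfr0 >> PFR0_ADVSIMD_SHIFT) & PFR0_ADVSIMD_MASK;

	return field != PFR0_ADVSIMD_NONE;
}
```

Comparing against the single "none" encoding, rather than against one specific "implemented" encoding, keeps the check correct for field values that indicate additional features.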


1.4 25-Jul-2020 riastradh

Implement AES-CCM with NEON.


1.3 25-Jul-2020 riastradh

Split aes_impl declarations out into aes_impl.h.

This will make it less painful to add more operations to struct
aes_impl without having to recompile everything that just uses the
block cipher directly or similar.


1.2 30-Jun-2020 riastradh

New test sys/crypto/aes/t_aes.

Runs aes_selftest on all kernel AES implementations supported on the
current hardware, not just the preferred one.


1.1 29-Jun-2020 riastradh

New permutation-based AES implementation using ARM NEON.

Also derived from Mike Hamburg's public-domain vpaes code.


Revision tags: perseant-exfatfs-base-20250801 netbsd-11-base perseant-exfatfs-base-20240630 perseant-exfatfs-base thorpej-ifq-base thorpej-altq-separation-base
1.4 07-Aug-2023 rin

sys/crypto: Introduce arch/{arm,x86} to share common MD headers

Dedup between aes and chacha. No binary changes.


Revision tags: netbsd-10-1-RELEASE netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 netbsd-10-0-RC1 netbsd-10-base bouyer-sunxi-drm-base thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
1.3 08-Aug-2020 riastradh

Fix ARM NEON implementations of AES and ChaCha on big-endian ARM.

New macros such as VQ_N_U32(a,b,c,d) for NEON vector initializers.
Needed because GCC and Clang disagree on the ordering of lanes,
depending on whether it's 64-bit big-endian, 32-bit big-endian, or
little-endian -- and, bizarrely, both of them disagree with the
architectural numbering of lanes.

Experimented with using

static const uint8_t x8[16] = {...};

uint8x16_t x = vld1q_u8(x8);

which doesn't require knowing anything about the ordering of lanes,
but this generates considerably worse code and apparently confuses
GCC into not recognizing the constant value of x8.

Fix some clang mistakes while here too.


1.2 28-Jul-2020 riastradh

Draft 2x vectorized neon vpaes for aarch64.

Gives a modest speed boost on rk3399 (Cortex-A53/A72), around 20% in
cgd tests, for parallelizable operations like CBC decryption; same
improvement should probably carry over to rpi4 CPU which lacks
ARMv8.0-AES.


1.1 29-Jun-2020 riastradh

New permutation-based AES implementation using ARM NEON.

Also derived from Mike Hamburg's public-domain vpaes code.


Revision tags: perseant-exfatfs-base-20250801 netbsd-11-base netbsd-10-1-RELEASE perseant-exfatfs-base-20240630 perseant-exfatfs-base netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 thorpej-ifq-base thorpej-altq-separation-base netbsd-10-0-RC1 netbsd-10-base bouyer-sunxi-drm-base
1.8 26-Jun-2022 riastradh

arm/aes_neon: Fix formatting of self-test failure message.

Discovered by code inspection. Remarkably, a combination of errors
made this fail to be a stack buffer overrun. Verified by booting
with ARMv8.0-AES disabled and with the self-test artificially made to
fail.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
1.7 09-Aug-2020 riastradh

Use vshlq_n_s32 rather than vsliq_n_s32 with zero destination.

Not sure why I reached for vsliq_n_s32 at first -- probably so I
wouldn't have to deal with a new intrinsic in arm_neon.h!
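
Annotation (not part of the commit): a scalar C model, for a single 32-bit lane, of why the two intrinsics agree when the destination is zero. The helper names are hypothetical and unsigned arithmetic is used to avoid signed-shift pitfalls. Shift-left-and-insert keeps the low n bits of the destination and fills the rest from the shifted source, so a zero destination leaves a plain left shift.

```c
#include <assert.h>
#include <stdint.h>

/* One-lane model of shift-left-and-insert: keep the low n bits of dst. */
static uint32_t
sli32(uint32_t dst, uint32_t src, unsigned n)
{
	assert(n < 32);
	uint32_t keep = ((uint32_t)1 << n) - 1;

	return (src << n) | (dst & keep);
}

/* One-lane model of a plain shift left by an immediate. */
static uint32_t
shl32(uint32_t src, unsigned n)
{
	assert(n < 32);
	return src << n;
}
```

With `dst == 0` the `dst & keep` term vanishes, so `sli32(0, x, n)` equals `shl32(x, n)` for every valid `n`.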


1.6 09-Aug-2020 riastradh

Nix outdated comment.

I implemented this parallelism a couple weeks ago.


1.5 08-Aug-2020 riastradh

Fix ARM NEON implementations of AES and ChaCha on big-endian ARM.

New macros such as VQ_N_U32(a,b,c,d) for NEON vector initializers.
Needed because GCC and Clang disagree on the ordering of lanes,
depending on whether it's 64-bit big-endian, 32-bit big-endian, or
little-endian -- and, bizarrely, both of them disagree with the
architectural numbering of lanes.

Experimented with using

static const uint8_t x8[16] = {...};

uint8x16_t x = vld1q_u8(x8);

which doesn't require knowing anything about the ordering of lanes,
but this generates considerably worse code and apparently confuses
GCC into not recognizing the constant value of x8.

Fix some clang mistakes while here too.


1.4 28-Jul-2020 riastradh

Draft 2x vectorized neon vpaes for aarch64.

Gives a modest speed boost on rk3399 (Cortex-A53/A72), around 20% in
cgd tests, for parallelizable operations like CBC decryption; same
improvement should probably carry over to rpi4 CPU which lacks
ARMv8.0-AES.


1.3 25-Jul-2020 riastradh

Implement AES-CCM with NEON.


1.2 30-Jun-2020 riastradh

New test sys/crypto/aes/t_aes.

Runs aes_selftest on all kernel AES implementations supported on the
current hardware, not just the preferred one.


1.1 29-Jun-2020 riastradh

New permutation-based AES implementation using ARM NEON.

Also derived from Mike Hamburg's public-domain vpaes code.


Revision tags: perseant-exfatfs-base-20250801 perseant-exfatfs-base-20240630 perseant-exfatfs-base
1.13 07-Aug-2023 rin

sys/crypto: Introduce arch/{arm,x86} to share common MD headers

Dedup between aes and chacha. No binary changes.


1.12 07-Aug-2023 rin

sys/crypto/{aes,chacha}/arch/arm/arm_neon.h: Sync (whitespace fix)

No binary changes.


Revision tags: netbsd-10-1-RELEASE netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 netbsd-10-0-RC1 netbsd-10-base bouyer-sunxi-drm-base thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
1.11 07-Sep-2020 jakllsch

Fix vgetq_lane_u32 for aarch64eb with GCC

Fixes NEON AES on aarch64eb


1.10 09-Aug-2020 riastradh

Fix some clang neon intrinsics.

Compile-tested only, with -Wno-nonportable-vector-initializers. Need
to address -- and test -- this stuff properly but this is progress.


1.9 09-Aug-2020 riastradh

Use vshlq_n_s32 rather than vsliq_n_s32 with zero destination.

Not sure why I reached for vsliq_n_s32 at first -- probably so I
wouldn't have to deal with a new intrinsic in arm_neon.h!


1.8 08-Aug-2020 riastradh

Fix ARM NEON implementations of AES and ChaCha on big-endian ARM.

New macros such as VQ_N_U32(a,b,c,d) for NEON vector initializers.
Needed because GCC and Clang disagree on the ordering of lanes,
depending on whether it's 64-bit big-endian, 32-bit big-endian, or
little-endian -- and, bizarrely, both of them disagree with the
architectural numbering of lanes.

Experimented with using

static const uint8_t x8[16] = {...};

uint8x16_t x = vld1q_u8(x8);

which doesn't require knowing anything about the ordering of lanes,
but this generates considerably worse code and apparently confuses
GCC into not recognizing the constant value of x8.

Fix some clang mistakes while here too.


1.7 28-Jul-2020 riastradh

Draft 2x vectorized neon vpaes for aarch64.

Gives a modest speed boost on rk3399 (Cortex-A53/A72), around 20% in
cgd tests, for parallelizable operations like CBC decryption; same
improvement should probably carry over to rpi4 CPU which lacks
ARMv8.0-AES.


1.6 25-Jul-2020 riastradh

Add 32-bit load, store, and shift intrinsics.

vld1q_u32
vst1q_u32
vshlq_n_u32
vshrq_n_u32


1.5 25-Jul-2020 riastradh

Fix missing clang big-endian case.


1.4 25-Jul-2020 riastradh

Implement AES-CCM with NEON.


1.3 23-Jul-2020 ryo

fix build with llvm/clang.


1.2 30-Jun-2020 riastradh

Tweak clang neon intrinsics so they build.

(this file is still a kludge)


1.1 29-Jun-2020 riastradh

New permutation-based AES implementation using ARM NEON.

Also derived from Mike Hamburg's public-domain vpaes code.


Revision tags: perseant-exfatfs-base-20250801 perseant-exfatfs-base-20240630 perseant-exfatfs-base
1.3 07-Aug-2023 rin

sys/crypto: Introduce arch/{arm,x86} to share common MD headers

Dedup between aes and chacha. No binary changes.


Revision tags: netbsd-10-1-RELEASE netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 netbsd-10-0-RC1 netbsd-10-base bouyer-sunxi-drm-base thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
1.2 09-Aug-2020 riastradh

Fix mistake in big-endian arm clang.

Swapped the two halves (only gcc does that, I think) and wrote j,i
backwards, oops.

(I don't have a big-endian arm clang build handy to test; hoping this
works.)


1.1 08-Aug-2020 riastradh

Fix ARM NEON implementations of AES and ChaCha on big-endian ARM.

New macros such as VQ_N_U32(a,b,c,d) for NEON vector initializers.
Needed because GCC and Clang disagree on the ordering of lanes,
depending on whether it's 64-bit big-endian, 32-bit big-endian, or
little-endian -- and, bizarrely, both of them disagree with the
architectural numbering of lanes.

Experimented with using

static const uint8_t x8[16] = {...};

uint8x16_t x = vld1q_u8(x8);

which doesn't require knowing anything about the ordering of lanes,
but this generates considerably worse code and apparently confuses
GCC into not recognizing the constant value of x8.

Fix some clang mistakes while here too.


Revision tags: perseant-exfatfs-base-20250801 netbsd-11-base netbsd-10-1-RELEASE perseant-exfatfs-base-20240630 perseant-exfatfs-base netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 thorpej-ifq-base thorpej-altq-separation-base netbsd-10-0-RC1 netbsd-10-base bouyer-sunxi-drm-base thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
1.1 29-Jun-2020 riastradh

Implement AES in kernel using ARMv8.0-AES on aarch64.


Revision tags: perseant-exfatfs-base-20250801 netbsd-11-base netbsd-10-1-RELEASE perseant-exfatfs-base-20240630 perseant-exfatfs-base netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 thorpej-ifq-base thorpej-altq-separation-base netbsd-10-0-RC1 netbsd-10-base bouyer-sunxi-drm-base thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
1.5 08-Sep-2020 jakllsch

Acknowledge clang warning for NEON cipher code on aarch64eb

We've already made the nonportable vector initializations portable; the
code works on aarch64eb.


1.4 16-Aug-2020 riastradh

Fix AES NEON code for big-endian softfp ARM.

...which is how the kernel runs. Switch to using __SOFTFP__ for
consistency with how it gets exposed to C, although I'm not sure how
to get it defined automagically in the toolchain for .S files so
that's set manually in files.aesneon for now.


1.3 30-Jun-2020 riastradh

Limit aes_neon to cpu_cortex | aarch64.

We won't use it on any other systems, and it doesn't build without
NEON anyway. Verified earmv7hf GENERIC, aarch64 GENERIC64, and
earmv6 RPI2 all build with this.


1.2 29-Jun-2020 riastradh

Provide hand-written AES NEON assembly for arm32.

gcc does a lousy job at compiling 128-bit NEON intrinsics on arm32;
hand-writing it made it about 12x faster, by avoiding a zillion loads
and stores to spill everything and the kitchen sink onto the stack.
(But gcc does fine on aarch64, presumably because it has twice as
many registers and doesn't have to deal with q2=d4/d5 overlapping.)


1.1 29-Jun-2020 riastradh

New permutation-based AES implementation using ARM NEON.

Also derived from Mike Hamburg's public-domain vpaes code.