History log of /src/sys/net/if_wg.c
Revision  Date  Author  Comments
 1.135  27-Dec-2024  riastradh wg(4): Fix thinko in previous. Should unbreak the rump build.

PR kern/58938: wg tunnel dies after a few days
 1.134  27-Dec-2024  riastradh wg(4): Add debug log for which address we send handshake msgs to.

Maybe this will help to diagnose:

PR kern/58938: wg tunnel dies after a few days
 1.133  28-Nov-2024  riastradh wg(4): Avoid spurious kassert for harmless race in session retry.

If we have already transitioned away from INIT_ACTIVE by the time the
retry timer has fired, the handshake start time may have been zeroed,
but that's harmless. So don't kassert about it until after we've
verified we're still in INIT_ACTIVE state.

PR kern/58859: KASSERT in wg_task_retry_handshake
 1.132  08-Oct-2024  riastradh wg(4): Fix wg_overudp_cb drop paths to null out *mp as caller needs.

PR kern/58688: userland panic of kernel via wg(4)
 1.131  31-Jul-2024  riastradh wg(4): Add Internet Archive links for the versions cited.

No functional change.
 1.130  31-Jul-2024  riastradh wg(4): Make a rule for who wins when both peers send INIT at once.

The rule is that the peer with the numerically smaller public key
hash, in little-endian, takes priority iff the low-order bit of

H(peer A pubkey) ^ H(peer B pubkey) ^ H(posix minutes as le64)

is 0, and the peer with the lexicographically larger public key takes
priority iff the low-order bit is 1.
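The rule can be sketched in C. XOR is commutative, so both peers compute the same bit and exactly one of them wins. Mixing in the current minute means neither peer wins every time. This is a minimal sketch under stated assumptions, not the if_wg.c code. toy_hash (64-bit FNV-1a) is a hypothetical stand-in for the protocol hash H, and init_wins and KEY_LEN are illustrative names. The sketch also orders the raw keys with memcmp in both cases, whereas the message above speaks of hash order in one case and key order in the other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_LEN 32	/* Curve25519 public key length */

/* Hypothetical stand-in for H: 64-bit FNV-1a. Only the low bit is used. */
static uint64_t
toy_hash(const uint8_t *buf, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;

	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Encode v as 8 little-endian bytes. */
static void
put_le64(uint8_t out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (8 * i));
}

/*
 * Return true if the peer with key `mine` takes priority over the peer
 * with key `theirs` when both send INIT during POSIX minute `minutes`.
 * Keys are ordered with memcmp here (an assumption of this sketch).
 */
static bool
init_wins(const uint8_t mine[KEY_LEN], const uint8_t theirs[KEY_LEN],
    uint64_t minutes)
{
	uint8_t t[8];
	int cmp = memcmp(mine, theirs, KEY_LEN);

	assert(cmp != 0);	/* distinct peers have distinct keys */
	put_le64(t, minutes);
	uint64_t bit = (toy_hash(mine, KEY_LEN) ^ toy_hash(theirs, KEY_LEN) ^
	    toy_hash(t, sizeof(t))) & 1;
	/* Low bit 0: smaller key wins. Low bit 1: larger key wins. */
	return bit == 0 ? cmp < 0 : cmp > 0;
}
```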

Another case of:

PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

This one is, as far as I can tell, simply a deadlock in the protocol
of the whitepaper -- until both sides give up on the handshake and
one of them (but not both) later decides to try sending data again.

(But not related to our t_misc:wg_rekey test, as far as I can tell,
and I haven't put enough thought into how to reliably trigger this
race to write a new automatic test for it.)
 1.129  29-Jul-2024  riastradh wg(4): Sprinkle volatile on variables requiring atomic access.

No functional change intended, since the relevant access is always
done with atomic_* when it might race with concurrent access -- and
really this should be _Atomic or something. But for now our
atomic_ops(9) API is still spelled with volatile, so we'll use that.

Post-fix tidying for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.128  29-Jul-2024  riastradh wg(4): When a session is established, send first packet directly.

Like we would do with the keepalive packet, if we had to send that
instead -- no need to defer it to the pktq. Keep it simple.

Post-fix tidying for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.127  29-Jul-2024  riastradh wg(4): Queue packet for post-handshake retransmit if limits are hit.

PR kern/58521: experimental wg(4) may drop packet after minutes of quiet
 1.126  29-Jul-2024  riastradh wg(4): Trigger session initiation in wgintr, not in wg_output.

We have to look up the session in wgintr anyway, for
wg_send_data_msg. By triggering session initiation in wgintr instead
of wg_output, we can skip the stable session lookup and reference in
wg_output -- simpler that way.

Post-fix tidying for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.125  29-Jul-2024  riastradh wg(4): Add missing barriers around wgp_pending access.

PR kern/58520: experimental wg(4) lacks barriers around access to
packet pending initiation
 1.124  29-Jul-2024  riastradh wg(4): Force rekey on tx if session is older than reject-after-time.

One more corner case for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.123  29-Jul-2024  riastradh wg(4): Read wgs_state atomically in wg_get_stable_session.

As noted in the comment above, it may concurrently transition from
ESTABLISHED to DESTROYING.

Post-fix tidying for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.122  29-Jul-2024  riastradh wg(4): Deduplicate session establishment actions.

The actions to

(a) record the last handshake time,
(b) clear some handshake state,
(c) transmit first data if queued, or (if initiator) keepalive, and
(d) begin destroying the old session,

were formerly duplicated between wg_handle_msg_resp (for when we're
the initiator) and wg_task_establish_session (for when we're the
responder).

Instead, let's factor this out into wg_swap_sessions so there's only
one copy of the logic.

This requires moving wg_update_endpoint_if_necessary a little earlier
in wg_handle_msg_resp -- which should be done anyway so that the
endpoint is updated _before_ the session is published for the data tx
path to use.

Other than moving wg_update_endpoint_if_necessary a little earlier,
no functional change intended.

Post-fix tidying for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.121  29-Jul-2024  riastradh wg(4): Sprinkle comments on internal sliding window API.

Post-fix tidying for:

PR kern/58480: experimental wg(4) sliding window logic has oopsie
 1.120  29-Jul-2024  riastradh wg(4): Omit needless atomic_load.

wgs_local_index is only ever written to while only one thread has
access to it and it is not in the thmap -- before it is published in
wg_get_session_index, and after it is unpublished in
wg_destroy_session. So no need for atomic_load -- it is stable if we
observe it in thmap_get result.

(Of course this is only for an assertion, which if tripped obviously
indicates a violation of our assumptions. But if that happens, well,
in the worst case we'll see a weird assertion message claiming that
the index is not equal to itself, from which we can conclude
there must have been a concurrent update, which is good enough to
help diagnose that problem without any atomic_load.)

Tidying some of the changes for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.119  29-Jul-2024  riastradh wg(4): Fix typo in comment recently added.

Comment added in the service of:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.118  29-Jul-2024  riastradh wg(4): Fix memory ordering in detach.

PR kern/58510: experimental wg(4) lacks memory ordering between
wg_count_dec and module unload
 1.117  29-Jul-2024  riastradh wg(4): No need for atomic access to wgs_time_established in tx/rx.

This is stable while the session is visible to the tx/rx paths -- it
is initialized before the session is exposed to tx/rx, and doesn't
change until the session is no longer used by any tx/rx path and has
been recycled.

When I sprinkled atomic access to wgs_time_established in if_wg.c
rev. 1.104, it was a vestige of an uncommitted draft that did the
transition from INIT_PASSIVE to ESTABLISHED in the tx path itself, in
an attempt to enable prompter tx on the new session as soon as it is
established. This turned out to be unnecessary, so I reverted most
of it, but forgot that wgs_time_established no longer needed atomic
treatment.

We could go back to using time_t and time_uptime, now that there's no
need to do atomic loads and stores on these quantities. But there's
no point in 64-bit arithmetic when the time differences are all
guaranteed bounded by a few minutes, so keeping it 32-bit is probably
a slight performance improvement on 32-bit systems.

(In contrast, wgs_time_last_data_sent is both written and read in the
tx path, which may run in parallel on multiple CPUs, so it still
requires the atomic treatment.)

Tidying up for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.116  29-Jul-2024  riastradh wg(4): Sprinkle comments into wg_swap_sessions.

No functional change intended.

Prompted by:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.115  29-Jul-2024  riastradh wg(4): Queue pending packet in FIFO order, not LIFO order.

Sometimes the session takes a few seconds to establish, for whatever
reason. It is better if the pending packet, which we queue up to
send as soon as we get the responder's handshake response, is the
most recent packet, rather than the first packet.

That way, we don't wind up with a weird multi-second-delayed ping,
followed by a bunch of dropped, followed by normal ping timings, or
wind up sending the first TCP SYN instead of the most recent, or what
have you. Senders need to be prepared to retransmit anyway if
packets are dropped.
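Here is a minimal sketch of a single-slot pending queue that keeps the most recent packet. It assumes C11 atomics. struct pkt stands in for an mbuf, and struct peer, pending_put and pending_take are hypothetical names. A new packet is swapped in with atomic_exchange, and whatever it displaced is freed. As a result, concurrent senders never leak a packet, and the slot holds the last packet swapped in.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf. */
struct pkt {
	int seq;
};

/* Hypothetical per-peer slot for a packet awaiting a session. */
struct peer {
	_Atomic(struct pkt *) pending;
	atomic_uint dropped;	/* older packets displaced by newer ones */
};

/* Queue p for transmission after the handshake, replacing and freeing
 * any older packet still waiting in the slot. */
static void
pending_put(struct peer *peer, struct pkt *p)
{
	assert(p != NULL);
	struct pkt *old = atomic_exchange(&peer->pending, p);

	if (old != NULL) {
		free(old);
		atomic_fetch_add(&peer->dropped, 1);
	}
}

/* Take the pending packet, if any, once the session is established. */
static struct pkt *
pending_take(struct peer *peer)
{
	return atomic_exchange(&peer->pending, (struct pkt *)NULL);
}
```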

PR kern/58508: experimental wg(4) queues LIFO, not FIFO, pending
first handshake
 1.114  29-Jul-2024  riastradh wg(4): Sprinkle static on fixed-size array parameters.

Let's make the static size declarations useful.

No functional change intended.
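A parameter declared as uint8_t key[static 32] tells the compiler that every caller passes a non-null pointer to at least that many elements. Compilers such as Clang can then diagnose a NULL or too-short argument at the call site. key_is_zero below is a hypothetical helper used only for illustration, not a function from if_wg.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KEY_LEN 32

/* `static KEY_LEN`: callers promise a non-null array of >= KEY_LEN bytes. */
static int
key_is_zero(const uint8_t key[static KEY_LEN])
{
	uint8_t acc = 0;

	for (size_t i = 0; i < KEY_LEN; i++)
		acc |= key[i];	/* accumulate without data-dependent branches */
	return acc == 0;
}
```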
 1.113  29-Jul-2024  riastradh wg(4): Put force_rekey state in the session, not the peer.

That way, there is a time when one thread has exclusive access to the
state, in wg_destroy_session under the peer lock, when we can clear
the state without racing against the data tx path.

This will work more reliably than the atomic_swap_uint I used before.

Noted by kre@.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.112  28-Jul-2024  riastradh wg(4): Explain why gethexdump/puthexdump is there, and tidy.

This way I will not be tempted to replace it by in-line calls to
libkern hexdump.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.111  28-Jul-2024  riastradh wg(4): Delete temporary hacks to dump keys and packets.

No longer useful for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.110  28-Jul-2024  riastradh wg(4): Parenthesize macro expansions properly.

PR kern/58480: experimental wg(4) sliding window logic has oopsie
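A pair of hypothetical macros shows why parenthesization matters. BITS_BAD and BITS_GOOD are illustrations, not macros from if_wg.c. Without parentheses, operator precedence at the expansion site silently changes the arithmetic. The static_assert lines are checked at compile time.

```c
#include <assert.h>

#define BITS_BAD(n)	n * 8		/* neither argument nor result parenthesized */
#define BITS_GOOD(n)	((n) * 8)	/* both parenthesized */

static_assert(BITS_BAD(1 + 1) == 9, "expands to 1 + 1 * 8");
static_assert(BITS_GOOD(1 + 1) == 16, "expands to ((1 + 1) * 8)");
static_assert(64 / BITS_BAD(2) == 256, "expands to 64 / 2 * 8");
static_assert(64 / BITS_GOOD(2) == 4, "expands to 64 / ((2) * 8)");
```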
 1.109  28-Jul-2024  riastradh wg(4): Be more consistent about #ifdef INET/INET6.

PR kern/58478: experimental wg(4) probably doesn't build with
INET6-only
 1.108  28-Jul-2024  riastradh wg(4): Tidy up error branches.

No functional change intended, except to add some log messages in
failure cases.

Cleanup after:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.107  28-Jul-2024  riastradh wg(4): Process all altq'd packets when deleting peer.

Can't just drop them because we can only go through all packets on an
interface at a time, for all peers -- so we'd either have to drop all
peers' packets, or requeue the packets for other peers. Probably not
worth the trouble, so let's just wait for all the packets currently
queued up to go through first.

This requires reordering teardown so that we wg_destroy_all_peers,
and thus wg_purge_pending_packets, _before_ we wg_if_detach, because
wg_if_detach -> if_detach destroys the lock that IFQ_DEQUEUE uses.

PR kern/58477: experimental wg(4) ALTQ support is probably buggy
 1.106  28-Jul-2024  riastradh wg(4): Fix quotation in comment.

Prompted by:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.105  28-Jul-2024  riastradh wg(4): Make time_uptime32 work in netbsd<=10.

This is the low 32 bits of time_uptime.

Will simplify pullups to 10 for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.104  28-Jul-2024  riastradh wg(4): Use 32-bit for times handled in rx/tx paths.

The rx and tx paths require unlocked access to wgs_time_established
(to decide whether it's time to rekey) and wgs_time_last_data_sent
(to decide whether we need to reply to incoming data with a keepalive
packet), so do it with atomic_load/store_*.

On 32-bit platforms, we may not be able to do that on time_t.
However, since sessions only last for a few minutes before
reject-after-time kicks in and they are erased, 32 bits is plenty to
record the durations that we need to record here, so this shouldn't
introduce any new bugs even on hosts that exceed 136 years of uptime.
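Here is a sketch of the 32-bit arithmetic, assuming uptime in whole seconds. Unsigned subtraction wraps modulo 2^32. Elapsed times therefore come out right even when the low word of the uptime wraps between the two readings, as long as the true interval is under about 136 years. uptime32, elapsed32 and expired are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of a 64-bit uptime in seconds. */
static uint32_t
uptime32(uint64_t uptime)
{
	return (uint32_t)uptime;
}

/* Seconds from `then` to `now`, correct modulo 2^32 across a wrap. */
static uint32_t
elapsed32(uint32_t now, uint32_t then)
{
	return now - then;
}

/* Has a session established at `est` lived at least `limit` seconds? */
static int
expired(uint32_t now, uint32_t est, uint32_t limit)
{
	return elapsed32(now, est) >= limit;
}
```

Consider a session stamped 16 s before the low word wraps. Checked 40 s later, it is computed as 40 s old, even though the stored value has gone from 0xfffffff0 to 0x18.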

Prompted by:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.103  28-Jul-2024  riastradh wg(4): Make sure to update endpoint on keepalive packets too.

Prompted by:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.102  28-Jul-2024  riastradh wg(4): On rx of valid ciphertext, make sure to update state machine.

Previously, we also required the plaintext to be a plausible-looking
IP packet before updating the state machine.

But keepalive packets are empty -- and if the peer initiated the
session to rekey after last tx but had no more data to tx, it will
send a keepalive to finish session initiation.

If we didn't update the state machine in that case, we would stay in
INIT_PASSIVE state unable to tx on the session, which would make
things hang.

So make sure to always update the state machine once we have accepted
a packet as genuine, even if it's genuine garbage on the inside.
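The ordering described above can be sketched as a small function. Once decryption has authenticated a message, the state machine advances first, and only then is the plaintext judged. The enums, rx_authenticated and the version-nibble test in looks_like_ip are hypothetical simplifications, not the driver's actual validation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum state { INIT_PASSIVE, ESTABLISHED };	/* subset, for illustration */
enum verdict { DELIVER, DROP_KEEPALIVE, DROP_GARBAGE };

/* Minimal plausibility check (an assumption): IPv4 or IPv6 version nibble. */
static bool
looks_like_ip(const unsigned char *buf, size_t len)
{
	return len > 0 && ((buf[0] >> 4) == 4 || (buf[0] >> 4) == 6);
}

/* Called only after the ciphertext has been authenticated. */
static enum verdict
rx_authenticated(enum state *st, const unsigned char *buf, size_t len)
{
	assert(st != NULL);
	/* Advance the state machine before looking at the payload, so an
	 * empty keepalive still completes responder-side initiation. */
	if (*st == INIT_PASSIVE)
		*st = ESTABLISHED;
	if (len == 0)
		return DROP_KEEPALIVE;
	if (!looks_like_ip(buf, len))
		return DROP_GARBAGE;
	return DELIVER;
}
```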

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.101  28-Jul-2024  riastradh wg(4): Reject rx on sessions older than reject-after-time sec.

Prompted by (but won't fix anything in):

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.100  28-Jul-2024  riastradh wg(4): Fix session destruction.

Schedule destruction as soon as the session is created, to ensure key
erasure within 2*reject-after-time seconds. Previously, we would
schedule destruction of the previous session 1 second after the next
one has been established. Combined with a failure to update the
state machine on keepalive packets, this led to temporary deadlock
scenarios.

To keep it simple, there's just one callout which runs every
reject-after-time seconds and erases keys in sessions older than
reject-after-time, so if a session is established the moment after it
runs, the keys might not be erased until (2-eps)*reject-after-time
seconds.
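A small model can check the timing bound. It assumes the callout fires at every multiple of R = reject-after-time and erases sessions whose age is at least R. erase_time is a hypothetical name, and whether the real comparison is strict is not stated here. Under that model, the delay from establishment to erasure always falls in [R, 2R).

```c
#include <assert.h>
#include <stdint.h>

/* Time at which a session established at `est` has its keys erased,
 * if a callout runs at every multiple of r and erases ages >= r. */
static uint64_t
erase_time(uint64_t est, uint64_t r)
{
	assert(r > 0);
	/* Smallest multiple of r that is >= est + r, i.e. ceil((est + r) / r) * r. */
	return (est + r + r - 1) / r * r;
}
```

With R = 180, a session established at t = 1 is erased by the callout at t = 360, which is 359 s later.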

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.99  28-Jul-2024  riastradh wg(4): Mark wgp_pending volatile to reflect its usage.

Prompted by (but won't fix any part of):

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.98  28-Jul-2024  riastradh wg(4): Expand cookie secret to 32 bytes.

This is only relevant for denial of service mitigation, so it's not
that big a deal, and the spec doesn't say anything about the size,
but let's make it the standard key size.

PR kern/58479: experimental wg(4) uses 32-bit cookie secret, not
32-byte cookie secret
 1.97  28-Jul-2024  riastradh wg(4): Omit needless pserialize_perform on transition to DESTROYING.

A session can still be used when it is in the DESTROYING state, so
there's no need to wait for users to drain here -- that's the whole
point of a separate DESTROYING state.

It is only the transition from DESTROYING back to UNKNOWN, after the
session has been unpublished so no new users can begin, that requires
waiting for all users to drain, and we already do that in
wg_destroy_session.

Prompted by (but won't fix anything in, because this is just a
performance optimization):

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.96  28-Jul-2024  riastradh wg(4): Use callout_halt, not callout_stop.

It's possible that callout_stop might work here, but let's simplify
reasoning about it -- the timers in question only take the peer intr
lock, so it's safe to wait for them while holding the peer lock in
the handshake worker thread.

We may have to undo the task bit but that will take a bit more
analysis to determine.

Prompted by (but probably won't fix anything in):

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.95  28-Jul-2024  riastradh wg(4): Fix logic to ensure session initiation is underway.

Previously, wg_task_send_init_message would call
wg_send_handshake_msg_init if either:

(a) the stable session is UNKNOWN, meaning a session has not yet been
established, either by us or by the peer (but it could be in
progress); or

(b) the stable session is not UNKNOWN but the unstable session is
_not_ INIT_ACTIVE, meaning there is an established session and we
are not currently initiating a new session.

If wg_output (or wgintr) found no established session while there was
already a session being initiated, we may only enter
wg_task_send_init_message after the session is already established,
and trigger spurious reinitiation.

Instead, create a separate flag to indicate whether it is mandatory
to rekey because limits have passed. Then create a session only if:

(a) the stable session is not ESTABLISHED, or
(b) the mandatory rekey flag is set,

and clear the mandatory rekey flag.

While here, arrange to do rekey-after-time on tx, not on callout. If
there's no data to tx, we shouldn't reinitiate a session -- we should
stay quiet on the network.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.94  28-Jul-2024  riastradh wg(4): Rework some details of internal session state machine.

This way:

- There is a clear transition between when a session is being set up,
and when it is exposed to the data rx path (wg_handle_msg_data):
atomic_store_release to set wgs->wgs_state to INIT_PASSIVE or
ESTABLISHED.

(The transition INIT_PASSIVE -> ESTABLISHED is immaterial to the
data rx path, so that's just atomic_store_relaxed. Similarly the
transition to DESTROYING.)

- There is a clear transition between when a session is being set up,
and when it is exposed to the data tx path (wg_output):
atomic_store_release to set wgp->wgp_session_stable to it.

- Every path that reinitializes a session must go through
wg_destroy_session via wg_put_index_session first. This avoids
races between session reuse and the data rx/tx paths.

- Add a log message at the time of every state transition.
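The publication pattern in the first two bullets can be sketched with C11 atomics. These correspond to NetBSD's atomic_store_release and atomic_load_acquire from atomic_loadstore(9). The writer fills in the keys before the release store of the state. A reader that acquire-loads a usable state is then guaranteed to see those keys. struct session and the function names are hypothetical. A single-threaded call exercises only the logic, not the ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

enum { WGS_UNKNOWN, WGS_INIT_PASSIVE, WGS_ESTABLISHED };

/* Hypothetical session: keys are plain data, state is the publication flag. */
struct session {
	uint8_t key[32];
	_Atomic int state;
};

/* Handshake path: set up keys, then publish with release ordering. */
static void
session_publish(struct session *s, const uint8_t key[32], int state)
{
	assert(state == WGS_INIT_PASSIVE || state == WGS_ESTABLISHED);
	memcpy(s->key, key, sizeof(s->key));
	atomic_store_explicit(&s->state, state, memory_order_release);
}

/* Data rx path: acquire-load the state before touching the keys.
 * Returns the key, or NULL if the session is not yet usable. */
static const uint8_t *
session_rx_key(struct session *s)
{
	int st = atomic_load_explicit(&s->state, memory_order_acquire);

	if (st != WGS_INIT_PASSIVE && st != WGS_ESTABLISHED)
		return NULL;
	return s->key;
}
```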

Prompted by:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.93  27-Jul-2024  christos Limit the size of the packet, and print ... if it is bigger. (from kre@)
 1.92  26-Jul-2024  riastradh wg(4): Allow modunload before any interface creation.

The workqueue and pktq are both lazily created, for annoying module
initialization order reasons, so they may not have been created by
the time of modunload.

PR kern/58470
 1.91  25-Jul-2024  christos consistently use printf instead of aprint_debug and print the tkeys with
the packet.
 1.90  25-Jul-2024  christos Add more debugging from Taylor
 1.89  25-Jul-2024  kre Make the debug (WG_DEBUG) func gethexdump() always return a valid
pointer, never NULL, so it doesn't need to be tested before being
printed, which was being done sometimes, but not always.
 1.88  25-Jul-2024  kre There's a new WG_DEBUG_XXX ( XXX==PACKET ) to deal with now. That needs
WG_DEBUG defined as well, if set.
 1.87  25-Jul-2024  kre Fix 32 bit (32 bit size_t) WG_DEBUG builds - use %zu rather than %lu
to print size_t values.
 1.86  25-Jul-2024  christos use hexdump...
 1.85  25-Jul-2024  christos fix size limit calculation in dump and NULL checks
 1.84  24-Jul-2024  christos Add packet dump debugging
 1.83  24-Jul-2024  kre While the previous change fixed the broken build, it wasn't the best
way, as defining any of the WG_DEBUG_XXX symbols then effectively
defined all of them, making them pointless as separate entities.

So, rearrange the way things are done a little to avoid doing that.
 1.82  24-Jul-2024  kre If any of the WG_DEBUG_XXX symbols happens to be defined (say, from a
stray rump Makefile...) then we now must have WG_DEBUG also defined, so
if it wasn't, make it so.
 1.81  24-Jul-2024  christos Add more debugging in packet validation
 1.80  24-Jul-2024  christos Add a wg_debug variable to split between debug/trace/dump messages
 1.79  05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.78  10-Mar-2024  riastradh wg(4): Bind to CPU in wg_handle_packet.

Required by use of psref there.

Assert we're bound up front so we catch mistakes early, rather than
later on if we get unlucky in preemption and scheduling.

PR bin/58021
 1.77  01-Aug-2023  mrg branches: 1.77.2;
fix simple mis-matched function prototype and definitions.

most of these are like, eg

void foo(int[2]);

with either of these

void foo(int*) { ... }
void foo(int[]) { ... }

in some cases (such as stat or utimes* calls found in our header files),
we now match standard definition from opengroup.

found by GCC 12.
 1.76  11-Apr-2023  jakllsch Give scope and additional details to wg(4) diagnostic messages.
 1.75  05-Apr-2023  andvar s/termintaed/terminated/ in comment.
 1.74  05-Jan-2023  christos centralize the kauth ugliness.
 1.73  05-Jan-2023  jakllsch wg(4): Allow non-root to retrieve information other than the private
key and the peer preshared key.

Add kauth(9) enums for wg(4) and use them in suser secmodel.

Refines fix for PR 57161.
 1.72  05-Jan-2023  jakllsch Check for authorization for SIOCSDRVSPEC and SIOCGDRVSPEC ioctls for wg(4).

Addresses PR 57161.
 1.71  04-Nov-2022  ozaki-r branches: 1.71.2;
inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.70  28-Oct-2022  ozaki-r Adjust pf, wg, dccp and sctp for struct inpcb integration
 1.69  25-Mar-2022  hannken Prevent memory corruption from wg_send_handshake_msg_init() on
LP64 machines with "MSIZE == 256", sparc64 for example.

wg_send_handshake_msg_init() tries to put 148 bytes into a buffer
of 144 bytes and overwrites 4 bytes following the mbuf. Check
for "sizeof() > MHLEN" and use a cluster in this case.
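The 148 bytes come from the handshake initiation layout in the WireGuard whitepaper. The struct below spells that layout out as byte arrays, so it has no padding. needs_cluster is a hypothetical helper. The MHLEN value must be supplied for the platform, such as the 144 bytes cited above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Handshake initiation message; byte arrays only, so no padding. */
struct msg_init {
	uint8_t type_reserved[4];	/* type (1) + reserved (3) */
	uint8_t sender[4];		/* sender index */
	uint8_t ephemeral[32];
	uint8_t enc_static[32 + 16];	/* static key + AEAD tag */
	uint8_t enc_timestamp[12 + 16];	/* TAI64N + AEAD tag */
	uint8_t mac1[16];
	uint8_t mac2[16];
};
static_assert(sizeof(struct msg_init) == 148, "init message is 148 bytes");

/* A message larger than the mbuf's internal data area needs a cluster. */
static bool
needs_cluster(size_t msglen, size_t mhlen)
{
	return msglen > mhlen;
}
```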

With help from Taylor R Campbell <riastradh@>
 1.68  16-Jan-2022  riastradh wg(4): Limit the size of ifdrv requests.

Avoids potential integer overflow or kernel memory exhaustion.
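Here is a sketch of bounding a user-supplied length before allocating for it, with userland malloc in place of kernel allocation. WG_IFDRV_MAXLEN and alloc_request are hypothetical. The 64 KiB cap is an arbitrary placeholder, since the actual limit is not given in this log. Checking the bound first also means len + 1 cannot overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define WG_IFDRV_MAXLEN	(64 * 1024)	/* hypothetical cap */

/* Validate a user-supplied length, then allocate. Returns 0 or an errno. */
static int
alloc_request(size_t len, void **bufp)
{
	assert(bufp != NULL);
	*bufp = NULL;
	if (len > WG_IFDRV_MAXLEN)
		return EINVAL;		/* reject before any allocation */
	if ((*bufp = malloc(len + 1)) == NULL)	/* +1 cannot overflow now */
		return ENOMEM;
	return 0;
}
```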

Reported by Thomas Leroy a while back.
 1.67  31-Dec-2021  riastradh sys: Use if_init wrapper function.

Exception: Not in kern_pmf.c, for the kind of silly reason that it
avoids having kern_pmf.c refer to symbols defined only in net; this
avoids a pain in the rump.
 1.66  31-Dec-2021  riastradh sys: Use if_stop wrapper function.

Exception: Not in kern_pmf.c, for the kind of silly reason that it
avoids having kern_pmf.c refer to symbols defined only in net; this
avoids a pain in the rump.
 1.65  17-Aug-2021  christos Some signedness, casts, and constant sizes.
Add module dependencies.
 1.64  16-Jun-2021  riastradh if_attach and if_initialize cannot fail, don't test return value

These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so
that it could fail gracefully instead of panicking:

https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html

However, this spawned many seldom- or never-tested error branches,
which are risky to have around. And that softint in every interface
has since been replaced by a single global workqueue, because link
state changes require thread context but not low latency or high
throughput:

https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html

So there is no longer any reason for if_initialize to fail. (The
subroutine if_stats_init can't fail because percpu_alloc can't fail
either.)

There is a snag: the softint_establish in if_percpuq_create could
fail, potentially leading to bad consequences later on trying to use
the softint. This change doesn't introduce any new bugs because of
the snag -- if_percpuq_attach was already broken. However, the snag
can be better addressed without spawning error branches, either by
using a single softint or making softints less scarce.

(Separate commit will change the signatures of if_attach and
if_initialize to return void, scheduled to ride whatever is the next
convenient kernel bump.)

Patch and testing on amd64 and evbmips64-eb by maya@; commit message
soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.
 1.63  29-Apr-2021  riastradh Sprinkle __noinline to reduce gigantic stack frames in ALL kernels.

In principle this might just push a real problem around, but this is
unlikely to be a real problem because:

1. The large stack frames are really only in the setup state machine
message handlers, which run at the top loop of a thread with a
shallow stack anyway.

2. If these are inlined, gcc might create multiple nonoverlapping
stack buffers, whereas if not inlined, the stack frames from
consecutive or alternative procedure calls would overlap anyway.

(I haven't investigated exactly what's going on leading to ~5 KB
stack frames, but this shuts gcc up, at least, and the hypotheses
sound plausible to me!)
 1.62  11-Nov-2020  riastradh branches: 1.62.4;
wg: Sprinkle #ifdef INET6. Avoid unconditional use of ip6 structs.

Fixes no-INET6 build.

Based on patch from Brad Spencer:

https://mail-index.NetBSD.org/current-users/2020/11/11/msg039883.html
 1.61  15-Oct-2020  roy branches: 1.61.2;
wg: with no peers, the link status is DOWN, otherwise UP

This mirrors the recent changes to gif(4) where the link is UP when a
tunnel is set, otherwise DOWN.
 1.60  14-Sep-2020  riastradh wg: Add altq hooks.

While here, remove the IFQ_CLASSIFY bottleneck (takes the ifq lock,
so it would serialize all transmission to all peers on a single wg(4)
interface).

altq can be disabled at compile-time or at run-time; even if included
at compile-time the run-time impact should be negligible if disabled.
 1.59  13-Sep-2020  riastradh wg: Fix detach logic.

Not tested but this should be less of a rake to step on if anyone
made an unloadable wg module.
 1.58  13-Sep-2020  riastradh wg: Use RUN_ONCE to defer workqueue_create until after configure.

Should really fix workqueue(9) so workqueue_create can be done before
CPUs have been detected in configure, but this will serve as a stop-
gap measure.
 1.57  13-Sep-2020  riastradh wg: Add missing kpreempt_disable/enable around pktq_enqueue.
 1.56  08-Sep-2020  riastradh wg: Drop wgp_lock while waiting for endpoint psref to drain.

- This is safe because wgp_endpoint_changing locks out any attempts
to change the endpoint until the draining is complete.

- This is necessary to avoid a deadlock where the handshake thread
holds a psref and awaits mutex_enter(wgp->wgp_lock).

XXX The same deadlock may occur in wg_destroy_session. Not clear
that it's safe to just release wgp_lock there; may need to create a
new session state, say WGS_STATE_DRAINING, while we wait for
psref_target_destroy. But this needs a little more thought; a new
state may not be necessary, and would be nice to avoid if not
necessary.
 1.55  07-Sep-2020  riastradh wg: Use threadpool(9) and workqueue(9) for asynchronous tasks.

- Using threadpool(9) job per interface to receive incoming handshake
messages gives the same concurrency for active interfaces but
doesn't waste kthreads for inactive ones.

=> Can't really do this with a global workqueue(9) because there's
no bound on the amount of time wg_receive_packets() might run
for; we really need separate threads or threadpool jobs in order
to avoid having one interface starve all the others.

- Using a global workqueue(9) for asynchronous peer tasks avoids
creating unnecessary kthreads.

=> Each task does a more or less bounded amount of work, so it's OK
to share a global workqueue -- there's no advantage to adding
concurrency for what is almost certainly going to be CPU-bound
asymmetric crypto.

=> This way we don't need a thread per peer or iteration over a
list of all peers, so the task mechanism should no longer be a
bottleneck to scaling to thousands of peers.

XXX This doesn't distribute the load across CPUs -- it keeps it on
the same CPU where the packet came in. Should consider doing
something to balance the load -- maybe note if the current CPU is
loaded, and if so, sort CPUs by queue length or some other measure of
load and pick the least loaded one or something.
 1.54  07-Sep-2020  riastradh wg: Use a global pktqueue rather than a per-peer pcq.

- Improves scalability -- won't hit limit on softints no matter how
many peers there are.
- Improves parallelism -- softint was kernel-locked to serialize
access to the pcq.
- Requires per-peer queue on handshake init to avoid dropping first
packet.
. Per-peer queue is currently a single packet -- should serve well
enough for pings, dns queries, tcp connections, &c.
 1.53  07-Sep-2020  riastradh wg: Fix debug output now that the priority is mixed into it.
 1.52  07-Sep-2020  riastradh wg: Fix non-DIAGNOSTIC build.
 1.51  31-Aug-2020  riastradh wg: Avoid memory leak if socreate fails.
 1.50  31-Aug-2020  riastradh wg: Make it build with WG_DEBUG on 32-bit platforms.
 1.49  31-Aug-2020  riastradh wg: Simplify locking.

Summary: Access to a stable established session is still allowed via
psref; all other access to peer and session state is now serialized
by struct wg_peer::wgp_lock, with no dancing around a per-session
lock. This way, the handshake paths are locked, while the data
transmission paths are pserialized.

- Eliminate struct wg_session::wgs_lock.

- Eliminate wg_get_unstable_session -- access to the unstable session
is allowed only with struct wg_peer::wgp_lock held.

- Push INIT_PASSIVE->ESTABLISHED transition down into a thread task.

- Push rekey down into a thread task.

- Allocate session indices only on transition from UNKNOWN and free
them only on transition back to UNKNOWN.

- Be a little more explicit about allowed state transitions, and
reject some nonsensical ones.

- Sprinkle assertions and comments.

- Reduce atomic r/m/w swap operations that can just as well be
store-release.
 1.48  31-Aug-2020  riastradh wg: M_NOWAIT -> M_DONTWAIT

These happen to be aliases, but M_NOWAIT is part of the legacy malloc
API whereas M_DONTWAIT is part of the mbuf API.
 1.47  31-Aug-2020  riastradh wg: wg_sockaddr audit.

- Ensure all access to struct wg_peer::wgp_endpoint happens while
holding a psref.

- Simplify internalize/externalize logic and be more careful about
verifying it before printing anything.
 1.46  31-Aug-2020  riastradh wg: On INIT, do DH and decrypt timestamp before locking session.

This narrows the window when the session is unlocked. Really there
should be no such window, but we'll finish getting rid of it later.
 1.45  31-Aug-2020  riastradh wg: Verify or send cookie challenge before looking up session.

This step doesn't depend on the session, so let's avoid touching the
session state until we've passed it.
 1.44  31-Aug-2020  riastradh wg: Verify mac1 as the first step on INIT and RESP messages.

This avoids the expensive DH computation before the sender has proven
knowledge of our public key.
 1.43  31-Aug-2020  riastradh wg: Omit needless variable.
 1.42  31-Aug-2020  riastradh wg: Switch to callout_stop for session destructor timer.

Can't release the lock here, and can't sleep waiting for the callout
while we hold it without risking deadlock. But not waiting is fine;
after we transition out of WGS_STATE_UNKNOWN the timer has no effect.
 1.41  31-Aug-2020  riastradh wg: Fix indentation. No functional change.
 1.40  31-Aug-2020  riastradh wg: Just call callout_halt directly.

No functional change, just makes it easier to read where callout_halt
happens.
 1.39  31-Aug-2020  riastradh wg: Fix byte order on wire.

Give this a chance to work on big-endian systems.
 1.38  31-Aug-2020  riastradh wg: mbuf m_freem audit.

1. wg_handle_msg_data frees m but the other wg_handle_msg_* just take
a pointer to the mbuf content and not m itself, so free m in those
cases.

2. Can't trivially prove that the pcq is empty by the time
wg_destroy_peer runs pcq_destroy, so let's explicitly purge it
just in case.

3. If wg_send_udp isn't doing udp_send or udp6_output, it still has
to free m in the !INET6 error branch for IPv6 packets.

4. After rumpuser_wg_send_peer or rumpuser_wg_send_user, we still
need to free the mbuf.
 1.37  31-Aug-2020  riastradh wg: Use thmap(9) for peer and session lookup.

Make sure we also don't trip over our own shoelaces by choosing the
same session index twice.
 1.36  31-Aug-2020  riastradh wg: XAEAD doesn't use a counter, so don't pass one.
 1.35  31-Aug-2020  riastradh wg: Count down wg_npeers in wg_destroy_all_peers too.

Doesn't actually make a difference -- wg_destroy_all_peers is only
used when we're destroying the wg instance altogether -- but let's
not leave rakes to step on.
 1.34  31-Aug-2020  riastradh wg: Note lock order.
 1.33  31-Aug-2020  riastradh wg: Remove IFF_POINTOPOINT.

Unclear why this was set; setting it seems to have required a kludge
in netinet/in.c that broke ipsec tunnels. Clearing it makes wg work
again after that kludge was reverted.
 1.32  28-Aug-2020  riastradh wg: Sort includes.
 1.31  27-Aug-2020  tih Summary: let wg interfaces carry multicast traffic

Once a wg interface is up and running, it is useful to be able to run
a routing protocol over it. Marking the interface multicast capable
enables this. (One must also use the wgconfig --allowed-ips option to
explicitly permit the group one needs, e.g. 224.0.0.5/32 for OSPF.)
 1.30  27-Aug-2020  riastradh wg: Assert MCLBYTES is enough for requested length in wg_get_mbuf.
 1.29  27-Aug-2020  riastradh wg: Make sure all paths into wg_handle_msg_data guarantee enough m_len.

Earlier commit moved the m_pullup into wg_validate_msg_header, but
wg_overudp_cb doesn't go through that.
 1.28  27-Aug-2020  riastradh wg: Drop invalid message types on the floor faster.

Don't even let them reach the thread -- drop them in softint.
 1.27  27-Aug-2020  riastradh wg: KASSERT m_len before mtod.

XXX We should really make mtod do this automagically, and use
something else for mtod(m, void *).
 1.26  27-Aug-2020  riastradh wg: Use m_pullup to make message header contiguous before processing.
 1.25  27-Aug-2020  riastradh wg: Check mbuf chain length before m_copydata.
 1.24  26-Aug-2020  riastradh Clarify wg(4)'s relation to WireGuard, pending further discussion.

Still planning to replace wgconfig(8) and wg-keygen(8) by one wg(8)
tool compatible with wireguard-tools; update wg(4) for the minor
changes from the 2018-06-30 spec to the 2020-06-01 spec; &c. This just
clarifies the current state of affairs as it exists in the development
tree for now.

Mark the man page EXPERIMENTAL for extra clarity.
 1.23  23-Aug-2020  riastradh Initialize peers early on for error branch.
 1.22  21-Aug-2020  riastradh Use lock rather than 64-bit atomics for platforms without the latter.
 1.21  21-Aug-2020  riastradh Fix sysctl types.

- CTLTYPE_QUAD, not CTLTYPE_LONG, for uint64_t
- use unsigned rather than time_t -- these are all short durations
- clamp timeouts to be safe for conversion to int ticks in callout

Should fix 32-bit builds.
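The clamping in the last item can be pictured with a small helper like the one below. It is a sketch only: the name `timeout_secs_to_ticks` is hypothetical, and `hz` is passed in rather than read from the kernel global.

```c
#include <limits.h>

/*
 * Hypothetical helper: convert a timeout in seconds to callout ticks,
 * clamping so that secs * hz can never overflow an int.
 */
static int
timeout_secs_to_ticks(unsigned secs, int hz)
{
	if (hz <= 0)
		return 0;
	if (secs > (unsigned)(INT_MAX / hz))
		return INT_MAX;		/* clamp instead of overflowing */
	return (int)secs * hz;		/* safe: secs <= INT_MAX / hz */
}
```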
 1.20  21-Aug-2020  riastradh Ifdef out fast path that relies on atomic 64-bit load/store.

(Really this sliding window business could probably be done with
32-bit sequence numbers and careful detection of wraparound, but
that's a little more effort to work out -- let's just unbreak the
builds for now.)
 1.19  20-Aug-2020  riastradh Mark KASSERT-only variable as __diagused.
 1.18  20-Aug-2020  riastradh Avoid callout_halt under lock.

- We could pass the lock in, except we hold another lock too.

- We could halt before taking the other lock, but it's not safe to
sleep after getting the session pointer before taking its lock.

- We could halt before getting the session pointer, but then there's
no point in doing it under the lock.

So just halt a little earlier instead.
 1.17  20-Aug-2020  riastradh Sprinkle const.
 1.16  20-Aug-2020  riastradh Use container_of rather than casts via void *.
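For readers unfamiliar with the idiom, a generic `container_of` looks roughly like this. NetBSD's own definition may differ in detail, and the `struct peer` and `struct work` types are hypothetical. Unlike a cast through `void *`, it stays correct even when the member is not the first field of the enclosing struct.

```c
#include <stddef.h>

/* Recover a pointer to the enclosing struct from a pointer to a member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical types for illustration: the member is NOT the first field. */
struct work {
	int	w_pending;
};

struct peer {
	int		p_id;
	struct work	p_work;
};

static struct peer *
work_to_peer(struct work *wk)
{
	return container_of(wk, struct peer, p_work);
}
```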
 1.15  20-Aug-2020  riastradh Use be32enc, rather than possibly unaligned uint32_t cast and htonl.
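A byte-at-a-time encoder shows why `be32enc` is preferred: it never performs a possibly unaligned 32-bit store, and it produces the same bytes on any host byte order. `my_be32enc` is a hypothetical stand-in, not the system function.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for be32enc: store v big-endian one byte at a
 * time, so it works at any alignment and on any host byte order.
 */
static void
my_be32enc(void *buf, uint32_t v)
{
	unsigned char *p = buf;

	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}
```

By contrast, `*(uint32_t *)(buf + 1) = htonl(v)` may trap on strict-alignment CPUs.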
 1.14  20-Aug-2020  riastradh KNF
 1.13  20-Aug-2020  riastradh Use consttime_memequal, not memcmp, to compare secrets for equality.
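`memcmp` may return as soon as it finds a differing byte, which can leak through timing how much of a secret matched. A minimal constant-time comparison in the same spirit might look like this. `ct_equal` is a hypothetical name, and `volatile` only discourages, rather than rules out, compiler shortcuts.

```c
#include <stddef.h>

/*
 * Hypothetical constant-time equality check: returns 1 if equal, 0
 * otherwise, and always reads all len bytes.
 */
static int
ct_equal(const void *a, const void *b, size_t len)
{
	const volatile unsigned char *x = a, *y = b;
	unsigned acc = 0;

	for (size_t i = 0; i < len; i++)
		acc |= x[i] ^ y[i];	/* accumulate differences, no early exit */
	/* acc == 0: (0u - 1) >> 8 has low bit 1; acc in 1..255: result 0 */
	return (int)(((acc - 1) >> 8) & 1);
}
```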
 1.12  20-Aug-2020  riastradh Take advantage of prop_dictionary_util(3).
 1.11  20-Aug-2020  riastradh Split up wg_process_peer_tasks into bite-size functions.
 1.10  20-Aug-2020  riastradh Fix race in wg_worker kthread destruction.

Also allow the thread to migrate between CPUs -- just not while we're
in the middle of processing and holding onto things with psrefs.
 1.9  20-Aug-2020  riastradh Update for proplib API changes.
 1.8  20-Aug-2020  riastradh Use SYSCTL_SETUP for net.wireguard subtree.
 1.7  20-Aug-2020  riastradh Fix in-kernel debug build.
 1.6  20-Aug-2020  riastradh Implement sliding window for wireguard replay detection.
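As background for the sliding-window entries in this log, here is a minimal bitmap-based replay window over 64-bit counters. It is an illustrative sketch with hypothetical names, not the wg(4) code. In particular, it checks and records in one step, whereas a real receiver should record a counter only after the packet has authenticated.

```c
#include <stdbool.h>
#include <stdint.h>

#define WINDOW_BITS	64

struct replay_window {
	uint64_t	max_seq;	/* highest counter accepted so far */
	uint64_t	bitmap;		/* bit i set: max_seq - i already seen */
	bool		started;
};

/* Return true if seq is fresh (and record it), false if replayed or too old. */
static bool
replay_check_and_update(struct replay_window *w, uint64_t seq)
{
	if (!w->started) {
		w->started = true;
		w->max_seq = seq;
		w->bitmap = 1;
		return true;
	}
	if (seq > w->max_seq) {		/* slide the window forward */
		uint64_t shift = seq - w->max_seq;

		w->bitmap = (shift >= WINDOW_BITS) ? 0 : (w->bitmap << shift);
		w->bitmap |= 1;
		w->max_seq = seq;
		return true;
	}
	uint64_t off = w->max_seq - seq;
	if (off >= WINDOW_BITS)
		return false;		/* too old to tell: reject */
	uint64_t bit = UINT64_C(1) << off;
	if (w->bitmap & bit)
		return false;		/* replay */
	w->bitmap |= bit;
	return true;
}
```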
 1.5  20-Aug-2020  riastradh Don't falsely assert cpu_softintr_p().

Will fail in the following stack trace:

wg_worker (kthread)
wg_receive_packets
wg_handle_packet
wg_handle_msg_data
KASSERT(cpu_softintr_p())

Instead, use kpreempt_disable/enable around softint_schedule.

XXX Not clear that softint is the right place to do this!
 1.4  20-Aug-2020  riastradh Convert wg(4) to if_stat.
 1.3  20-Aug-2020  riastradh Use cprng_strong, not cprng_fast, for ephemeral key.
 1.2  20-Aug-2020  riastradh [ozaki-r] Fix bugs found by maxv's audits
 1.1  20-Aug-2020  riastradh [ozaki-r] Add wg files
 1.61.2.1  14-Dec-2020  thorpej Sync w/ HEAD.
 1.62.4.2  17-Jun-2021  thorpej Sync w/ HEAD.
 1.62.4.1  13-May-2021  thorpej Sync with HEAD.
 1.71.2.5  15-Dec-2024  martin Pull up following revision(s) (requested by alnsn in ticket #1022):

sys/net/if_wg.c: revision 1.133

wg(4): Avoid spurious kassert for harmless race in session retry.

If we have already transitioned away from INIT_ACTIVE by the time the
retry timer has fired, the handshake start time may have been zeroed,
but that's harmless. So don't kassert about it until after we've
verified we're still in INIT_ACTIVE state.

PR kern/58859: KASSERT in wg_task_retry_handshake
 1.71.2.4  09-Oct-2024  martin Pull up following revision(s) (requested by riastradh in ticket #934):

sys/net/if_wg.c: revision 1.117
sys/net/if_wg.c: revision 1.118
sys/net/if_wg.c: revision 1.119
sys/net/if_wg.c: revision 1.80
sys/net/if_wg.c: revision 1.81
tests/net/if_wg/t_misc.sh: revision 1.13
sys/net/if_wg.c: revision 1.82
sys/net/if_wg.c: revision 1.130
tests/net/if_wg/t_misc.sh: revision 1.14
sys/net/if_wg.c: revision 1.83
sys/net/if_wg.c: revision 1.131
tests/net/if_wg/t_misc.sh: revision 1.15
sys/net/if_wg.c: revision 1.84
sys/net/if_wg.c: revision 1.132
tests/net/if_wg/t_misc.sh: revision 1.16
sys/net/if_wg.c: revision 1.85
sys/net/if_wg.c: revision 1.86
tests/net/if_wg/t_basic.sh: revision 1.5
sys/net/if_wg.c: revision 1.87
tests/net/if_wg/t_basic.sh: revision 1.6
sys/net/if_wg.c: revision 1.88
sys/net/if_wg.c: revision 1.89
sys/net/if_wg.c: revision 1.100
sys/net/if_wg.c: revision 1.101
sys/net/if_wg.c: revision 1.102
sys/net/if_wg.c: revision 1.103
sys/net/if_wg.c: revision 1.104
sys/net/if_wg.c: revision 1.105
sys/net/if_wg.c: revision 1.106
sys/net/if_wg.c: revision 1.107
sys/net/if_wg.c: revision 1.108
sys/net/if_wg.c: revision 1.109
sys/net/if_wg.c: revision 1.120
sys/net/if_wg.c: revision 1.121
sys/net/if_wg.c: revision 1.122
sys/net/if_wg.c: revision 1.123
sys/net/if_wg.c: revision 1.124
sys/net/if_wg.c: revision 1.75
sys/net/if_wg.c: revision 1.77
sys/net/if_wg.c: revision 1.125
sys/net/if_wg.c: revision 1.126
sys/net/if_wg.c: revision 1.79
sys/net/if_wg.c: revision 1.127
sys/net/if_wg.c: revision 1.128
sys/net/if_wg.c: revision 1.129
sys/net/if_wg.c: revision 1.90
sys/net/if_wg.c: revision 1.91
sys/net/if_wg.c: revision 1.92
sys/net/if_wg.c: revision 1.93
sys/net/if_wg.c: revision 1.94
sys/net/if_wg.c: revision 1.95
sys/net/if_wg.c: revision 1.96
sys/net/if_wg.c: revision 1.97
sys/net/if_wg.c: revision 1.98
sys/net/if_wg.c: revision 1.99
sys/net/if_wg.c: revision 1.110
sys/net/if_wg.c: revision 1.111
sys/net/if_wg.c: revision 1.112
sys/net/if_wg.c: revision 1.113
sys/net/if_wg.c: revision 1.114
sys/net/if_wg.c: revision 1.115
sys/net/if_wg.c: revision 1.116

fix simple mismatched function prototypes and definitions.
most of these are like, e.g.
void foo(int[2]);
with either of these
void foo(int*) { ... }
void foo(int[]) { ... }
in some cases (such as stat or utimes* calls found in our header files),
we now match standard definition from opengroup.
found by GCC 12.

sys: Drop redundant NULL check before m_freem(9)
m_freem(9) has safely accepted a NULL argument since at least 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c
Compile-tested on amd64/ALL.
Suggested by knakahara@

Add a wg_debug variable to split between debug/trace/dump messages

Add more debugging in packet validation

If any of the WG_DEBUG_XXX symbols happens to be defined (say, from a
stray rump Makefile...) then we now must have WG_DEBUG also defined, so
if it wasn't, make it so.

While the previous change fixed the broken build, it wasn't the best
way, as defining any of the WG_DEBUG_XXX symbols then effectively
defined all of them, making them pointless as separate entities.

So, rearrange the way things are done a little to avoid doing that.

Add packet dump debugging
fix size limit calculation in dump and NULL checks
use hexdump...

Fix 32 bit (32 bit size_t) WG_DEBUG builds - use %zu rather than %lu
to print size_t values.

There's a new WG_DEBUG_XXX (XXX == PACKET) to deal with now. If it is
set, it needs WG_DEBUG defined as well.

Make the debug (WG_DEBUG) func gethexdump() always return a valid
pointer, never NULL, so it doesn't need to be tested before being
printed, which was being done sometimes, but not always.

Add more debugging from Taylor

wg(4): Allow modunload before any interface creation.

The workqueue and pktq are both lazily created, for annoying module
initialization order reasons, so they may not have been created by
the time of modunload.
PR kern/58470

Limit the size of the packet, and print ... if it is bigger. (from kre@)

wg(4): Rework some details of internal session state machine.

This way:
- There is a clear transition between when a session is being set up,
and when it is exposed to the data rx path (wg_handle_msg_data):
atomic_store_release to set wgs->wgs_state to INIT_PASSIVE or
ESTABLISHED.
(The transition INIT_PASSIVE -> ESTABLISHED is immaterial to the
data rx path, so that's just atomic_store_relaxed. Similarly the
transition to DESTROYING.)
- There is a clear transition between when a session is being set up,
and when it is exposed to the data tx path (wg_output):
atomic_store_release to set wgp->wgp_session_stable to it.
- Every path that reinitializes a session must go through
wg_destroy_session via wg_put_index_session first. This avoids
races between session reuse and the data rx/tx paths.
- Add a log message at the time of every state transition.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
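The publish-with-release rule in the first two bullets can be sketched with C11 atomics standing in for the kernel's atomic_store_release/atomic_load_acquire. The types are hypothetical and heavily simplified, and the sketch ignores how old sessions are reclaimed (psref/pserialize in wg(4)).

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, heavily simplified session and peer. */
struct session {
	unsigned char	key[32];
	unsigned	state;
};

struct peer {
	_Atomic(struct session *) stable;
};

/* Handshake path: finish initializing *s, then publish it with release. */
static void
publish_session(struct peer *p, struct session *s)
{
	atomic_store_explicit(&p->stable, s, memory_order_release);
}

/*
 * Data path: the acquire load pairs with the release store above, so
 * every write to *s made before publication is visible here.
 */
static struct session *
lookup_session(struct peer *p)
{
	return atomic_load_explicit(&p->stable, memory_order_acquire);
}
```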

wg(4): Fix logic to ensure session initiation is underway.

Previously, wg_task_send_init_message would call
wg_send_handshake_msg_init if either:
(a) the stable session is UNKNOWN, meaning a session has not yet been
established, either by us or by the peer (but it could be in
progress); or
(b) the stable session is not UNKNOWN but the unstable session is
_not_ INIT_ACTIVE, meaning there is an established session and we
are not currently initiating a new session.

If wg_output (or wgintr) found no established session while there was
already a session being initiated, we may only enter
wg_task_send_init_message after the session is already established,
and trigger spurious reinitiation.

Instead, create a separate flag to indicate whether it is mandatory
to rekey because limits have passed. Then create a session only if:
(a) the stable session is not ESTABLISHED, or
(b) the mandatory rekey flag is set,
and clear the mandatory rekey flag.

While here, arrange to do rekey-after-time on tx, not on callout. If
there's no data to tx, we shouldn't reinitiate a session -- we should
stay quiet on the network.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails

PR kern/56252: wg(4) state machine has race conditions

PR kern/58463: if_wg does not work when idle.
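Reading condition (b) as "a mandatory rekey is pending", which is what the flag exists to signal, the decision reduces to the predicate below. The state names and the plain `bool` flag are simplifications; the kernel would consume the flag atomically.

```c
#include <stdbool.h>

/* Hypothetical state names modelled on the commit message. */
enum session_state {
	S_UNKNOWN, S_INIT_ACTIVE, S_INIT_PASSIVE, S_ESTABLISHED, S_DESTROYING
};

/*
 * Should the init task start a new handshake?  Yes if there is no
 * established session, or if a mandatory rekey is pending.  The flag
 * is consumed (cleared) on every call.
 */
static bool
should_initiate(enum session_state stable, bool *force_rekey)
{
	bool forced = *force_rekey;

	*force_rekey = false;
	return stable != S_ESTABLISHED || forced;
}
```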

wg(4): Use callout_halt, not callout_stop.
It's possible that callout_stop might work here, but let's simplify
reasoning about it -- the timers in question only take the peer intr
lock, so it's safe to wait for them while holding the peer lock in
the handshake worker thread.

We may have to undo the task bit but that will take a bit more
analysis to determine.
Prompted by (but probably won't fix anything in):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Omit needless pserialize_perform on transition to DESTROYING.

A session can still be used when it is in the DESTROYING state, so
there's no need to wait for users to drain here -- that's the whole
point of a separate DESTROYING state.

It is only the transition from DESTROYING back to UNKNOWN, after the
session has been unpublished so no new users can begin, that requires
waiting for all users to drain, and we already do that in
wg_destroy_session.

Prompted by (but won't fix anything in, because this is just a
performance optimization):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Expand cookie secret to 32 bytes.
This is only relevant for denial of service mitigation, so it's not
that big a deal, and the spec doesn't say anything about the size,
but let's make it the standard key size.

PR kern/58479: experimental wg(4) uses 32-bit cookie secret, not
32-byte cookie secret

wg(4): Mark wgp_pending volatile to reflect its usage.
Prompted by (but won't fix any part of):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Fix session destruction.
Schedule destruction as soon as the session is created, to ensure key
erasure within 2*reject-after-time seconds. Previously, we would
schedule destruction of the previous session 1 second after the next
one has been established. Combined with a failure to update the
state machine on keepalive packets, this led to temporary deadlock
scenarios.

To keep it simple, there's just one callout which runs every
reject-after-time seconds and erases keys in sessions older than
reject-after-time, so if a session is established the moment after it
runs, the keys might not be erased until (2-eps)*reject-after-time
seconds.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
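The single-callout scheme can be sketched as a periodic sweep. The names, the fixed-size array, and the volatile-store `wipe` helper are illustrative assumptions. The bound follows from the arithmetic: a session established just after a sweep is still younger than reject-after-time at the next sweep, so it is erased one sweep later, less than 2*reject-after-time after it was created.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical session record: a key plus its establishment time. */
struct session {
	bool		live;
	uint32_t	established;	/* seconds of uptime, 32-bit */
	unsigned char	key[32];
};

/* Zero memory through a volatile pointer so the stores are not elided. */
static void
wipe(void *p, size_t n)
{
	volatile unsigned char *v = p;

	while (n--)
		*v++ = 0;
}

/* Run every reject_after seconds: erase sessions at least that old. */
static void
sweep_sessions(struct session *s, size_t n, uint32_t now,
    uint32_t reject_after)
{
	for (size_t i = 0; i < n; i++) {
		if (s[i].live && now - s[i].established >= reject_after) {
			wipe(s[i].key, sizeof(s[i].key));
			s[i].live = false;
		}
	}
}
```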

wg(4): Reject rx on sessions older than reject-after-time sec.
Prompted by (but won't fix anything in):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): On rx of valid ciphertext, make sure to update state machine.

Previously, we also required the plaintext to be a plausible-looking
IP packet before updating the state machine.

But keepalive packets are empty -- and if the peer initiated the
session to rekey after last tx but had no more data to tx, it will
send a keepalive to finish session initiation.
If we didn't update the state machine in that case, we would stay in
INIT_PASSIVE state unable to tx on the session, which would make
things hang.

So make sure to always update the state machine once we have accepted
a packet as genuine, even if it's genuine garbage on the inside.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
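The ordering that matters here is: authenticate, advance the state machine, and only then judge the plaintext. A minimal sketch with hypothetical types and a deliberately crude IP-version check:

```c
#include <stdbool.h>
#include <stddef.h>

enum session_state { S_INIT_PASSIVE, S_ESTABLISHED };

struct session {
	enum session_state state;
};

/*
 * Called only after the ciphertext has authenticated.  The state is
 * advanced before the plaintext is examined, so an empty keepalive
 * (len == 0) also completes responder-side establishment.  Returns
 * true if the plaintext should be handed to the IP stack.
 */
static bool
rx_authenticated(struct session *s, const unsigned char *pt, size_t len)
{
	if (s->state == S_INIT_PASSIVE)
		s->state = S_ESTABLISHED;	/* always, even for garbage */

	if (len == 0)
		return false;			/* keepalive: nothing to deliver */
	/* Hypothetical plausibility check on the IP version nibble. */
	return (pt[0] >> 4) == 4 || (pt[0] >> 4) == 6;
}
```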

wg(4): Make sure to update endpoint on keepalive packets too.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

tests/net/if_wg/t_misc: Tweak timeouts in wg_handshake_timeout.

Most of the timers in wg(4) have only 1sec resolution, which might be
rounded in either direction, so make sure there's a 2sec buffer on
either side of the event we care about (the point at which wg(4)
decides to stop retrying handshake).

Won't fix any bugs, but might make the tests slightly less flaky.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions

tests/net/if_wg/t_misc: Elaborate in wg_rekey debug messages.

Helpful for following the test log when things go wrong.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Tests should pass now.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Use 32-bit for times handled in rx/tx paths.

The rx and tx paths require unlocked access to wgs_time_established
(to decide whether it's time to rekey) and wgs_time_last_data_sent
(to decide whether we need to reply to incoming data with a keepalive
packet), so do it with atomic_load/store_*.

On 32-bit platforms, we may not be able to do that on time_t.

However, since sessions only last for a few minutes before
reject-after-time kicks in and they are erased, 32 bits is plenty to
record the durations that we need to record here, so this shouldn't
introduce any new bugs even on hosts that exceed 136 years of uptime.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Make time_uptime32 work in netbsd<=10.

This is the low 32 bits of time_uptime.
Will simplify pullups to 10 for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
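Both points rest on unsigned 32-bit arithmetic: 2^32 seconds is about 136 years, and differences of `uint32_t` values are computed modulo 2^32, so an elapsed time stays correct across a single wraparound. A sketch with hypothetical helper names; `uptime32` mimics the low-32-bits truncation described above for `time_uptime32`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: the low 32 bits of a 64-bit uptime. */
static uint32_t
uptime32(uint64_t uptime_secs)
{
	return (uint32_t)uptime_secs;
}

/* Seconds elapsed since then; modulo 2^32, so wraparound is harmless. */
static uint32_t
elapsed32(uint32_t now, uint32_t then)
{
	return now - then;
}

static bool
session_expired(uint32_t now, uint32_t established, uint32_t reject_after)
{
	return elapsed32(now, established) >= reject_after;
}
```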

wg(4): Fix quotation in comment.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Process all altq'd packets when deleting peer.

Can't just drop them because we can only go through all packets on an
interface at a time, for all peers -- so we'd either have to drop all
peers' packets, or requeue the packets for other peers. Probably not
worth the trouble, so let's just wait for all the packets currently
queued up to go through first.

This requires reordering teardown so that we wg_destroy_all_peers,
and thus wg_purge_pending_packets, _before_ we wg_if_detach, because
wg_if_detach -> if_detach destroys the lock that IFQ_DEQUEUE uses.

PR kern/58477: experimental wg(4) ALTQ support is probably buggy

wg(4): Tidy up error branches.
No functional change intended, except to add some log messages in
failure cases.
Cleanup after:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Be more consistent about #ifdef INET/INET6.
PR kern/58478: experimental wg(4) probably doesn't build with
INET6-only

wg(4): Parenthesize macro expansions properly.

PR kern/58480: experimental wg(4) sliding window logic has oopsie

wg(4): Delete temporary hacks to dump keys and packets.
No longer useful for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Explain why gethexdump/puthexdump is there, and tidy.
This way I will not be tempted to replace it by in-line calls to
libkern hexdump.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Put force_rekey state in the session, not the peer.
That way, there is a time when one thread has exclusive access to the
state, in wg_destroy_session under the peer lock, when we can clear
the state without racing against the data tx path.
This will work more reliably than the atomic_swap_uint I used before.
Noted by kre@.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Sprinkle static on fixed-size array parameters.

Let's make the static size declarations useful.
No functional change intended.
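A C99 `static` array parameter, as used in this entry, looks like this; `KEY_LEN` and `copy_key` are hypothetical. Spelling the prototype and the definition identically also avoids the kind of mismatch GCC 12 flagged earlier in this pullup.

```c
#include <stdint.h>
#include <string.h>

#define KEY_LEN	32	/* hypothetical name; WireGuard keys are 32 bytes */

/*
 * `static KEY_LEN` promises that each argument points to at least
 * KEY_LEN elements, which lets compilers diagnose short buffers.
 */
static void copy_key(uint8_t dst[static KEY_LEN],
    const uint8_t src[static KEY_LEN]);

static void
copy_key(uint8_t dst[static KEY_LEN], const uint8_t src[static KEY_LEN])
{
	memcpy(dst, src, KEY_LEN);
}
```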

wg(4): Queue pending packet in FIFO order, not LIFO order.

Sometimes the session takes a few seconds to establish, for whatever
reason. It is better if the pending packet, which we queue up to
send as soon as we get the responder's handshake response, is the
most recent packet, rather than the first packet.

That way, we don't wind up with a weird multi-second-delayed ping,
followed by a bunch of dropped ones, followed by normal ping timings,
or wind up sending the first TCP SYN instead of the most recent, or
what have you. Senders need to be prepared to retransmit anyway if
packets are dropped.

PR kern/58508: experimental wg(4) queues LIFO, not FIFO, pending
first handshake
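A one-slot holding area in which the newest packet displaces the older one can be sketched with an atomic exchange. `struct pkt` stands in for an mbuf and `free` for m_freem; the memory orders are one reasonable choice, not necessarily the ones wg(4) uses.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical packet type standing in for an mbuf. */
struct pkt {
	int	seq;
};

/* One-packet slot used while a handshake is in progress. */
struct pending_slot {
	_Atomic(struct pkt *) p;
};

/* Park the newest packet; whatever was parked before is dropped. */
static void
pending_put(struct pending_slot *s, struct pkt *m)
{
	struct pkt *old = atomic_exchange_explicit(&s->p, m,
	    memory_order_acq_rel);

	free(old);		/* free(NULL) is a no-op */
}

/* Once the session is established, take the parked packet, if any. */
static struct pkt *
pending_take(struct pending_slot *s)
{
	return atomic_exchange_explicit(&s->p, NULL, memory_order_acquire);
}
```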
wg(4): Sprinkle comments into wg_swap_sessions.
No functional change intended.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): No need for atomic access to wgs_time_established in tx/rx.

This is stable while the session is visible to the tx/rx paths -- it
is initialized before the session is exposed to tx/rx, and doesn't
change until the session is no longer used by any tx/rx path and has
been recycled.

When I sprinkled atomic access to wgs_time_established in if_wg.c
rev. 1.104, it was a vestige of an uncommitted draft that did the
transition from INIT_PASSIVE to ESTABLISHED in the tx path itself, in
an attempt to enable prompter tx on the new session as soon as it is
established. This turned out to be unnecessary, so I reverted most
of it, but forgot that wgs_time_established no longer needed atomic
treatment.

We could go back to using time_t and time_uptime, now that there's no
need to do atomic loads and stores on these quantities. But there's
no point in 64-bit arithmetic when the time differences are all
guaranteed bounded by a few minutes, so keeping it 32-bit is probably
a slight performance improvement on 32-bit systems.
(In contrast, wgs_time_last_data_sent is both written and read in the
tx path, which may run in parallel on multiple CPUs, so it still
requires the atomic treatment.)
Tidying up for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Fix memory ordering in detach.
PR kern/58510: experimental wg(4) lacks memory ordering between
wg_count_dec and module unload

wg(4): Fix typo in comment recently added.
Comment added in the service of:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Omit needless atomic_load.
wgs_local_index is only ever written to while only one thread has
access to it and it is not in the thmap -- before it is published in
wg_get_session_index, and after it is unpublished in
wg_destroy_session. So no need for atomic_load -- it is stable if we
observe it in thmap_get result.
(Of course this is only for an assertion, which if tripped obviously
indicates a violation of our assumptions. But if that happens, well,
in the worst case we'll see a weird assertion message claiming that
the index is not equal to itself, from which we can conclude
there must have been a concurrent update, which is good enough to
help diagnose that problem without any atomic_load.)

Tidying some of the changes for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Sprinkle comments on internal sliding window API.
Post-fix tidying for:
PR kern/58480: experimental wg(4) sliding window logic has oopsie

wg(4): Deduplicate session establishment actions.
The actions to
(a) record the last handshake time,
(b) clear some handshake state,
(c) transmit first data if queued, or (if initiator) keepalive, and
(d) begin destroying the old session,
were formerly duplicated between wg_handle_msg_resp (for when we're
the initiator) and wg_task_establish_session (for when we're the
responder).

Instead, let's factor this out into wg_swap_sessions so there's only
one copy of the logic.
This requires moving wg_update_endpoint_if_necessary a little earlier
in wg_handle_msg_resp -- which should be done anyway so that the
endpoint is updated _before_ the session is published for the data tx
path to use.

Other than moving wg_update_endpoint_if_necessary a little earlier,
no functional change intended.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Read wgs_state atomically in wg_get_stable_session.
As noted in the comment above, it may concurrently transition from
ESTABLISHED to DESTROYING.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Force rekey on tx if session is older than reject-after-time.
One more corner case for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Add missing barriers around wgp_pending access.
PR kern/58520: experimental wg(4) lacks barriers around access to
packet pending initiation

wg(4): Trigger session initiation in wgintr, not in wg_output.

We have to look up the session in wgintr anyway, for
wg_send_data_msg. By triggering session initiation in wgintr instead
of wg_output, we can skip the stable session lookup and reference in
wg_output -- simpler that way.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Queue packet for post-handshake retransmit if limits are hit.
PR kern/58521: experimental wg(4) may drop packet after minutes of quiet

wg(4): When a session is established, send first packet directly.

Like we would do with the keepalive packet, if we had to send that
instead -- no need to defer it to the pktq. Keep it simple.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Sprinkle volatile on variables requiring atomic access.
No functional change intended, since the relevant access is always
done with atomic_* when it might race with concurrent access -- and
really this should be _Atomic or something. But for now our
atomic_ops(9) API is still spelled with volatile, so we'll use that.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Make a rule for who wins when both peers send INIT at once.
The rule is that the peer with the numerically smaller public key
hash, in little-endian, takes priority iff the low order bit of
H(peer A pubkey) ^ H(peer B pubkey) ^ H(posix minutes as le64)
is 0, and the peer with the lexicographically larger public key takes
priority iff the low-order bit is 1.
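The coin flip can be sketched as follows. Assumptions, stated loudly: `toy_hash` is a non-cryptographic stand-in for the real hash, the hashes are 64-bit rather than 256-bit integers, and both comparisons are made on key hashes. What the sketch does show is the key property: for distinct keys, exactly one side claims priority in any given minute.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in hash (64-bit FNV-1a); NOT the hash wg(4) actually uses. */
static uint64_t
toy_hash(const unsigned char *p, size_t n)
{
	uint64_t h = UINT64_C(0xcbf29ce484222325);

	while (n--) {
		h ^= *p++;
		h *= UINT64_C(0x100000001b3);
	}
	return h;
}

/*
 * Do we claim initiator priority over the peer during this POSIX
 * minute?  Both sides compute the same coin; the side with the smaller
 * key hash wins on 0, the side with the larger key hash wins on 1.
 */
static bool
claim_priority(const unsigned char pub_us[32],
    const unsigned char pub_them[32], uint64_t minutes)
{
	unsigned char le[8];
	uint64_t h_us, h_them, coin;

	for (int i = 0; i < 8; i++)
		le[i] = (unsigned char)(minutes >> (8 * i));	/* le64 */
	h_us = toy_hash(pub_us, 32);
	h_them = toy_hash(pub_them, 32);
	coin = (h_us ^ h_them ^ toy_hash(le, 8)) & 1;

	return (h_us < h_them) ? (coin == 0) : (coin == 1);
}
```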

Another case of:
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

This one is, as far as I can tell, simply a deadlock in the protocol
of the whitepaper -- until both sides give up on the handshake and
one of them (but not both) later decides to try sending data again.
(But not related to our t_misc:wg_rekey test, as far as I can tell,
and I haven't put enough thought into how to reliably trigger this
race to write a new automatic test for it.)

wg(4): Add Internet Archive links for the versions cited.
No functional change.

tests/net/if_wg/t_misc: Add some diagnostics.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails

wg(4): Test truncated UDP input from the network.
This triggers double-free in the IPv6 udp6_input path -- but,
confusingly, not the IPv4 udp_input path, even though the overudp_cb
interface ought to be the same:
	/* udp_input -- no further use of m if return is -1 */
	if ((n = udp4_realinput(&src, &dst, &m, iphlen)) == -1) {
		UDP_STATINC(UDP_STAT_HDROPS);
		return;
	}

	/* udp6_input -- m_freem if return is not 0 */
	if (udp6_realinput(AF_INET6, &src, &dst, &m, off) == 0) {
		...
	}
bad:
	m_freem(m);
	return IPPROTO_DONE;

The subroutines udp4_realinput and udp6_realinput pass through the
return value of overudp_cb in essentially the same way:
	/* udp4_realinput */
	if (inp->inp_overudp_cb != NULL) {
		int ret;
		ret = inp->inp_overudp_cb(mp, off, inp->inp_socket,
		    sintosa(src), inp->inp_overudp_arg);
		switch (ret) {
		case -1: /* Error, m was freed */
			rcvcnt = -1;
			goto bad;
	...
bad:
	return rcvcnt;

	/* udp6_realinput */
	if (inp->inp_overudp_cb != NULL) {
		int ret;
		ret = inp->inp_overudp_cb(mp, off, inp->inp_socket,
		    sin6tosa(src), inp->inp_overudp_arg);
		switch (ret) {
		case -1: /* Error, m was freed */
			rcvcnt = -1;
			goto bad;
	...
bad:
	return rcvcnt;

PR kern/58688: userland panic of kernel via wg(4)

wg(4): Fix wg_overudp_cb drop paths to null out *mp as caller needs.
PR kern/58688: userland panic of kernel via wg(4)
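The caller/callee contract behind the fix, reduced to userland C: on its drop path the callback frees the buffer and clears the caller's pointer, so the caller's unconditional free afterwards is a no-op instead of a double free. `struct buf`, this `overudp_cb` signature, and the length check are hypothetical simplifications of the mbuf code quoted above.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf. */
struct buf {
	size_t	len;
};

/*
 * Drop path: free the buffer AND null out *mp, so a caller that frees
 * *mp unconditionally afterwards frees NULL rather than freeing twice.
 */
static int
overudp_cb(struct buf **mp, size_t min_len)
{
	if ((*mp)->len < min_len) {	/* e.g. truncated input */
		free(*mp);
		*mp = NULL;		/* the fix: caller must not see it */
		return -1;
	}
	return 0;			/* caller continues with *mp */
}
```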
 1.71.2.3  11-Mar-2024  martin Pull up following revision(s) (requested by riastradh in ticket #628):

sys/net/if_wg.c: revision 1.78

wg(4): Bind to CPU in wg_handle_packet.

Required by use of psref there.
Assert we're bound up front so we catch mistakes early, rather than
later on if we get unlucky in preemption and scheduling.

PR bin/58021
 1.71.2.2  07-Jul-2023  martin Pull up following revision(s) (requested by jakllsch in ticket #228):

sys/net/if_wg.c: revision 1.76

Give scope and additional details to wg(4) diagnostic messages.
 1.71.2.1  13-Jan-2023  martin Pull up following revision(s) (requested by jakllsch in ticket #49):

sys/secmodel/suser/secmodel_suser.c: revision 1.57
sys/sys/kauth.h: revision 1.89
sys/net/if_wg.c: revision 1.72
sys/net/if_wg.c: revision 1.73
sys/net/if_wg.c: revision 1.74

Check for authorization for SIOCSDRVSPEC and SIOCGDRVSPEC ioctls for wg(4).
Addresses PR 57161.

wg(4): Allow non-root to retrieve information other than the private
key and the peer preshared key.

Add kauth(9) enums for wg(4) and add use them in suser secmodel.

Refines fix for PR 57161.

centralize the kauth ugliness.
 1.77.2.1  14-Nov-2023  thorpej branches: 1.77.2.1.2;
Update for the new location of altq_flags (not in if_snd directly).
 1.77.2.1.2.1  15-Nov-2023  thorpej wg_output(): Use ifq_classify_packet(), and let that function check
for ALTQ-enabled. Acquire KERNEL_LOCK before calling ALTQ_ENQUEUE().
XXX The ALTQ integration here is a mess.
