Home | History | Annotate | only in /src/tests/net
History log of /src/tests/net
RevisionDateAuthorComments
 1.5 13-Jul-2010  jmmv Get rid of static Atffiles and let bsd.test.mk generate them on the fly.
 1.4 30-Dec-2007  jmmv branches: 1.4.2;
Re-add the NetBSD CVS Id tag to the header. It just had to be quoted to
be accepted by the parser; i.e. no bug in the code :-)

Note to self: do not try to "fix" stuff the last minute before going to
sleep.
 1.3 29-Dec-2007  jmmv Back out the change to introduce the X-NetBSD-Id header entry. For some
reason the parser does not accept its contents... You know, one should
always test even trivial changes!

Will review this in more depth tomorrow to find the real root cause of the
problem and rule out a fix for ATF.
 1.2 26-Dec-2007  jmmv Add the NetBSD Id tag to the Atffiles. Issue raised by pooka@ a while ago.
 1.1 23-Dec-2007  jmmv Add regression tests for low-port allocation in connect and listen, which
was broken and fixed recently in:
http://mail-index.netbsd.org/source-changes/2007/12/16/0011.html

Test-case code provided by elad@.
 1.4.2.2 09-Jan-2008  matt sync with HEAD
 1.4.2.1 30-Dec-2007  matt file Atffile was added on branch matt-armv6 on 2008-01-09 01:59:28 +0000
 1.42 20-Aug-2024  ozaki-r tests: add tests for shmif

The test file is placed under tests/net, not tests/rump/rumpnet,
to leverage utility functions provided for tests in there.
 1.41 17-Nov-2022  ozaki-r branches: 1.41.2; 1.41.4;
tests: build and install added test files
 1.40 02-Nov-2022  ozaki-r tests: add tests for TCP with nc
 1.39 14-Jul-2021  ozaki-r tests: add tests for ALTQ CBQ
 1.38 17-May-2021  yamaguchi Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.37 29-Sep-2020  roy branches: 1.37.2;
vether(4): Add ATF tests based on the tap(4) tests.
 1.36 26-Aug-2020  riastradh Clarify wg(4)'s relation to WireGuard, pending further discussion.

Still planning to replace wgconfig(8) and wg-keygen(8) by one wg(8)
tool compatible with wireguard-tools; update wg(4) for the minor
changes from the 2018-06-30 spec to the 2020-06-01 spec; &c. This just
clarifies the current state of affairs as it exists in the development
tree for now.

Mark the man page EXPERIMENTAL for extra clarity.
 1.35 20-Aug-2020  riastradh [ozaki-r] Changes to the kernel core for wireguard
 1.34 10-Jan-2018  knakahara add ipsec(4) interface ATF.
 1.33 27-May-2017  bouyer branches: 1.33.2;
merge the bouyer-socketcan branch to HEAD.

CAN stands for Controller Area Network, a broadcast network used
in automation and automotive fields. For example, the NMEA2000 standard
developped for marine devices uses a CAN network as the link layer.

This is an implementation of the linux socketcan API:
https://www.kernel.org/doc/Documentation/networking/can.txt
you can also see can(4).

This adds a new socket family (AF_CAN) and protocol (PF_CAN),
as well as the canconfig(8) utility, used to set timing parameter of
CAN hardware. Also inclued is a driver for the CAN controller
found in the allwinner A20 SoC (I tested it with an Olimex lime2 board,
connected with PIC18-based CAN devices).

There is also the canloop(4) pseudo-device, which allows to use
the socketcan API without CAN hardware.

At this time the CANFD part of the linux socketcan API is not implemented.
Error frames are not implemented either. But I could get the cansend and
canreceive utilities from the canutils package to build and run with minimal
changes. tcpudmp(8) can also be used to record frames, which can be
decoded with etherreal.
 1.32 14-Apr-2017  ozaki-r Add tests for ipsec

- Check if setkey correctly handles algorithms for AH/ESP
- Check IPsec of transport mode with AH/ESP over IPv4/IPv6
- Check IPsec of tunnel mode with AH/ESP over IPv4/IPv6
 1.31 16-Feb-2017  knakahara add l2tp(4) basic test.
 1.30 26-Nov-2016  ozaki-r branches: 1.30.2;
Add basic tests for vlan(4)
 1.29 05-Sep-2016  ozaki-r Add very basic tests for tun devices
 1.28 15-Apr-2016  ozaki-r branches: 1.28.2;
Add a new test case for PPPoE using PAP

From s-yamaguchi@IIJ (with some tweaks by me)
 1.27 04-Mar-2016  ozaki-r Add tests for tap(4)
 1.26 05-Nov-2015  knakahara add basic if_gif tests to ATF.
 1.25 03-Aug-2015  ozaki-r Add tests for NDP
 1.24 29-Jul-2015  ozaki-r Add tests for ARP

Forgot to commit this. Should fix the build.
 1.23 22-Jun-2015  matt Don't build tests that depend on RUMP if BSD_MK_COMPAT_FILE is defined.
 1.22 26-May-2015  ozaki-r Run mcast tests on rump kernels

The tests on anita qemus failed due to that the host network environment
didn't meet the tests.

The change makes the tests independent from host environments
and the tests will pass on any environments including anita qemus.

Discussed on tech-kern and tech-net.
 1.21 20-May-2015  christos MKRUMP=no fixes (Robert Swindells)
 1.20 05-Jan-2015  christos Port the in_cksum test from regress.
 1.19 11-Oct-2014  christos add a multicast test (what to do with v6?)
 1.18 18-Sep-2014  ozaki-r Add net/if_bridge test
 1.17 30-Jun-2014  alnsn Add bpfjit kernel tests for loading from mbuf chain.
 1.16 18-Mar-2014  riastradh branches: 1.16.2;
Merge riastradh-drm2 to HEAD.
 1.15 19-Jul-2013  kefren Add a couple of basic IP/MPLS forwarding tests
 1.14 03-Jul-2013  nakayama branches: 1.14.2;
Enable tests which does not require rump if MKRUMP=no.
Pointed out by christos on source-changes-d.
 1.13 12-Sep-2012  martin ATF wrapping of the npf tests
 1.12 14-Aug-2012  alnsn branches: 1.12.2;
Build and install t_bpfilter.
 1.11 13-Aug-2012  christos add fdpass tests
 1.10 08-Aug-2012  christos Exclude tests that use rump
 1.9 08-Feb-2011  pooka branches: 1.9.4;
Time to start adding tests for the routing code to make that part
of the kernel more approachable.

Begin the task with an xfail test for PR kern/40455.
 1.8 11-Jan-2011  pooka branches: 1.8.2;
add test for PR kern/44369
 1.7 07-Nov-2010  pooka convert program in PR kern/44054 to an atf test case
 1.6 10-Aug-2010  pooka Add a most elementary carp test: it forks off two processes,
configures carp in each of them, and after verifying that the shared
address responds to ping it brutally kills the master like a proper
carnivore (none of that ifconfig down sissy vegan nonsense). Then
the test checks if the backup got its act together by pinging the
shared address and passes verdict.
 1.5 25-Jul-2010  pooka Add xfail test for kernel diagnostic panic described in PR kern/43664
 1.4 13-Jul-2010  jmmv Get rid of static Atffiles and let bsd.test.mk generate them on the fly.
 1.3 04-Jul-2010  pooka descend into icmp
 1.2 21-Apr-2010  pooka Check that bpf doesn't accept programs with divide-by-zero in them.
Example filter from Guy Harris via PR kern/43185.
 1.1 23-Dec-2007  jmmv branches: 1.1.2;
Add regression tests for low-port allocation in connect and listen, which
was broken and fixed recently in:
http://mail-index.netbsd.org/source-changes/2007/12/16/0011.html

Test-case code provided by elad@.
 1.1.2.2 09-Jan-2008  matt sync with HEAD
 1.1.2.1 23-Dec-2007  matt file Makefile was added on branch matt-armv6 on 2008-01-09 01:59:28 +0000
 1.8.2.1 08-Feb-2011  bouyer Sync with HEAD
 1.9.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.9.4.1 30-Oct-2012  yamt sync with head
 1.12.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.12.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.14.2.1 23-Jul-2013  riastradh sync with HEAD
 1.16.2.1 10-Aug-2014  tls Rebase.
 1.28.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.28.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.28.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.30.2.2 21-Apr-2017  bouyer Sync with HEAD
 1.30.2.1 15-Jan-2017  bouyer Basic tests for our SocketCAN implementation (using rump)
 1.33.2.1 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.37.2.1 31-May-2021  cjep sync with head
 1.41.4.1 02-Aug-2025  perseant Sync with HEAD
 1.41.2.1 24-Aug-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #811):

tests/net/if_shmif/t_shmif.sh: revision 1.1
sbin/ifconfig/ifconfig.c: revision 1.251
sbin/ifconfig/ifconfig.8: revision 1.130
sys/rump/net/lib/libshmif/if_shmem.c: revision 1.85
sys/rump/net/lib/libshmif/if_shmem.c: revision 1.86
sys/rump/net/lib/libshmif/if_shmem.c: revision 1.87
etc/mtree/NetBSD.dist.tests: revision 1.206
distrib/sets/lists/tests/mi: revision 1.1333
tests/net/if_shmif/Makefile: revision 1.1
tests/net/Makefile: revision 1.42

shmif: change behaviors about link states

- Change the link state to UP on ifconfig linkstr
- This behavior emulates physical devices
- Change the link state to UNKNOWN on ifconfig -linkstr just in case
- Reject sending/receiving packets if the link state is DOWN
- Permit to send/receive packets on UNKNOWN, which is required
to unbreak some ATF tests written in C

shmif: support media

It enables to link-down shmif by ifconfig media none and link-up
again by media auto.

ifconfig: show link state on -v

We could guess it through "media" or "status" output, however, we
sometimes want to know it directly for debugging or testing.

It is shown only if the -v option is specified.
tests: add tests for shmif

The test file is placed under tests/net, not tests/rump/rumpnet,
to leverage utility functions provided for tests in there.
shmem(4): Fix typo in comment: AFT -> ATF.

Also fix grammar (if I understood correctly what this meant: rump
servers written in C, rather than set up via shell scripts around
rump_server invoking ifconfig).

No functional change intended.
 1.2 23-Jan-2016  christos Define _KERNTYPES for things that need it.
 1.1 03-Nov-2010  christos add Makefile.inc everywhere so that we can set WARNS=4 by default. Amazing
how many bugs this found :-)
 1.45 09-Aug-2024  rin tests/net_common.sh: Halt rump servers only if already started

Do not cat(1) missing ${_rump_server_socks}, in case where
a test should be skipped before starting any rump server.

NFC otherwise; if ${_rump_server_socks} is absent,
`rump_server_halt` can nothing anyway, unfortunately.

Thanks ozaki-r@ for discussion.
 1.44 02-Nov-2022  ozaki-r branches: 1.44.2; 1.44.4;
tests: enable start_nc_server to have extra options for nc
 1.43 25-Nov-2021  hannken Consistently use "drvctl -l qemufwcfg0" to check if
running under qemu in general.
 1.42 09-Jul-2021  yamaguchi added tests for IFF_PROMISC of vlan(4)
 1.41 01-Apr-2020  christos Enforce a standard path
 1.40 30-Mar-2020  christos Some interfaces (gif) don't have a mac address...
 1.39 20-Feb-2020  ozaki-r tests: abort if MAC address duplication found
 1.38 20-Feb-2020  ozaki-r tests: dump stats of an interface before destroying it
 1.37 26-Aug-2019  ozaki-r tests: explain how rump_server_check_memleaks works
 1.36 26-Aug-2019  ozaki-r tests: restore rump_server_check_poolleaks for llentpl

It didn't work correctly because rumphijack for vmstat didn't work expectedly;
vmstat has the sgid bit for kvm(3) and that prevents rumphijack from working.

Address the issue by cloning a vmstat binary without the sgid bit temporarily
and using it for rumphijack. Note that it's a workaround. vmstat should stop
using kvm(3) for /dev/kmem and drop the sgid bit eventually.
 1.35 20-Aug-2019  ozaki-r Disable rump_server_check_memleaks for now

It doesn't work in some cases.
 1.34 19-Aug-2019  ozaki-r tests: check pool object leaks

Currently only llentpl leaks can be detected.
 1.33 19-Aug-2019  ozaki-r tests: enable to create interfaces other than shmif with rump_server_add_iface

For this change interfaces are destroyed in the reverse order of their
creations in case there are dependencies between interfaces.
 1.32 18-Jul-2019  ozaki-r tests: extract all kernel logs, not just a panic message, from rump_server.core
 1.31 13-May-2019  bad Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.30 18-Apr-2019  ozaki-r tests: make utility funtions easy to use for tests that don't use the framework
 1.29 17-Jan-2019  knakahara Add ATF for ipsecif(4) pfil.
 1.28 07-Apr-2018  ozaki-r branches: 1.28.2;
Fix typo
 1.27 06-Apr-2018  ozaki-r Show outputs of commands if $DEBUG
 1.26 01-Feb-2018  ozaki-r branches: 1.26.2;
Commonalize and add tests of creating/destroying interfaces
 1.25 24-Nov-2017  kre Cosmetic changes, NFC intended.
1. get rid of the "$*" fetish.
2. more consistency (not complete .. yet) with RUMP_SERVER setting
3. white space (esp around pipe ('|') symbols.)
4. drop unnecessary \ line joining.
 1.24 07-Nov-2017  ozaki-r Stop using bpfjit

Because most architectures don't support it and npf still works without it.
 1.23 30-Oct-2017  ozaki-r Add test cases of NAT-T (transport mode)

A small C program is added to make a special socket (UDP_ENCAP_ESPINUDP)
and keep it to handle UDP-encapsulated ESP packets.
 1.22 20-Oct-2017  ozaki-r Suppress name resolution
 1.21 20-Oct-2017  ozaki-r Show packet counters
 1.20 24-Jul-2017  ozaki-r Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
 1.19 19-Jun-2017  ozaki-r Do netstat -a for an appropriate protocol
 1.18 02-Jun-2017  ozaki-r branches: 1.18.2;
Add IPSEC_KEY_DEBUG

Enable the debugging feature of IPsec key (sysctl net.key.debug)
on rump kernels if the ATF test is run with IPSEC_KEY_DEBUG=true.
 1.17 19-May-2017  ozaki-r Enable debug logging of kernels such as ARP and ND if $DEUBG=true
 1.16 17-May-2017  ozaki-r Add test cases of TCP communications with IPsec enabled

The test cases transfer data over TCP by using nc with IPsec just enabled
(no SA/SP is configured) and confirm the commit "Fix diagnostic assertion
failure in ipsec_init_policy" really fixes the issue.
 1.15 14-Apr-2017  ozaki-r branches: 1.15.2;
Add tests for ipsec

- Check if setkey correctly handles algorithms for AH/ESP
- Check IPsec of transport mode with AH/ESP over IPv4/IPv6
- Check IPsec of tunnel mode with AH/ESP over IPv4/IPv6
 1.14 06-Mar-2017  ozaki-r Fix ONEDAYISH; it can be followed by one extra space
 1.13 03-Mar-2017  ozaki-r Provide a more robust regexp for time formats of 1day-ish
 1.12 16-Feb-2017  ozaki-r Use nc instead of ftp/httpd

ftp with rumphijack is unstable probably because ftp uses siglongjmp from
a signal hander. So stop using ftp and use nc instead. This fixes test
failures of t_mtudisc on some environments such as my development machine
(amd64) and anita on sparc64.
 1.11 10-Jan-2017  ozaki-r branches: 1.11.2;
Test netstat -i -a and ifmcstat
 1.10 10-Jan-2017  ozaki-r Test dumping states before destroying interfaces
 1.9 28-Nov-2016  ozaki-r branches: 1.9.2;
Use redirection instead of pipeline

This is a workaround for PR bin/51667.
 1.8 26-Nov-2016  ozaki-r Skip dumping if no bus is used
 1.7 25-Nov-2016  ozaki-r Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.6 24-Nov-2016  ozaki-r Share httpd start/stop code
 1.5 24-Nov-2016  ozaki-r Move get_macaddr to net_common.sh
 1.4 24-Nov-2016  ozaki-r Move get_lladdr to net_common.sh
 1.3 24-Nov-2016  ozaki-r Move route check functions to net_common.sh
 1.2 24-Nov-2016  ozaki-r Move HIJACKING definition to net_common.sh
 1.1 24-Nov-2016  ozaki-r Reduce duplicate codes

Introduce net_common.sh that is to share common functions used in tests
for networking. This commit commonizes extract_new_packets. Other duplicate
codes will be moved to the file in further commits.
 1.9.2.4 26-Apr-2017  pgoyette Sync with HEAD
 1.9.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.9.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.9.2.1 28-Nov-2016  pgoyette file net_common.sh was added on branch pgoyette-localcount on 2017-01-07 08:56:55 +0000
 1.11.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.15.2.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.18.2.4 26-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #572):
sys/net/if_bridge.c: 1.138, 1.148
tests/net/if_bridge/t_bridge.sh: 1.18
tests/net/if_gif/t_gif.sh: 1.12
tests/net/if_ipsec/t_ipsec.sh: 1.3
tests/net/if_l2tp/t_l2tp.sh: 1.4
tests/net/if_loop/t_basic.sh: 1.2
tests/net/if_pppoe/t_pppoe.sh: 1.18
tests/net/if_tap/t_tap.sh: 1.7
tests/net/if_tun/Makefile: 1.2
tests/net/if_tun/t_tun.sh: 1.5
tests/net/if_vlan/t_vlan.sh: 1.8
tests/net/net_common.sh: 1.26
Remove unnecessary splsoftnet
--
If the bridge is not running, don't call bridge_stop. Otherwise the
following commands will crash the kernel:
ifconfig bridge0 create
ifconfig bridge0 destroy
--
Commonalize and add tests of creating/destroying interfaces
 1.18.2.3 17-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #357):
distrib/sets/lists/debug/mi: 1.228
distrib/sets/lists/tests/mi: 1.765-1.766
etc/mtree/NetBSD.dist.tests: 1.149
sys/net/npf/npf_ctl.c: 1.49
tests/net/ipsec/Makefile: 1.10
tests/net/ipsec/algorithms.sh: 1.6
tests/net/ipsec/natt_terminator.c: 1.1
tests/net/ipsec/t_ipsec_natt.sh: 1.1
tests/net/net_common.sh: 1.23-1.24
usr.sbin/npf/npfctl/npfctl.c: 1.54
Handle esp-udp for NAT-T
--
Fix npfclt reload on rump kernels
It fails because npfctl cannot get an errno when it calls ioctl to the (rump)
kernel; npfctl (libnpf) expects that an errno is returned via proplib,
however, the rump library of npf doesn't so. It happens because of mishandlings
of complicate npf kernel options.
PR kern/52643
--
Fix showing translated port (ntohs-ed twice wrongly)
--
Add test cases of NAT-T (transport mode)
A small C program is added to make a special socket (UDP_ENCAP_ESPINUDP)
and keep it to handle UDP-encapsulated ESP packets.
--
Add net/ipsec debug lib directory
--
Add ./usr/libdata/debug/usr/tests/net/ipsec
--
Stop using bpfjit
Because most architectures don't support it and npf still works without it.
 1.18.2.2 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.18.2.1 07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproudces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Enable to remove multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries which have an indentical
destination on different interfaces. This is normal and can be
reproduce easily by ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.26.2.3 18-Jan-2019  pgoyette Synch with HEAD
 1.26.2.2 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.26.2.1 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.28.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.28.2.1 10-Jun-2019  christos Sync with HEAD
 1.44.4.1 02-Aug-2025  perseant Sync with HEAD
 1.44.2.1 22-Aug-2024  martin Pull up following revision(s) (requested by rin in ticket #781):

tests/net/net_common.sh: revision 1.45

tests/net_common.sh: Halt rump servers only if already started

Do not cat(1) missing ${_rump_server_socks}, in case where
a test should be skipped before starting any rump server.

NFC otherwise; if ${_rump_server_socks} is absent,
`rump_server_halt` can nothing anyway, unfortunately.

Thanks ozaki-r@ for discussion.
 1.1 14-Jul-2021  ozaki-r tests: add tests for ALTQ CBQ
 1.3 16-Jul-2021  ozaki-r tests, altq: fix checks of altqd startup

Hopefully the fix stabilizes test results on qemu/anita.
 1.2 14-Jul-2021  ozaki-r tests, altq: test new options
 1.1 14-Jul-2021  ozaki-r tests: add tests for ALTQ CBQ
 1.4 24-Nov-2016  ozaki-r Reduce duplicate codes

Introduce net_common.sh that is to share common functions used in tests
for networking. This commit commonizes extract_new_packets. Other duplicate
codes will be moved to the file in further commits.
 1.3 30-Jul-2015  ozaki-r branches: 1.3.2;
Fix TESTS_SH assignment
 1.2 30-Jul-2015  ozaki-r Add tests for IPv4 DAD
 1.1 29-Jul-2015  ozaki-r Add tests for ARP
 1.3.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.49 18-Aug-2025  ozaki-r tests: add tests for ARP address resolution
 1.48 09-Sep-2024  ozaki-r branches: 1.48.2;
tests: add tests for ARP cache entry creations
 1.47 09-Sep-2024  ozaki-r tests: dedup t_arp.sh like others (NFC)
 1.46 20-Aug-2024  ozaki-r tests, arp: add tests for GARP on link up
 1.45 18-Sep-2020  roy branches: 1.45.6; 1.45.8;
arp tests: Delete ARP entry after failed ping test

As it might hang around in WAITDELETE for a few seconds.
 1.44 17-Sep-2020  roy arp_rtm: Only ping once

Pointless doing 10 pings.
On a slow system, it's possible that many RTM_MISS messages could
overflow into the next test.
 1.43 15-Sep-2020  roy Don't check lifetime when testing published
 1.42 13-Sep-2020  roy arp test: Use the ndp cache expiration test in place of the old one

As the logic is the same.
While here, GC some variables and comment out a redundant sleep.
 1.41 11-Mar-2020  roy tests: check RTA_AUTHOR in messages
 1.40 09-Sep-2019  roy t_arp: Wait for 10 seconds for RTM_MISS

Let's try increasing the ping timeout to try and fix PR misc/54525.
 1.39 03-Sep-2019  roy tests: fix ARP and NDP tests for RTM_* messages

While here add tests for RTM_MISS.
 1.38 19-Aug-2019  ozaki-r tests: use rump_server_add_iface to create interfaces
 1.37 13-May-2019  bad branches: 1.37.2;
Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.36 06-Apr-2018  ozaki-r branches: 1.36.2;
Add tests for GARP without DAD

Additionally make the existing tests for GARP more explicit.
 1.35 06-Apr-2018  ozaki-r Improve packet checks and error reporting
 1.34 23-Nov-2017  kre branches: 1.34.2;
Clean up the arp_rtm subtest...

1. Be assertive when claiming the pid of the background route monitor command,
not polite... (ie: $! will give you the pid, $? is just 0 there).
2. Since "wait 0" simply (always) exits with status 127, immediately (we
know without thinking that we have no child with pid 0) the waits were
ineffective - now (after fix #1) they work .. which requires the
route monitor that watches the arp -d to exit after 1 message, not 2,
as 1 is all it gets. (If there really should be 2, someone needs to
find out why the kernel is sending only 1 - I am not that someone).
3. The file contents need to be read only once, no matter how many patterns
we need to look for, save some work, and do it that way (this is not
really a bug,m but saving time for the ATF tests is always a good thing.)

Not sure if this will stop it randomly failing on bablyon5, but it might.
(The likely cause is that the "route.monitor" has not flushed its stdout
buffers at the time the "grep -A 3" [aside: why that way to read the file??]
is performed, so fails to find its expected output ... the route monitor would
get an extra message once interfaces start being destroyed, I assume, and
would exit then, flushing its buffer, but by then it is too late.
If that is/was the cause, then it should be fixed now.)
 1.33 28-Jun-2017  ozaki-r Enable to remove multiple ARP/NDP entries for one destination

The kernel can have multiple ARP/NDP entries which have an indentical
destination on different interfaces. This is normal and can be
reproduce easily by ping -I or ping6 -S. We should be able to remove
such entries.

arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.

Related to PR 51179
 1.32 28-Jun-2017  ozaki-r Restore ARP/NDP entries to route show and netstat -r

Requested by dyoung@ some time ago
 1.31 28-Jun-2017  ozaki-r Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes

They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
 1.30 26-Jun-2017  ozaki-r Drop RTF_UP from a routing message of a deleted ARP/NDP entry
 1.29 26-Jun-2017  ozaki-r Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry

A message originally included only DST and GATEWAY. Restore it.
 1.28 26-Jun-2017  ozaki-r Fix usage of routing messages on arp -d and ndp -d

It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
 1.27 22-Jun-2017  ozaki-r Test implicit removals of ARP/NDP entries

One test case reproudces PR 51179.
 1.26 21-Jun-2017  ozaki-r Don't create a permanent L2 cache entry on adding an address to an interface

It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
 1.25 21-Jun-2017  ozaki-r Set net.inet.arp.keep only if it's required
 1.24 19-Jun-2017  ozaki-r Add missing declarations for cleanup
 1.23 16-Jun-2017  ozaki-r Test routing messages emitted on operations of ARP/NDP entries
 1.22 25-Nov-2016  ozaki-r branches: 1.22.6;
Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.21 24-Nov-2016  ozaki-r Move get_macaddr to net_common.sh
 1.20 24-Nov-2016  ozaki-r Move HIJACKING definition to net_common.sh
 1.19 24-Nov-2016  ozaki-r Reduce duplicate codes

Introduce net_common.sh that is to share common functions used in tests
for networking. This commit commonizes extract_new_packets. Other duplicate
codes will be moved to the file in further commits.
 1.18 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.17 23-Aug-2016  christos no functional change
 1.16 21-Jun-2016  ozaki-r branches: 1.16.2;
Make a bunch of test names self-descriptive
 1.15 18-Apr-2016  ozaki-r Add a test case for static ARP

It tests receiving an ARP request that has a spa (i.e., IP address) whose
ARP entry already exists in the table as a static ARP entry.
 1.14 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.13 04-Mar-2016  ozaki-r Improve tests of proxy arp

The tests make it clear how it behaves though, I don't know if the current
behavior is what it should be.
 1.12 29-Feb-2016  ozaki-r Add tests on activating a new MAC address
 1.11 25-Feb-2016  ozaki-r Add basic tests for Proxy ARP

The tests don't much enough and need more realitic tests, for example
tests for a setup using ppp found in PR 44032.
 1.10 02-Dec-2015  ozaki-r Make checks strict

rump.arp should fail with File exists (EEXIST).
 1.9 31-Aug-2015  ozaki-r Reflect the current ARP cache implementation in tests

net.inet.arp.prune and net.inet.arp.refresh were obsoleted.
 1.8 13-Aug-2015  ozaki-r Reflect a fix on rt_refcnt

The test was adjusted based on wrong behavior.
 1.7 07-Aug-2015  ozaki-r Check MAC address of ARP caches additionally
 1.6 31-Jul-2015  ozaki-r Return 0 explicitly to avoid unexpected failures when $DEBUG=false
 1.5 31-Jul-2015  ozaki-r Add tests of cache overwriting
 1.4 31-Jul-2015  ozaki-r Add tests for temp option
 1.3 30-Jul-2015  ozaki-r Add tests for arp -a option
 1.2 30-Jul-2015  ozaki-r Add tests for GARP
 1.1 29-Jul-2015  ozaki-r Add tests for ARP
 1.16.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.22.6.2 08-Apr-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #701):
sys/netinet/in.c: 1.227
sys/netinet6/in6.c: 1.265
tests/net/arp/t_arp.sh: 1.35-1.36
Make GARP work again when DAD is disabled
The change avoids setting an IP address tentative on initializing it when
the IPv4 DAD is disabled (net.inet.ip.dad_count=0), which allows a GARP packet
to be sent (see arpannounce). This is the same behavior of NetBSD 7, i.e.,
before introducing the IPv4 DAD.
Additionally do the same change to IPv6 DAD for consistency.
The change is suggested by roy@
--
Improve packet checks and error reporting
--
Add tests for GARP without DAD
Additionally make the existing tests for GARP more explicit.
 1.22.6.1 07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproudces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Enable to remove multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries which have an indentical
destination on different interfaces. This is normal and can be
reproduce easily by ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.34.2.1 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.36.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.36.2.1 10-Jun-2019  christos Sync with HEAD
 1.37.2.3 09-Sep-2024  martin Backout pullup of

tests/net/arp/t_arp.sh 1.46

for ticket #1883, this part is not suitable for this branch.
Requested by ozaki-r.
 1.37.2.2 24-Aug-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #1883):

tests/net/arp/t_dad.sh: revision 1.16
sys/netinet/in.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.46
sys/netinet/if_arp.c: revision 1.314

arp: fix the behavior on detecting an address duplication without IPv4 DAD

On receiving an ARP request that has the same source protocol address as
the own address, i.e., address duplication, the original behavior of
a kernel prior to supporing IPv4 DAD is to send an ARP reply. It is
the same with a latest kernel with DAD enabled. However, a latest
kernel without DAD sends back an GARP packet. Restore the original
behavior.

inet: send GARP on link up if DAD is disabled

This behavior was accidentally removed at rev 1.233.

tests, arp: add tests of address duplications without DAD

tests, arp: add tests for GARP on link up
 1.37.2.1 05-Sep-2019  martin Pull up following revision(s) (requested by roy in ticket #175):

tests/net/arp/t_arp.sh: revision 1.39
tests/net/ndp/t_ndp.sh: revision 1.36

tests: fix ARP and NDP tests for RTM_* messages

While here add tests for RTM_MISS.
 1.45.8.1 02-Aug-2025  perseant Sync with HEAD
 1.45.6.3 29-Aug-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1154):

sys/net/nd.c: revision 1.8
tests/net/arp/t_arp.sh: revision 1.49

nd: fix the number of requests for address resolution

ARP is expected to send requests for address resolution
net.inet.arp.nd_bmaxtries times at most. However, it sends
one more. IPv6 ND also behaves the same way.

The fix requires nd_set_timer reorganization to handle
scheduling timer without sending an NS message.
PR kern/59596

tests: add tests for ARP address resolution
 1.45.6.2 13-Sep-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #859):

tests/net/arp/t_arp.sh: revision 1.47
tests/net/arp/t_arp.sh: revision 1.48
sys/netinet/if_arp.c: revision 1.315

arp: allow to send packets without an ARP resolution just after
receiving an ARP request

On receiving an ARP request, the current implemention creates an ARP
cache entry but with ND_LLINFO_NOSTATE. Such an entry still needs
an ARP resolution to send back a packet to the requester. The original
behavior before introducing the common ND framework didn't need the
resolution. IPv6 doesn't as well. To restore the original behavior,
make a new ARP cache entry with ND_LLINFO_STALE like IPv6 does.

tests: dedup t_arp.sh like others (NFC)

tests: add tests for ARP cache entry creations
 1.45.6.1 24-Aug-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #812):

tests/net/arp/t_dad.sh: revision 1.16
sys/netinet/in.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.46
sys/netinet/if_arp.c: revision 1.314

arp: fix the behavior on detecting an address duplication without IPv4 DAD

On receiving an ARP request that has the same source protocol address as
the own address, i.e., address duplication, the original behavior of
a kernel prior to supporing IPv4 DAD is to send an ARP reply. It is
the same with a latest kernel with DAD enabled. However, a latest
kernel without DAD sends back an GARP packet. Restore the original
behavior.

inet: send GARP on link up if DAD is disabled

This behavior was accidentally removed at rev 1.233.

tests, arp: add tests of address duplications without DAD

tests, arp: add tests for GARP on link up
 1.48.2.1 29-Aug-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #22):

sys/net/nd.c: revision 1.8
tests/net/arp/t_arp.sh: revision 1.49

nd: fix the number of requests for address resolution

ARP is expected to send requests for address resolution
net.inet.arp.nd_bmaxtries times at most. However, it sends
one more. IPv6 ND also behaves the same way.

The fix requires nd_set_timer reorganization to handle
scheduling timer without sending an NS message.
PR kern/59596

tests: add tests for ARP address resolution
 1.16 20-Aug-2024  ozaki-r tests, arp: add tests of address duplications without DAD
 1.15 11-Mar-2017  ozaki-r branches: 1.15.14; 1.15.22; 1.15.24;
Improve test stability and output messages on failure
 1.14 08-Mar-2017  ozaki-r Improve test stability and output messages on failure
 1.13 25-Nov-2016  ozaki-r branches: 1.13.2;
Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.12 24-Nov-2016  ozaki-r Add missing bus argument for extract_new_packets
 1.11 24-Nov-2016  ozaki-r Move HIJACKING definition to net_common.sh
 1.10 24-Nov-2016  ozaki-r Reduce duplicate codes

Introduce net_common.sh that is to share common functions used in tests
for networking. This commit commonizes extract_new_packets. Other duplicate
codes will be moved to the file in further commits.
 1.9 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.8 14-Sep-2016  christos adjust for new ifconfig output.
 1.7 10-Aug-2016  kre + -lrumpdev
 1.6 25-Aug-2015  ozaki-r branches: 1.6.2;
Give a chance to send a DAD announce packet

ifconfig -w ensures IP addresses have left tentative state, however,
it doesn't guarantee that a DAD announce packet is sent. The kernel
clears tentative flag and then sends the packet so that ifconfig -w
can return before the kernel sends the packet.
 1.5 24-Aug-2015  ozaki-r Disable another tentative state check

It's too ephemeral to check robustly.
 1.4 17-Aug-2015  ozaki-r Improve test stability

- Take a diff between packet dumps and use it for packet checking
- it's resistant against packet reorder
- Seep 2 sec to make sure a NS message is sent
- Disable tentative state check for now
- it's too ephemeral to check robustly
 1.3 31-Jul-2015  ozaki-r Remove remaining debug code
 1.2 31-Jul-2015  ozaki-r Fix cleanup; halt all running rump_servers
 1.1 30-Jul-2015  ozaki-r Add tests for IPv4 DAD
 1.6.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.6.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.6.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.13.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.15.24.1 02-Aug-2025  perseant Sync with HEAD
 1.15.22.1 24-Aug-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #812):

tests/net/arp/t_dad.sh: revision 1.16
sys/netinet/in.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.46
sys/netinet/if_arp.c: revision 1.314

arp: fix the behavior on detecting an address duplication without IPv4 DAD

On receiving an ARP request that has the same source protocol address as
the own address, i.e., address duplication, the original behavior of
a kernel prior to supporing IPv4 DAD is to send an ARP reply. It is
the same with a latest kernel with DAD enabled. However, a latest
kernel without DAD sends back an GARP packet. Restore the original
behavior.

inet: send GARP on link up if DAD is disabled

This behavior was accidentally removed at rev 1.233.

tests, arp: add tests of address duplications without DAD

tests, arp: add tests for GARP on link up
 1.15.14.1 24-Aug-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #1883):

tests/net/arp/t_dad.sh: revision 1.16
sys/netinet/in.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.46
sys/netinet/if_arp.c: revision 1.314

arp: fix the behavior on detecting an address duplication without IPv4 DAD

On receiving an ARP request that has the same source protocol address as
the own address, i.e., address duplication, the original behavior of
a kernel prior to supporing IPv4 DAD is to send an ARP reply. It is
the same with a latest kernel with DAD enabled. However, a latest
kernel without DAD sends back an GARP packet. Restore the original
behavior.

inet: send GARP on link up if DAD is disabled

This behavior was accidentally removed at rev 1.233.

tests, arp: add tests of address duplications without DAD

tests, arp: add tests for GARP on link up
 1.2 13-Jul-2010  jmmv Get rid of static Atffiles and let bsd.test.mk generate them on the fly.
 1.1 21-Apr-2010  pooka Check that bpf doesn't accept programs with divide-by-zero in them.
Example filter from Guy Harris via PR kern/43185.
 1.7 18-Sep-2025  mrg introduce a couple of new turn-off-gcc-warning variables and use them.

GCC 14 has a new annoying calloc() checker that we turn off in a bunch
of places, and there are a few more dangling-pointer issuse that come up,
but seem bogus.
 1.6 01-Mar-2020  christos Centralize the base rump libraries into a variable used by all the other
Makefiles so that we can make changes to it centrally as needed and have
less mess. Fixes the sun2 build that needs rumpvfs after librump after
the latest changes.
 1.5 07-Jul-2014  alnsn branches: 1.5.24;
Add bpf/t_mbuf test to the build.
 1.4 10-Jun-2014  he Fix static linking for the tests: -lrump is also used by -lrumpuser,
so we also need -lrump after -lrumpuser. Fixes build for sun2.
 1.3 03-Jan-2011  christos branches: 1.3.12; 1.3.22;
PR/44310: Alexander Nasonov: write to /dev/bpf truncates size_t to int
 1.2 06-Dec-2010  pooka Add an xfail test for the mbuf leak described in PR kern/44196.

This is yet another example of a simple test which would be much
trickier to execute against the host kernel. You would either need
to put networking in a complete lockdown, or do some "statistical"
methods where you trigger the bug many many times and attempt to
ascertain a rising trend in mbuf count. And, of course, the leaked
mbufs don't go away from the host kernel once the test ends. In
contrast, we *know* that there is no other networking activity in
a rump kernel, so we can execute the operation exactly once, plus
the leaked mbuf "disappears" when the test is done.
 1.1 21-Apr-2010  pooka Check that bpf doesn't accept programs with divide-by-zero in them.
Example filter from Guy Harris via PR kern/43185.
 1.3.22.1 10-Aug-2014  tls Rebase.
 1.3.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.5.24.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.2 08-Jul-2014  alnsn branches: 1.2.2; 1.2.6;
Clone libbpfjit tests to check kernel implementation of bpfjit.
Old content of t_bpfjit.c will be moved to t_mbuf.c shortly.
Change packet buffers to unsigned char[] type.
 1.1 07-Jul-2014  alnsn Add some helper functions for bpf/bpfjit rump tests.
 1.2.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.6.1 08-Jul-2014  tls file h_bpf.h was added on branch tls-maxphys on 2014-08-20 00:04:51 +0000
 1.2.2.2 10-Aug-2014  tls Rebase.
 1.2.2.1 08-Jul-2014  tls file h_bpf.h was added on branch tls-earlyentropy on 2014-08-10 06:57:30 +0000
 1.9 10-Sep-2022  rillig fix misspellings of 'available' and nearby typos
 1.8 09-Feb-2017  ozaki-r Add tests for several bpf ioctls
 1.7 01-Feb-2017  ozaki-r Add a test case for BIOCGBLEN and BIOCSBLEN
 1.6 13-Jan-2017  christos branches: 1.6.2;
Don't play with "../.." in includes for h_macros.h; deal with it centrally.
Minor fixes.
 1.5 14-Aug-2012  alnsn branches: 1.5.14;
Add __RCSID and and make a couple of stylistic changes.
 1.4 13-Jan-2012  christos PR/44196 is now fixed, so don't expect a failure.
 1.3 18-Dec-2011  joerg Remove unused variable
 1.2 03-Jan-2011  christos branches: 1.2.6;
PR/44310: Alexander Nasonov: write to /dev/bpf truncates size_t to int
 1.1 06-Dec-2010  pooka Add an xfail test for the mbuf leak described in PR kern/44196.

This is yet another example of a simple test which would be much
trickier to execute against the host kernel. You would either need
to put networking in a complete lockdown, or do some "statistical"
methods where you trigger the bug many many times and attempt to
ascertain a rising trend in mbuf count. And, of course, the leaked
mbufs don't go away from the host kernel once the test ends. In
contrast, we *know* that there is no other networking activity in
a rump kernel, so we can execute the operation exactly once, plus
the leaked mbuf "disappears" when the test is done.
 1.2.6.2 30-Oct-2012  yamt sync with head
 1.2.6.1 17-Apr-2012  yamt sync with head
 1.5.14.1 20-Mar-2017  pgoyette Sync with HEAD
 1.6.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.4 01-Mar-2013  pooka Rump kernel hypercalls are not necessary here.
 1.3 03-Nov-2010  christos branches: 1.3.6; 1.3.12;
add Makefile.inc everywhere so that we can set WARNS=4 by default. Amazing
how many bugs this found :-)
 1.2 21-Apr-2010  pooka Add bpf program source in a comment.
 1.1 21-Apr-2010  pooka Check that bpf doesn't accept programs with divide-by-zero in them.
Example filter from Guy Harris via PR kern/43185.
 1.3.12.1 23-Jun-2013  tls resync from head
 1.3.6.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.3 13-Jan-2017  christos Don't play with "../.." in includes for h_macros.h; deal with it centrally.
Minor fixes.
 1.2 08-Jul-2014  alnsn branches: 1.2.2; 1.2.6; 1.2.10;
Clone libbpfjit tests to check kernel implementation of bpfjit.
Old content of t_bpfjit.c will be moved to t_mbuf.c shortly.
Change packet buffers to unsigned char[] type.
 1.1 07-Jul-2014  alnsn Add rump tests for checking how bpf_validate() works with mbuf chains.
 1.2.10.1 20-Mar-2017  pgoyette Sync with HEAD
 1.2.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.6.1 08-Jul-2014  tls file t_mbuf.c was added on branch tls-maxphys on 2014-08-20 00:04:51 +0000
 1.2.2.2 10-Aug-2014  tls Rebase.
 1.2.2.1 08-Jul-2014  tls file t_mbuf.c was added on branch tls-earlyentropy on 2014-08-10 06:57:30 +0000
 1.3 01-Mar-2020  christos Centralize the base rump libraries into a variable used by all the other
Makefiles so that we can make changes to it centrally as needed and have
less mess. Fixes the sun2 build that needs rumpvfs after librump after
the latest changes.
 1.2 10-Jun-2014  he branches: 1.2.24;
Fix static linking for the tests: -lrump is also used by -lrumpuser,
so we also need -lrump after -lrumpuser. Fixes build for sun2.
 1.1 14-Aug-2012  alnsn branches: 1.1.2; 1.1.4; 1.1.10;
Add t_bpfilter test. At the moment, it only checks
that bpf program can read bytes from mbuf chain.
 1.1.10.1 10-Aug-2014  tls Rebase.
 1.1.4.2 30-Oct-2012  yamt sync with head
 1.1.4.1 14-Aug-2012  yamt file Makefile was added on branch yamt-pagecache on 2012-10-30 19:00:05 +0000
 1.1.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.24.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.11 13-Jan-2017  christos Don't play with "../.." in includes for h_macros.h; deal with it centrally.
Minor fixes.
 1.10 11-Feb-2015  alnsn branches: 1.10.2;
Add bpfilternegjmp test.
 1.9 11-Feb-2015  alnsn Add bpfilterbadjmp and bpfilterbadret tests.
 1.8 24-Jun-2014  alnsn Zap trailing spaces.
 1.7 18-Dec-2013  alnsn branches: 1.7.2;
Add bpfilterbadmem, bpfilternoinitA and bpfilternoinitX tests.
 1.6 03-Sep-2012  alnsn branches: 1.6.2; 1.6.4;
Fix test timeout.
 1.5 31-Aug-2012  pgoyette There's a known-but-unresolved race condition in here somewhere that
causes these tests to sometimes deadlock. Since they run really fast
when they are successful, it doesn't do any good to wait for the
default 5-minute timeout. So explicitly set timeout for these tests
to just 30 seconds.
 1.4 16-Aug-2012  alnsn Close pipes on exit.
 1.3 16-Aug-2012  alnsn Wait for a child to initialise its network stack before sending a ping.
Decrease BIOCSRTIMEOUT to 500ms.
 1.2 15-Aug-2012  alnsn Test contiguous buffer as well.
 1.1 14-Aug-2012  alnsn Add t_bpfilter test. At the moment, it only checks
that bpf program can read bytes from mbuf chain.
 1.6.4.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.6.4.2 30-Oct-2012  yamt sync with head
 1.6.4.1 03-Sep-2012  yamt file t_bpfilter.c was added on branch yamt-pagecache on 2012-10-30 19:00:05 +0000
 1.6.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.2.1 10-Aug-2014  tls Rebase.
 1.10.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.9 01-Mar-2020  christos Centralize the base rump libraries into a variable used by all the other
Makefiles so that we can make changes to it centrally as needed and have
less mess. Fixes the sun2 build that needs rumpvfs after librump after
the latest changes.
 1.8 01-Jun-2019  kre Deal with fallout from the addition of
KERN_PROC_CWD in sysctl(3)
That is kern.proc.$$.KERN_PROC_CWD (I think - not that it matters here)

The effect is that -lrump now requires -lrumpvfs

This set of changes fixes (I believe) regular dynamic builds,
more might be required for static builds (will be verified soon).
 1.7 13-May-2019  bad Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.6 08-Aug-2016  pgoyette branches: 1.6.14;
This now needs librumpdev and librumpvfs to work.
 1.5 24-May-2016  hannken Disable PAX mprotect to make just-in-time-compile tests work again.

Ok: Christos Zoulas
 1.4 09-Jul-2014  alnsn branches: 1.4.2; 1.4.6;
Add t_cop and t_extmem kernel bpfjit tests to the build.
 1.3 08-Jul-2014  alnsn Add t_mbuf tests to the build.
 1.2 30-Jun-2014  alnsn Fix test directory.
 1.1 30-Jun-2014  alnsn Add bpfjit kernel tests for loading from mbuf chain.
 1.4.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.6.1 09-Jul-2014  tls file Makefile was added on branch tls-maxphys on 2014-08-20 00:04:51 +0000
 1.4.2.2 10-Aug-2014  tls Rebase.
 1.4.2.1 09-Jul-2014  tls file Makefile was added on branch tls-earlyentropy on 2014-08-10 06:57:30 +0000
 1.6.14.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.6.14.1 10-Jun-2019  christos Sync with HEAD
 1.12 13-Jan-2017  christos Don't play with "../.." in includes for h_macros.h; deal with it centrally.
Minor fixes.
 1.11 14-Feb-2015  alnsn branches: 1.11.2;
Add two more bpfjit_jmp_jeq_x_noinit_XX tests.
 1.10 14-Feb-2015  alnsn Improve bpfjit_jmp_jeq_x test.
 1.9 14-Feb-2015  alnsn BPF_JMP+BPF_JEQ+BPF_X doesn't compare X with k, it compares X with A.
Fix it in the bpfjit_jmp_jeq_x_noinit_ax test.
 1.8 14-Feb-2015  alnsn Avoid testing for zero rv in bpfjit_jmp_x_uninitialised. Unitialised
X isn't a problem for bpf_validate().
 1.7 14-Feb-2015  alnsn Add bpfjit_jmp_x_uninitialised test.

Found by http://lcamtuf.coredump.cx/afl/.
 1.6 11-Feb-2015  alnsn Add bpfjit_jmp_ja_overflow test.
 1.5 11-Feb-2015  alnsn Add bpfjit_ret_k, bpfjit_bad_ret_k, bpfjit_jmp_ja_invalid tests.
 1.4 20-Nov-2014  alnsn Add BPF_MOD tests. Plus one tiny change.
 1.3 19-Nov-2014  alnsn Add BPF_XOR tests.
 1.2 08-Jul-2014  alnsn branches: 1.2.2; 1.2.6;
Clone libbpfjit tests to check kernel implementation of bpfjit.
Old content of t_bpfjit.c will be moved to t_mbuf.c shortly.
Change packet buffers to unsigned char[] type.
 1.1 30-Jun-2014  alnsn Add bpfjit kernel tests for loading from mbuf chain.
 1.2.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.6.1 08-Jul-2014  tls file t_bpfjit.c was added on branch tls-maxphys on 2014-08-20 00:04:51 +0000
 1.2.2.2 10-Aug-2014  tls Rebase.
 1.2.2.1 08-Jul-2014  tls file t_bpfjit.c was added on branch tls-earlyentropy on 2014-08-10 06:57:30 +0000
 1.11.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.4 13-Jan-2017  christos Don't play with "../.." in includes for h_macros.h; deal with it centrally.
Minor fixes.
 1.3 13-Jul-2014  alnsn branches: 1.3.2; 1.3.6; 1.3.10;
Add bpfjit_cop_copx and bpfjit_copx_cop tests.
 1.2 09-Jul-2014  alnsn Fix copy/paste error: s/rump_unschedule/rump_schedule/.
 1.1 09-Jul-2014  alnsn Add t_cop and t_extmem kernel bpfjit tests.
 1.3.10.1 20-Mar-2017  pgoyette Sync with HEAD
 1.3.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.6.1 13-Jul-2014  tls file t_cop.c was added on branch tls-maxphys on 2014-08-20 00:04:51 +0000
 1.3.2.2 10-Aug-2014  tls Rebase.
 1.3.2.1 13-Jul-2014  tls file t_cop.c was added on branch tls-earlyentropy on 2014-08-10 06:57:30 +0000
 1.2 13-Jan-2017  christos Don't play with "../.." in includes for h_macros.h; deal with it centrally.
Minor fixes.
 1.1 09-Jul-2014  alnsn branches: 1.1.2; 1.1.6; 1.1.10;
Add t_cop and t_extmem kernel bpfjit tests.
 1.1.10.1 20-Mar-2017  pgoyette Sync with HEAD
 1.1.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.6.1 09-Jul-2014  tls file t_extmem.c was added on branch tls-maxphys on 2014-08-20 00:04:51 +0000
 1.1.2.2 10-Aug-2014  tls Rebase.
 1.1.2.1 09-Jul-2014  tls file t_extmem.c was added on branch tls-earlyentropy on 2014-08-10 06:57:30 +0000
 1.2 13-Jan-2017  christos Don't play with "../.." in includes for h_macros.h; deal with it centrally.
Minor fixes.
 1.1 08-Jul-2014  alnsn branches: 1.1.2; 1.1.6; 1.1.10;
Move bpfjit mbuf tests to t_mbuf.c.
 1.1.10.1 20-Mar-2017  pgoyette Sync with HEAD
 1.1.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.6.1 08-Jul-2014  tls file t_mbuf.c was added on branch tls-maxphys on 2014-08-20 00:04:51 +0000
 1.1.2.2 10-Aug-2014  tls Rebase.
 1.1.2.1 08-Jul-2014  tls file t_mbuf.c was added on branch tls-earlyentropy on 2014-08-10 06:57:30 +0000
 1.4 11-Dec-2024  andvar s/inclued/included/ in comment.
 1.3 01-Mar-2020  christos branches: 1.3.10;
Centralize the base rump libraries into a variable used by all the other
Makefiles so that we can make changes to it centrally as needed and have
less mess. Fixes the sun2 build that needs rumpvfs after librump after
the latest changes.
 1.2 27-May-2017  bouyer branches: 1.2.10;
merge the bouyer-socketcan branch to HEAD.

CAN stands for Controller Area Network, a broadcast network used
in automation and automotive fields. For example, the NMEA2000 standard
developped for marine devices uses a CAN network as the link layer.

This is an implementation of the linux socketcan API:
https://www.kernel.org/doc/Documentation/networking/can.txt
you can also see can(4).

This adds a new socket family (AF_CAN) and protocol (PF_CAN),
as well as the canconfig(8) utility, used to set timing parameter of
CAN hardware. Also inclued is a driver for the CAN controller
found in the allwinner A20 SoC (I tested it with an Olimex lime2 board,
connected with PIC18-based CAN devices).

There is also the canloop(4) pseudo-device, which allows to use
the socketcan API without CAN hardware.

At this time the CANFD part of the linux socketcan API is not implemented.
Error frames are not implemented either. But I could get the cansend and
canreceive utilities from the canutils package to build and run with minimal
changes. tcpudmp(8) can also be used to record frames, which can be
decoded with etherreal.
 1.1 15-Jan-2017  bouyer branches: 1.1.2;
file Makefile was initially added on branch bouyer-socketcan.
 1.1.2.3 05-Feb-2017  bouyer Implement CAN_RAW_FILTER socket option, and add tests for it.
 1.1.2.2 04-Feb-2017  bouyer Factor out reading from a can socket, and move to a helper file.
 1.1.2.1 15-Jan-2017  bouyer Basic tests for our SocketCAN implementation (using rump)
 1.2.10.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.3.10.1 02-Aug-2025  perseant Sync with HEAD
 1.4 13-Oct-2019  mrg ifr_name is nul terminated. make it so.
 1.3 28-May-2017  kre branches: 1.3.10;

Needs %zu fix for sizeof as well.
 1.2 27-May-2017  bouyer merge the bouyer-socketcan branch to HEAD.

CAN stands for Controller Area Network, a broadcast network used
in automation and automotive fields. For example, the NMEA2000 standard
developped for marine devices uses a CAN network as the link layer.

This is an implementation of the linux socketcan API:
https://www.kernel.org/doc/Documentation/networking/can.txt
you can also see can(4).

This adds a new socket family (AF_CAN) and protocol (PF_CAN),
as well as the canconfig(8) utility, used to set timing parameter of
CAN hardware. Also inclued is a driver for the CAN controller
found in the allwinner A20 SoC (I tested it with an Olimex lime2 board,
connected with PIC18-based CAN devices).

There is also the canloop(4) pseudo-device, which allows to use
the socketcan API without CAN hardware.

At this time the CANFD part of the linux socketcan API is not implemented.
Error frames are not implemented either. But I could get the cansend and
canreceive utilities from the canutils package to build and run with minimal
changes. tcpudmp(8) can also be used to record frames, which can be
decoded with etherreal.
 1.1 04-Feb-2017  bouyer branches: 1.1.2;
file h_canutils.c was initially added on branch bouyer-socketcan.
 1.1.2.5 17-Apr-2017  bouyer Make it build from build.sh (fix warnings)
 1.1.2.4 05-Feb-2017  bouyer Factor out creation of socket with CAN_RAW_RECV_OWN_MSGS
 1.1.2.3 05-Feb-2017  bouyer factor out socket bind to interface
 1.1.2.2 05-Feb-2017  bouyer Decrease timeout from 2 to 1 second. Speeds up the tests where timeout is
the expected case, and it should still be enough to get the looped back
packet.
 1.1.2.1 04-Feb-2017  bouyer Factor out reading from a can socket, and move to a helper file.
 1.3.10.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.2 27-May-2017  bouyer merge the bouyer-socketcan branch to HEAD.

CAN stands for Controller Area Network, a broadcast network used
in automation and automotive fields. For example, the NMEA2000 standard
developped for marine devices uses a CAN network as the link layer.

This is an implementation of the linux socketcan API:
https://www.kernel.org/doc/Documentation/networking/can.txt
you can also see can(4).

This adds a new socket family (AF_CAN) and protocol (PF_CAN),
as well as the canconfig(8) utility, used to set timing parameter of
CAN hardware. Also inclued is a driver for the CAN controller
found in the allwinner A20 SoC (I tested it with an Olimex lime2 board,
connected with PIC18-based CAN devices).

There is also the canloop(4) pseudo-device, which allows to use
the socketcan API without CAN hardware.

At this time the CANFD part of the linux socketcan API is not implemented.
Error frames are not implemented either. But I could get the cansend and
canreceive utilities from the canutils package to build and run with minimal
changes. tcpudmp(8) can also be used to record frames, which can be
decoded with etherreal.
 1.1 04-Feb-2017  bouyer branches: 1.1.2;
file h_canutils.h was initially added on branch bouyer-socketcan.
 1.1.2.3 17-Apr-2017  bouyer Make it build from build.sh (fix warnings)
 1.1.2.2 05-Feb-2017  bouyer factor out socket bind to interface
 1.1.2.1 04-Feb-2017  bouyer Factor out reading from a can socket, and move to a helper file.
 1.8 20-Aug-2021  andvar fix various typos in comments and log messages.
 1.7 24-Jun-2019  skrll Another spello of 'unknown'
 1.6 09-Jun-2017  bouyer branches: 1.6.6;
Test bind()ing to a non-existent interface.
 1.5 28-May-2017  christos branches: 1.5.2;
undo previous; we don't have any archs where socklen_t != uint32_t.
 1.4 28-May-2017  christos fix format.
 1.3 28-May-2017  martin Fix size_t format strings
 1.2 27-May-2017  bouyer merge the bouyer-socketcan branch to HEAD.

CAN stands for Controller Area Network, a broadcast network used
in automation and automotive fields. For example, the NMEA2000 standard
developped for marine devices uses a CAN network as the link layer.

This is an implementation of the linux socketcan API:
https://www.kernel.org/doc/Documentation/networking/can.txt
you can also see can(4).

This adds a new socket family (AF_CAN) and protocol (PF_CAN),
as well as the canconfig(8) utility, used to set timing parameter of
CAN hardware. Also inclued is a driver for the CAN controller
found in the allwinner A20 SoC (I tested it with an Olimex lime2 board,
connected with PIC18-based CAN devices).

There is also the canloop(4) pseudo-device, which allows to use
the socketcan API without CAN hardware.

At this time the CANFD part of the linux socketcan API is not implemented.
Error frames are not implemented either. But I could get the cansend and
canreceive utilities from the canutils package to build and run with minimal
changes. tcpudmp(8) can also be used to record frames, which can be
decoded with etherreal.
 1.1 15-Jan-2017  bouyer branches: 1.1.2;
file t_can.c was initially added on branch bouyer-socketcan.
 1.1.2.7 17-Apr-2017  bouyer Make it build from build.sh (fix warnings)
 1.1.2.6 05-Feb-2017  bouyer Factor out creation of socket with CAN_RAW_RECV_OWN_MSGS
 1.1.2.5 05-Feb-2017  bouyer factor out socket bind to interface
 1.1.2.4 05-Feb-2017  bouyer Decrease timeout from 2 to 1 second. Speeds up the tests where timeout is
the expected case, and it should still be enough to get the looped back
packet.
 1.1.2.3 04-Feb-2017  bouyer Factor out reading from a can socket, and move to a helper file.
 1.1.2.2 16-Jan-2017  bouyer Adapt to CAN_RAW_RECV_OWN_MSGS being off by default, and test
CAN_RAW_RECV_OWN_MSGS and CAN_RAW_LOOPBACK options.
 1.1.2.1 15-Jan-2017  bouyer Basic tests for our SocketCAN implementation (using rump)
 1.5.2.1 15-Jun-2017  snj Pull up following revision(s) (requested by bouyer in ticket #34):
sys/netcan/can_pcb.c: revision 1.6
tests/net/can/t_can.c: revision 1.6
Refuse to bind to a non-CAN interface.
Also release the lock in the error branch.
--
Test bind()ing to a non-existent interface.
 1.6.6.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.2 27-May-2017  bouyer merge the bouyer-socketcan branch to HEAD.

CAN stands for Controller Area Network, a broadcast network used
in automation and automotive fields. For example, the NMEA2000 standard
developped for marine devices uses a CAN network as the link layer.

This is an implementation of the linux socketcan API:
https://www.kernel.org/doc/Documentation/networking/can.txt
you can also see can(4).

This adds a new socket family (AF_CAN) and protocol (PF_CAN),
as well as the canconfig(8) utility, used to set timing parameter of
CAN hardware. Also inclued is a driver for the CAN controller
found in the allwinner A20 SoC (I tested it with an Olimex lime2 board,
connected with PIC18-based CAN devices).

There is also the canloop(4) pseudo-device, which allows to use
the socketcan API without CAN hardware.

At this time the CANFD part of the linux socketcan API is not implemented.
Error frames are not implemented either. But I could get the cansend and
canreceive utilities from the canutils package to build and run with minimal
changes. tcpudmp(8) can also be used to record frames, which can be
decoded with etherreal.
 1.1 05-Feb-2017  bouyer branches: 1.1.2;
file t_canfilter.c was initially added on branch bouyer-socketcan.
 1.1.2.4 17-Apr-2017  bouyer Make it build from build.sh (fix warnings)
 1.1.2.3 05-Feb-2017  bouyer Factor out creation of socket with CAN_RAW_RECV_OWN_MSGS
 1.1.2.2 05-Feb-2017  bouyer factor out socket bind to interface
 1.1.2.1 05-Feb-2017  bouyer Implement CAN_RAW_FILTER socket option, and add tests for it.
 1.6 16-Jan-2017  ozaki-r Rewrite tests for CARP in a shell script instead of C

The new shell script enables us to modify/add tests easily.
 1.5 08-Aug-2016  pgoyette branches: 1.5.2;
No underscore needed
 1.4 08-Aug-2016  pgoyette We also need librump_vfs

While here, remove duplicate entry for librump
 1.3 08-Aug-2016  pgoyette And yet another one
 1.2 10-Jun-2014  he branches: 1.2.6;
Fix static linking for the tests: -lrump is also used by -lrumpuser,
so we also need -lrump after -lrumpuser. Fixes build for sun2.
 1.1 10-Aug-2010  pooka branches: 1.1.12; 1.1.22;
Add a most elementary carp test: it forks off two processes,
configures carp in each of them, and after verifying that the shared
address responds to ping it brutally kills the master like a proper
carnivore (none of that ifconfig down sissy vegan nonsense). Then
the test checks if the backup got its act together by pinging the
shared address and passes verdict.
 1.1.22.1 10-Aug-2014  tls Rebase.
 1.1.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.6.1 20-Mar-2017  pgoyette Sync with HEAD
 1.5.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.7 16-Jan-2017  ozaki-r Rewrite tests for CARP in a shell script instead of C

The new shell script enables us to modify/add tests easily.
 1.6 13-Jan-2017  christos branches: 1.6.2;
Don't play with "../.." in includes for h_macros.h; deal with it centrally.
Minor fixes.
 1.5 26-Jun-2011  christos branches: 1.5.24;
fix fallout from including signal.h in rump_syscallargs.h
 1.4 07-Nov-2010  jmmv Adjusts tests after import of atf-0.12:

- The use.fs property is gone.
- Mark the tests/fs/t_create:attrs test as broken when using the default
unprivileged-user:_atf setting. This probably deserves a fix somehow
but I'm not sure at this point.
 1.3 03-Nov-2010  christos add Makefile.inc everywhere so that we can set WARNS=4 by default. Amazing
how many bugs this found :-)
 1.2 11-Aug-2010  pooka Put some sleeps between forking the hosts. Otherwise a race may
be triggered which appears to cause one host to go berzerk with
sending carp advertisements and ignore ping requests.

I'll get to the bottom of this eventually, but this is a stopgap
to prevent the test from failing, hopefully -- the race doesn't
appear to trigger for me even with 0.1s on a loaded machine, so
0.5s should be better than fine.

(hi jmmv ;)
 1.1 10-Aug-2010  pooka Add a most elementary carp test: it forks off two processes,
configures carp in each of them, and after verifying that the shared
address responds to ping it brutally kills the master like a proper
carnivore (none of that ifconfig down sissy vegan nonsense). Then
the test checks if the backup got its act together by pinging the
shared address and passes verdict.
 1.5.24.1 20-Mar-2017  pgoyette Sync with HEAD
 1.6.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.9 19-Sep-2023  gson Don't expect the net/carp/t_basic/carp_handover_ipv6_halt_nocarpdevip
and carp_handover_ipv6_ifdown_nocarpdevip test cases to fail. At
least on the TNF i386 and amd64 testbeds, they pass more often than
not since the commit of src/sys/netinet/ip_carp.c 1.119 by mlelstv on
2023.04.07.06.44.08.
 1.8 19-Aug-2019  ozaki-r branches: 1.8.8;
tests: use rump_server_add_iface to create interfaces
 1.7 03-Aug-2017  ozaki-r branches: 1.7.4;
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.6 18-May-2017  ozaki-r branches: 1.6.2;
Test CARP handover on setups without having IPs on carpdev (shmif)

Note that tests for IPv6 don't pass yet; nd6 needs to handle CARP
correctly like arp does.
 1.5 18-May-2017  ozaki-r Reduce duplicated codes (DRY)
 1.4 27-Feb-2017  ozaki-r branches: 1.4.2; 1.4.4; 1.4.6;
Make CARP on IPv6 work

It passes ATF tests but no more, no less.
 1.3 27-Feb-2017  ozaki-r Test handovers on interface down as well as server halt
 1.2 27-Feb-2017  ozaki-r Add a test case for CARP on IPv6

The test case fails expectedly because the implementation of CARP on IPv6
is incomplete yet.
 1.1 16-Jan-2017  ozaki-r Rewrite tests for CARP in a shell script instead of C

The new shell script enables us to modify/add tests easily.
 1.4.6.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.4.4.2 21-Apr-2017  bouyer Sync with HEAD
 1.4.4.1 27-Feb-2017  bouyer file t_basic.sh was added on branch bouyer-socketcan on 2017-04-21 16:54:12 +0000
 1.4.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.4.2.1 27-Feb-2017  pgoyette file t_basic.sh was added on branch pgoyette-localcount on 2017-03-20 06:58:00 +0000
 1.6.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.7.4.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.8.8.1 21-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #902):

sbin/ifconfig/carp.c: revision 1.15
sbin/ifconfig/ifconfig.8: revision 1.125
tests/net/carp/t_basic.sh: revision 1.9
sys/netinet/ip_carp.c: revision 1.118
sys/netinet/ip_carp.c: revision 1.119

Fix parser for carp state.

The state values are uppercase words INIT, BACKUP and MASTER.

Use backing device to send advertisements. Otherwise the packets originate
from the virtual MAC address, which confuses switches.

Select virtual address as sender if backing interface is anonymous.

Use correct scope for IPv6.

Don't expect the net/carp/t_basic/carp_handover_ipv6_halt_nocarpdevip
and carp_handover_ipv6_ifdown_nocarpdevip test cases to fail. At
least on the TNF i386 and amd64 testbeds, they pass more often than
not since the commit of src/sys/netinet/ip_carp.c 1.119 by mlelstv on
2023.04.07.06.44.08.
 1.10 23-Apr-2020  joerg Replace noatf global with conditional compilation
 1.9 13-Jan-2017  christos Don't play with "../.." in includes for h_macros.h; deal with it centrally.
Minor fixes.
 1.8 03-Jul-2013  pooka branches: 1.8.10;
Remove makevirtif(), it's not likely to be used in testing.
(and this file should go away anyway in favor of using ifconfig etc.)
 1.7 28-Feb-2011  pooka branches: 1.7.4; 1.7.10;
make netcfg produce sensible results in a non-atf env
 1.6 17-Aug-2010  pooka branches: 1.6.2;
* add interface for virtif creation (in addition to the already present shmif)
* don't leak sockets
 1.5 09-Aug-2010  pooka may be __unused
 1.4 09-Aug-2010  pooka add a simple pingtest
 1.3 26-Jul-2010  pooka Calculate broadcast IP instead of requiring it as a config parameter.
 1.2 25-Jul-2010  pooka necessary headers
 1.1 25-Jul-2010  pooka make interface/routing configuration a bit more generic
 1.6.2.1 05-Mar-2011  bouyer Sync with HEAD
 1.7.10.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.4.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.8.10.1 20-Mar-2017  pgoyette Sync with HEAD
 1.4 13-Sep-2012  joerg branches: 1.4.2;
Workaround infrastructure bug where additional rpath flags are added for
/lib, resulting in linker warnings for the compat case.
 1.3 10-Sep-2012  adam branches: 1.3.2;
Fix building with MKCOMPAT=no
 1.2 16-Aug-2012  martin Slightly reaarange, so that the 32bit version actually gets compiled
and linked with -m32.
 1.1 13-Aug-2012  christos add fdpass tests
 1.3.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.4.2.2 30-Oct-2012  yamt sync with head
 1.4.2.1 13-Sep-2012  yamt file Makefile was added on branch yamt-pagecache on 2012-10-30 19:00:06 +0000
 1.1 13-Aug-2012  christos branches: 1.1.4;
add fdpass tests
 1.1.4.2 30-Oct-2012  yamt sync with head
 1.1.4.1 13-Aug-2012  yamt file fdpass.c was added on branch yamt-pagecache on 2012-10-30 19:00:06 +0000
 1.2 16-Aug-2012  martin branches: 1.2.4;
Do not disturb the atf output with debugging echos
 1.1 13-Aug-2012  christos add fdpass tests
 1.2.4.2 30-Oct-2012  yamt sync with head
 1.2.4.1 16-Aug-2012  yamt file t_fdpass.sh was added on branch yamt-pagecache on 2012-10-30 19:00:06 +0000
 1.2 13-Jul-2010  jmmv Get rid of static Atffiles and let bsd.test.mk generate them on the fly.
 1.1 04-Jul-2010  pooka Add test case for PR kern/43548

Due to the nature of the feature under test, this one is a little
different, so let me explain how it works.

The test program forks and bootstraps a rump kernel in both processes.
It then configures shared memory interfaces in both. shmif is nice
in that it uses a mmaped file as the bus and does not require root
privileges for communication between two (or more) processes. The
child process then proceeds to increase icmp.returndatabytes as
indicated by the PR, while the parent process sets the global TTL
of the rump kernel to 1 (note: both values only affect the respective
rump kernels, not each other or more importantly the host kernel).
The parent then sends the bad packet which is supposed to be routed
by the child. If ip_icmp.c was too old, *boom* + fail; otherwise
nothing bad happens and the test exists with success after one
second.

Eventually this test can be extended into a framework for automated
testing of any networking code which requires (arbitrarily complex)
routing setups.
 1.12 01-Mar-2020  christos Centralize the base rump libraries into a variable used by all the other
Makefiles so that we can make changes to it centrally as needed and have
less mess. Fixes the sun2 build that needs rumpvfs after librump after
the latest changes.
 1.11 01-Jun-2019  kre Deal with fallout from the addition of
KERN_PROC_CWD in sysctl(3)
That is kern.proc.$$.KERN_PROC_CWD (I think - not that it matters here)

The effect is that -lrump now requires -lrumpvfs

This set of changes fixes (I believe) regular dynamic builds,
more might be required for static builds (will be verified soon).
 1.10 13-May-2019  bad Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.9 24-Nov-2016  ozaki-r branches: 1.9.14;
Move get_lladdr to net_common.sh
 1.8 08-Aug-2016  pgoyette And this one also needs librumpvfs
 1.7 08-Aug-2016  pgoyette More need for librumpdev
 1.6 14-Sep-2015  ozaki-r branches: 1.6.2;
Add tests for IPv6 ICMP redirect

Note that tests for redirect timeout doesn't work for now due to
PR kern/50240.

From s-yamaguchi@IIJ (with some fixes and tweaks by ozaki-r)
 1.5 31-Aug-2015  ozaki-r Add tests for ICMP redirect timeout
 1.4 10-Jun-2014  he Fix static linking for the tests: -lrump is also used by -lrumpuser,
so we also need -lrump after -lrumpuser. Fixes build for sun2.
 1.3 14-Dec-2010  pooka branches: 1.3.12; 1.3.22;
Add another version of the simple ping test, this time written as a
shell script and using rump_server, rump.ifconfig and rump.ping.

XXX: uses rump_allserver for now, though, since i noticed a problem
where the rump kernel syscall vector does not get updated for
dlopen()'d libraries (and hence if you dlopen librumpnet.so, socket()
still gives ENOSYS). Me be fixink it later.
 1.2 09-Aug-2010  pooka test that kernel reponds to ping
 1.1 04-Jul-2010  pooka Add test case for PR kern/43548

Due to the nature of the feature under test, this one is a little
different, so let me explain how it works.

The test program forks and bootstraps a rump kernel in both processes.
It then configures shared memory interfaces in both. shmif is nice
in that it uses a mmaped file as the bus and does not require root
privileges for communication between two (or more) processes. The
child process then proceeds to increase icmp.returndatabytes as
indicated by the PR, while the parent process sets the global TTL
of the rump kernel to 1 (note: both values only affect the respective
rump kernels, not each other or more importantly the host kernel).
The parent then sends the bad packet which is supposed to be routed
by the child. If ip_icmp.c was too old, *boom* + fail; otherwise
nothing bad happens and the test exists with success after one
second.

Eventually this test can be extended into a framework for automated
testing of any networking code which requires (arbitrarily complex)
routing setups.
 1.3.22.1 10-Aug-2014  tls Rebase.
 1.3.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.9.14.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.9.14.1 10-Jun-2019  christos Sync with HEAD
 1.10 13-Jan-2017  christos Don't play with "../.." in includes for h_macros.h; deal with it centrally.
Minor fixes.
 1.9 26-Feb-2015  martin branches: 1.9.2;
Do not use artificial low timeouts - slow machines might be still paging
in all the rump environment. Bump timeout from 4 seconds to 20 (my shark
needs ~9).
 1.8 18-Mar-2012  jruoho Move more PR references from comments to ATF's "descr".
 1.7 07-Nov-2010  jmmv branches: 1.7.6;
Adjusts tests after import of atf-0.12:

- The use.fs property is gone.
- Mark the tests/fs/t_create:attrs test as broken when using the default
unprivileged-user:_atf setting. This probably deserves a fix somehow
but I'm not sure at this point.
 1.6 03-Nov-2010  christos add Makefile.inc everywhere so that we can set WARNS=4 by default. Amazing
how many bugs this found :-)
 1.5 26-Jul-2010  pooka Remove stuff handled by common code now.
 1.4 26-Jul-2010  pooka Calculate broadcast IP instead of requiring it as a config parameter.
 1.3 25-Jul-2010  pooka make interface/routing configuration a bit more generic
 1.2 18-Jul-2010  pooka fix routine name in error message
 1.1 04-Jul-2010  pooka Add test case for PR kern/43548

Due to the nature of the feature under test, this one is a little
different, so let me explain how it works.

The test program forks and bootstraps a rump kernel in both processes.
It then configures shared memory interfaces in both. shmif is nice
in that it uses a mmaped file as the bus and does not require root
privileges for communication between two (or more) processes. The
child process then proceeds to increase icmp.returndatabytes as
indicated by the PR, while the parent process sets the global TTL
of the rump kernel to 1 (note: both values only affect the respective
rump kernels, not each other or more importantly the host kernel).
The parent then sends the bad packet which is supposed to be routed
by the child. If ip_icmp.c was too old, *boom* + fail; otherwise
nothing bad happens and the test exists with success after one
second.

Eventually this test can be extended into a framework for automated
testing of any networking code which requires (arbitrarily complex)
routing setups.
 1.7.6.1 17-Apr-2012  yamt sync with head
 1.9.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.8 26-May-2017  ozaki-r Change the default value of DEBUG of stable tests to false
 1.7 25-Nov-2016  ozaki-r Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.6 24-Nov-2016  ozaki-r Move get_lladdr to net_common.sh
 1.5 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.4 20-Oct-2016  ozaki-r Make test names self-descriptive
 1.3 02-Oct-2016  kre More adaptation to changed ifconfig output format.
 1.2 10-Aug-2016  kre + -lrumpdev
 1.1 14-Sep-2015  ozaki-r branches: 1.1.2;
Add tests for IPv6 ICMP redirect

Note that tests for redirect timeout doesn't work for now due to
PR kern/50240.

From s-yamaguchi@IIJ (with some fixes and tweaks by ozaki-r)
 1.1.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.1.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.6 25-Nov-2016  ozaki-r Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.5 24-Nov-2016  ozaki-r Move route check functions to net_common.sh
 1.4 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.3 10-Aug-2016  kre + -lrumpdev
 1.2 25-Dec-2015  ozaki-r branches: 1.2.2;
Add some tests for sysctl net.inet.ip.*

- net.inet.ip.redirect
- net.inet.ip.directed-broadcast (and net.inet.icmp.bmcastecho)
- net.inet.ip.ttl

From suzu-ken@IIJ (with tweaks by me)
 1.1 31-Aug-2015  ozaki-r Add tests for ICMP redirect timeout
 1.2.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.24 11-Jun-2019  gson In the "got %d/%d" message printed at the end of the pingsize test,
make the latter number show the actual number of ICMP packets the test
attempted to send. Thus, the two numbers can now be meaningfully
compared, and their difference indicates the number of packets lost.
 1.23 26-Mar-2018  roy branches: 1.23.2;
Handle errors better.
Fix test for checking we sent all the data we asked to.
 1.22 24-Mar-2018  roy Allow a valid sendto .... duh
 1.21 24-Mar-2018  kamil Fix a printf(3)-like format in ATF ICMP t_ping.c

Use %zd for ssize_t, instead of %d.
 1.20 23-Mar-2018  roy Note value received. Harden another sendto for ENOBUFS.
 1.19 22-Mar-2018  roy Handle ENOBUFS in sendto
 1.18 22-Mar-2018  roy Handle ENOBUFS in recv
 1.17 13-Jan-2017  christos branches: 1.17.6; 1.17.12;
Don't play with "../.." in includes for h_macros.h; deal with it centrally.
Minor fixes.
 1.16 26-Feb-2015  martin branches: 1.16.2;
Bump timeout to 20 seconds for slower machines.
 1.15 04-Sep-2012  alnsn Replace usleep(500000) with a synchronization over a pipe.
 1.14 26-Jun-2011  christos branches: 1.14.2;
fix fallout from including signal.h from rump_syscallargs.h
 1.13 05-Jan-2011  martin Use raw buffer size (not aligned value) to limit packet size
 1.12 05-Jan-2011  martin Fix alignment of sndbuf (sparc64 got a SIGBUS in this test)
 1.11 07-Nov-2010  jmmv Adjusts tests after import of atf-0.12:

- The use.fs property is gone.
- Mark the tests/fs/t_create:attrs test as broken when using the default
unprivileged-user:_atf setting. This probably deserves a fix somehow
but I'm not sure at this point.
 1.10 03-Nov-2010  christos make that u_int, because it is passed as a socket option.
 1.9 03-Nov-2010  christos add Makefile.inc everywhere so that we can set WARNS=4 by default. Amazing
how many bugs this found :-)
 1.8 26-Aug-2010  pooka setsockopt() wants int instead of size_t. Should fix this on LP64.
 1.7 23-Aug-2010  pooka Add a delay between startup of pinger and pingee here too.

XXX: there's apparently some race condition which appears to trigger
if a broadcast arp arrives around the same time as the arpwhohas
is sent. This causes original packet to never be sent by the
arpwhohas requestor. If this rings a bell to someone, please let
me know.
 1.6 18-Aug-2010  pooka .. put a timeout here just in case the receive does not increase
the counter.
 1.5 18-Aug-2010  pooka Add a test for the "ping of death". Declare the test a success
when the receiver increases the "ip toolong" stat counter.
 1.4 18-Aug-2010  pooka send pings in ascending order
 1.3 18-Aug-2010  pooka Add a two-way floodping test and a test which sends icmp echos with
various sizes.
 1.2 17-Aug-2010  pooka add a test which floodpings another host
 1.1 09-Aug-2010  pooka test that kernel reponds to ping
 1.14.2.1 30-Oct-2012  yamt sync with head
 1.16.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.17.12.1 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.17.6.2 10-Apr-2018  martin Additionally pull up the following revision for ticket #724:

tests/net/icmp/t_ping.c 1.21

Fix a printf(3)-like format in ATF ICMP t_ping.c
 1.17.6.1 09-Apr-2018  bouyer Pull up following revision(s) (requested by roy in ticket #724):
tests/net/icmp/t_ping.c: revision 1.19
sys/netinet6/raw_ip6.c: revision 1.166
sys/netinet6/ip6_input.c: revision 1.195
sys/net/raw_usrreq.c: revision 1.59
sys/sys/socketvar.h: revision 1.151
sys/kern/uipc_socket2.c: revision 1.128
tests/lib/libc/sys/t_recvmmsg.c: revision 1.2
lib/libc/sys/recv.2: revision 1.38
sys/net/rtsock.c: revision 1.239
sys/netinet/udp_usrreq.c: revision 1.246
sys/netinet6/icmp6.c: revision 1.224
tests/net/icmp/t_ping.c: revision 1.20
sys/netipsec/keysock.c: revision 1.63
sys/netinet/raw_ip.c: revision 1.172
sys/kern/uipc_socket.c: revision 1.260
tests/net/icmp/t_ping.c: revision 1.22
sys/kern/uipc_socket.c: revision 1.261
tests/net/icmp/t_ping.c: revision 1.23
sys/netinet/ip_mroute.c: revision 1.155
sbin/route/route.c: revision 1.159
sys/netinet6/ip6_mroute.c: revision 1.123
sys/netatalk/ddp_input.c: revision 1.31
sys/netcan/can.c: revision 1.3
sys/kern/uipc_usrreq.c: revision 1.184
sys/netinet6/udp6_usrreq.c: revision 1.138
tests/net/icmp/t_ping.c: revision 1.18
socket: report receive buffer overflows
Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().
This allows userland to detect route(4) overflows so it can re-sync
with the current state.
socket: clear error even when peeking
The error has already been reported and it's pointless requiring another
recv(2) call just to clear it.
socket: remove now incorrect comment that so_error is only udp
As it can be affected by route(4) sockets which are raw.
rtsock: log dropped messages that we cannot report to userland
Handle ENOBUFS when receiving messages.
Don't send messages if the receiver has died.
Sprinkle more soroverflow().
Handle ENOBUFS in recv
Handle ENOBUFS in sendto
Note value received. Harden another sendto for ENOBUFS.
Handle the routing socket overflowing gracefully.
Allow a valid sendto .... duh
Handle errors better.
Fix test for checking we sent all the data we asked to.
 1.23.2.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.6 13-May-2019  bad Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.5 10-Aug-2016  kre branches: 1.5.14;

+ -lrumpdev
 1.4 30-Dec-2010  pooka Substitute a surgical rump_server configuration for rump_allserver
now that it's possible. With warm fs cache, the startup time of
the former is 0.01s and the latter 0.1s. With cold caches it's
0.2s vs 2s.
 1.3 18-Dec-2010  pooka atf-check => atf_check
 1.2 14-Dec-2010  pooka use ping -n, since technically we don't have dns
 1.1 14-Dec-2010  pooka Add another version of the simple ping test, this time written as a
shell script and using rump_server, rump.ifconfig and rump.ping.

XXX: uses rump_allserver for now, though, since i noticed a problem
where the rump kernel syscall vector does not get updated for
dlopen()'d libraries (and hence if you dlopen librumpnet.so, socket()
still gives ENOSYS). Me be fixink it later.
 1.5.14.1 10-Jun-2019  christos Sync with HEAD
 1.10 01-Mar-2020  christos Centralize the base rump libraries into a variable used by all the other
Makefiles so that we can make changes to it centrally as needed and have
less mess. Fixes the sun2 build that needs rumpvfs after librump after
the latest changes.
 1.9 01-Jun-2019  kre Deal with fallout from the addition of
KERN_PROC_CWD in sysctl(3)
That is kern.proc.$$.KERN_PROC_CWD (I think - not that it matters here)

The effect is that -lrump now requires -lrumpvfs

This set of changes fixes (I believe) regular dynamic builds,
more might be required for static builds (will be verified soon).
 1.8 13-May-2019  bad Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.7 14-Feb-2017  ozaki-r branches: 1.7.12;
Add tests for ifconfig up/down
 1.6 08-Aug-2016  pgoyette branches: 1.6.2;
This one needs librump dev (and librumpvfs) too
 1.5 01-Jul-2015  ozaki-r branches: 1.5.2;
Add tests of interface creation/destruction
 1.4 08-Dec-2014  ozaki-r Fix LDADD.t_compat

This unbreaks the build.
 1.3 08-Dec-2014  ozaki-r Add basic tests for ifconf (SIOCGIFCONF)
 1.2 10-Jun-2014  he Fix static linking for the tests: -lrump is also used by -lrumpuser,
so we also need -lrump after -lrumpuser. Fixes build for sun2.
 1.1 07-Nov-2010  pooka branches: 1.1.12; 1.1.22;
convert program in PR kern/44054 to an atf test case
 1.1.22.1 10-Aug-2014  tls Rebase.
 1.1.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.5.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.6.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.7.12.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.7.12.1 10-Jun-2019  christos Sync with HEAD
 1.1 08-Dec-2014  ozaki-r Add basic tests for ifconf (SIOCGIFCONF)
 1.5 26-Nov-2021  gson Delete trailing whitespace
 1.4 12-Nov-2016  kre Delete inappropriate \n from atd_tc_expect_fail() message
 1.3 07-Nov-2016  pgoyette Add PR number to the expected-fail message
 1.2 07-Nov-2016  pgoyette Mark this test as "expected failure" since rump doesn't include the
COMPAT_43 code.
 1.1 07-Nov-2010  pooka branches: 1.1.28;
convert program in PR kern/44054 to an atf test case
 1.1.28.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.4 13-May-2019  bad Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.3 10-Aug-2016  kre branches: 1.3.14;

+ -lrumpdev
 1.2 28-Apr-2016  ozaki-r Don't depend on the order of interfaces

Instead add tests of querying varying number of interfaces
and tests of checking if removing interfaces is reflected.
 1.1 08-Dec-2014  ozaki-r Add basic tests for ifconf (SIOCGIFCONF)
 1.3.14.1 10-Jun-2019  christos Sync with HEAD
 1.22 17-Aug-2021  andvar fix multiplei repetitive typos in comments, messages and documentation. mainly because copy paste code big amount of files are affected.
 1.21 15-Aug-2019  ozaki-r tests: check if ifconfig (ioctl) works after a failure of ifconfig destroy

This is a test for PR kern/54434.
 1.20 04-Jul-2019  ozaki-r branches: 1.20.2;
Add ATF test for a description.

From t-kusaba@IIJ
 1.19 13-May-2019  bad Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.18 16-Mar-2017  ozaki-r branches: 1.18.4; 1.18.12;
Add a test case of ifconfig <if_index>

I don't know if <if_index> is expected to be accepted instead of
an interface name. Please update the test case if the behavior is
just a bug and ifconfig gets fixed.
 1.17 17-Feb-2017  ozaki-r Make the test more stable
 1.16 14-Feb-2017  ozaki-r Add tests for ifconfig up/down
 1.15 20-Jan-2017  ozaki-r Protect if_clone data with if_clone_mtx

To this end, carpattach needs to be delayed from RUMP_COMPONENT_NET to
RUMP_COMPONENT_NET_IF on rump_server. Otherwise mutex_enter via carpattach
for if_clone_mtx is called before mutex_init for it in ifinit1.
 1.14 01-Oct-2016  kre branches: 1.14.2;

Don't expect ping to complain about sending to a local address
assigned to an interface that's down - instead it just attempts
to send, and the interface never responds (as it would if it were
a remote address).
 1.13 01-Oct-2016  roy Adjust tests to new output. Wait for DaD to finish before pinging.
 1.12 14-Sep-2016  christos Ignore case in deprecated/anycast
 1.11 10-Aug-2016  kre + -lrumpdev
 1.10 21-Jun-2016  ozaki-r branches: 1.10.2;
Make a bunch of test names self-descriptive
 1.9 28-Apr-2016  ozaki-r Don't depend on the order of interfaces

The kernel guarantees nothing about it.
 1.8 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.7 29-Feb-2016  ozaki-r Add tests deleting active/inactive links
 1.6 20-Nov-2015  ozaki-r Set timeout of ping to reduce execution time
 1.5 06-Nov-2015  ozaki-r Improve test stability

"deprecated" flag may not be reflected immediately. We need to add some
delay before checking the result.
 1.4 05-Nov-2015  ozaki-r Add tests for ifconfig parameters

From s-yamaguchi@IIJ
 1.3 15-Sep-2015  ozaki-r Improve test stability

ifconfig -a -v after ifconfig -a -z is expected to show '0 packets' for
all interface. However, shmif0 can receive packets between the two
operations. So we have to keep shmif0 down during such tests.
 1.2 03-Sep-2015  ozaki-r Add tests for ifconfig options

From s-yamaguchi@IIJ (with some tweaks by me)
 1.1 01-Jul-2015  ozaki-r Add tests of interface creation/destruction
 1.10.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.10.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.14.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.18.12.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.18.12.1 10-Jun-2019  christos Sync with HEAD
 1.18.4.1 19-Aug-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1339):

sys/net/if.c: revision 1.458
tests/net/if/t_ifconfig.sh: revision 1.21

Restore if_ioctl on error of ifc_destroy

Otherwise subsequence ioctls won't work.

Patch from Harold Gutch on PR kern/54434 (tweaked a bit by me)
tests: check if ifconfig (ioctl) works after a failure of ifconfig destroy

This is a test for PR kern/54434.
 1.20.2.1 19-Aug-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #98):

sys/net/if.c: revision 1.458
tests/net/if/t_ifconfig.sh: revision 1.21

Restore if_ioctl on error of ifc_destroy

Otherwise subsequence ioctls won't work.

Patch from Harold Gutch on PR kern/54434 (tweaked a bit by me)
tests: check if ifconfig (ioctl) works after a failure of ifconfig destroy

This is a test for PR kern/54434.
 1.4 03-Sep-2024  ozaki-r tests, bridge: add tests for interface protection

The original author of the test is k-goda@IIJ. ozaki-r improved
the test slightly.
 1.3 11-Mar-2017  ozaki-r branches: 1.3.22; 1.3.24;
Separate tests for learning table of bridge
 1.2 24-Nov-2016  ozaki-r branches: 1.2.2;
Move get_macaddr to net_common.sh
 1.1 18-Sep-2014  ozaki-r branches: 1.1.2;
Add net/if_bridge test
 1.1.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.1.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.2.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.3.24.1 02-Aug-2025  perseant Sync with HEAD
 1.3.22.1 05-Sep-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #818):

sys/net/if_bridgevar.h: revision 1.39
sbin/brconfig/brconfig.c: revision 1.18
tests/net/if_bridge/unicast.pcap.uue: revision 1.1
tests/net/if_bridge/t_bridge.sh: revision 1.20
sbin/brconfig/brconfig.8: revision 1.21
tests/net/if_bridge/t_bridge.sh: revision 1.21
sys/net/if_bridge.c: revision 1.194
tests/net/if_bridge/Makefile: revision 1.4
distrib/sets/lists/tests/mi: revision 1.1336
tests/net/if_bridge/broadcast.pcap.uue: revision 1.1

bridge: implement interface protection

It enables a feature similar to "protected-port" or "isolation" in some
router products by marking member interfaces protected; when a frame
arrives on a protected interface and is being forwarded to another
protected interface, the frame will be discarded.

The code is developed by the SEIL team at IIJ.

tests: dedup test scripts like others

brconfig: add protect/-protect commands

It marks/clears a specified interface "protected".
tests, bridge: add tests for interface protection

The original author of the test is k-goda@IIJ. ozaki-r improved
the test slightly.

distrib: install uuencoded pcap files for testing
 1.1 03-Sep-2024  ozaki-r branches: 1.1.2; 1.1.6;
tests, bridge: add tests for interface protection

The original author of the test is k-goda@IIJ. ozaki-r improved
the test slightly.
 1.1.6.2 02-Aug-2025  perseant Sync with HEAD
 1.1.6.1 03-Sep-2024  perseant file broadcast.pcap.uue was added on branch perseant-exfatfs on 2025-08-02 05:58:10 +0000
 1.1.2.2 05-Sep-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #818):

sys/net/if_bridgevar.h: revision 1.39
sbin/brconfig/brconfig.c: revision 1.18
tests/net/if_bridge/unicast.pcap.uue: revision 1.1
tests/net/if_bridge/t_bridge.sh: revision 1.20
sbin/brconfig/brconfig.8: revision 1.21
tests/net/if_bridge/t_bridge.sh: revision 1.21
sys/net/if_bridge.c: revision 1.194
tests/net/if_bridge/Makefile: revision 1.4
distrib/sets/lists/tests/mi: revision 1.1336
tests/net/if_bridge/broadcast.pcap.uue: revision 1.1

bridge: implement interface protection

It enables a feature similar to "protected-port" or "isolation" in some
router products by marking member interfaces protected; when a frame
arrives on a protected interface and is being forwarded to another
protected interface, the frame will be discarded.

The code is developed by the SEIL team at IIJ.

tests: dedup test scripts like others

brconfig: add protect/-protect commands

It marks/clears a specified interface "protected".
tests, bridge: add tests for interface protection

The original author of the test is k-goda@IIJ. ozaki-r improved
the test slightly.

distrib: install uuencoded pcap files for testing
 1.1.2.1 03-Sep-2024  martin file broadcast.pcap.uue was added on branch netbsd-10 on 2024-09-05 09:27:13 +0000
 1.21 03-Sep-2024  ozaki-r tests, bridge: add tests for interface protection

The original author of the test is k-goda@IIJ. ozaki-r improved
the test slightly.
 1.20 03-Sep-2024  ozaki-r tests: dedup test scripts like others
 1.19 19-Aug-2019  ozaki-r branches: 1.19.8; 1.19.10;
tests: use rump_server_add_iface to create interfaces
 1.18 01-Feb-2018  ozaki-r branches: 1.18.4;
Commonalize and add tests of creating/destroying interfaces
 1.17 11-Mar-2017  ozaki-r branches: 1.17.4;
Separate tests for learning table of bridge
 1.16 25-Nov-2016  ozaki-r branches: 1.16.2;
Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.15 25-Nov-2016  ozaki-r Add $DEBUG and remove a unused function
 1.14 24-Nov-2016  ozaki-r Move get_macaddr to net_common.sh
 1.13 10-Aug-2016  kre + -lrumpdev (and avoid doing stuff twice).
 1.12 21-Jun-2016  ozaki-r branches: 1.12.2;
Make a bunch of test names self-descriptive
 1.11 07-Aug-2015  ozaki-r Use rump.ping6 instead of ping6 with rumphijack(3)
 1.10 23-Jul-2015  ozaki-r Reflect a fix for bridge

Due to PR 48104, some tests of ping/ping6 were failed but the tests now
should be successful. So reverse atf_check.

Bonus: the fix for PR 48104 also fixes another uknown failure.
 1.9 10-Jun-2015  ozaki-r Add missing cleanup
 1.8 09-Jun-2015  ozaki-r Add tests for bridge members with an IP address

The tests include checks for PR#48104 which is not fixed yet.

Note that one test unexpectedly fails for some reason
(unrelated to PR#48104). We have to fix it somehow.
 1.7 29-May-2015  ozaki-r Disable test_ping_failure which is conducted before setup_bridge

It randomly fails (esp, often on a slow or loaded machine) due to
PR kern/49219, so disable it for now.

I forgot why I didn't include the test when I committed the test
at first and wrongly added it at v1.4.
 1.6 29-May-2015  ozaki-r Bump timeout for ping and ping6 to 5 sec

Hope the wait is enough for slow machines, e.g., qemu/anita/i386.
 1.5 29-May-2015  ozaki-r Get rid of unnecessary shebang

It will be added when it's built.
 1.4 16-May-2015  ozaki-r Enable IPv6 negative tests

As ping6 timeout feature (-X option) is added, we can do negative
tests without wasting time.

1 sec delay is added after network setup to avoid false positives.
 1.3 08-Jan-2015  ozaki-r Add tests for brconfig maxaddr
 1.2 07-Jan-2015  ozaki-r Add some tests for rtable operations of if_bridge
 1.1 18-Sep-2014  ozaki-r Add net/if_bridge test
 1.12.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.12.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.16.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.17.4.1 26-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #572):
sys/net/if_bridge.c: 1.138, 1.148
tests/net/if_bridge/t_bridge.sh: 1.18
tests/net/if_gif/t_gif.sh: 1.12
tests/net/if_ipsec/t_ipsec.sh: 1.3
tests/net/if_l2tp/t_l2tp.sh: 1.4
tests/net/if_loop/t_basic.sh: 1.2
tests/net/if_pppoe/t_pppoe.sh: 1.18
tests/net/if_tap/t_tap.sh: 1.7
tests/net/if_tun/Makefile: 1.2
tests/net/if_tun/t_tun.sh: 1.5
tests/net/if_vlan/t_vlan.sh: 1.8
tests/net/net_common.sh: 1.26
Remove unnecessary splsoftnet
--
If the bridge is not running, don't call bridge_stop. Otherwise the
following commands will crash the kernel:
ifconfig bridge0 create
ifconfig bridge0 destroy
--
Commonalize and add tests of creating/destroying interfaces
 1.18.4.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.19.10.1 02-Aug-2025  perseant Sync with HEAD
 1.19.8.1 05-Sep-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #818):

sys/net/if_bridgevar.h: revision 1.39
sbin/brconfig/brconfig.c: revision 1.18
tests/net/if_bridge/unicast.pcap.uue: revision 1.1
tests/net/if_bridge/t_bridge.sh: revision 1.20
sbin/brconfig/brconfig.8: revision 1.21
tests/net/if_bridge/t_bridge.sh: revision 1.21
sys/net/if_bridge.c: revision 1.194
tests/net/if_bridge/Makefile: revision 1.4
distrib/sets/lists/tests/mi: revision 1.1336
tests/net/if_bridge/broadcast.pcap.uue: revision 1.1

bridge: implement interface protection

It enables a feature similar to "protected-port" or "isolation" in some
router products by marking member interfaces protected; when a frame
arrives on a protected interface and is being forwarded to another
protected interface, the frame will be discarded.

The code is developed by the SEIL team at IIJ.

tests: dedup test scripts like others

brconfig: add protect/-protect commands

It marks/clears a specified interface "protected".
tests, bridge: add tests for interface protection

The original author of the test is k-goda@IIJ. ozaki-r improved
the test slightly.

distrib: install uuencoded pcap files for testing
 1.8 26-Mar-2023  andvar fix various typos in documentation, comments and sysctl device description.
mainly aion -> ation and inlude -> include.
 1.7 19-Aug-2019  ozaki-r tests: use rump_server_add_iface to create interfaces
 1.6 09-Jul-2019  ozaki-r tests: check that a new cache is not created over the limit
 1.5 31-May-2019  gson Increase the timeout for the manyaddrs test case; the default is often
insufficient under qemu.
 1.4 09-Nov-2018  ozaki-r Add a test to show a large number of MAC addresses cached in a bridge
 1.3 18-Apr-2018  ozaki-r branches: 1.3.2;
Add a test that checks if brconfig flush surely removes all entries
 1.2 10-Apr-2018  ozaki-r Add a test case for bridge_rtdelete
 1.1 11-Mar-2017  ozaki-r branches: 1.1.2; 1.1.4; 1.1.8; 1.1.14;
Separate tests for learning table of bridge
 1.1.14.3 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.1.14.2 22-Apr-2018  pgoyette Sync with HEAD
 1.1.14.1 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.1.8.2 18-Apr-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #777):

tests/net/if_bridge/t_rtable.sh: revision 1.3
sys/net/if_bridge.c: revision 1.150-1.154
sys/net/if_bridgevar.h: revision 1.32

Remove obsolete NULL checks

Simplify bridge_rtnode_insert (NFC)

bridge: use pslist(9) for rtlist and rthash

The change fixes race conditions on list operations. One example is that a
reader may see invalid pointers on a looking item in a list due to lack of
membar_producer.

Add a test that checks if brconfig flush surely removes all entries

Get rid of a unnecessary semicolon
Pointed out by kamil@

Add missing PSLIST_ENTRY_INIT and PSLIST_ENTRY_DESTROY
 1.1.8.1 10-Apr-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #727):

tests/net/if_bridge/t_rtable.sh: revision 1.2
sys/net/if_bridge.c: revision 1.149

Fix bridge_rtdelete

It removes a rtable entry that belongs to a specified interface, however,
its original behavior was to delete all belonging entries.
Restore the original behavior.

Add a test case for bridge_rtdelete
 1.1.4.2 21-Apr-2017  bouyer Sync with HEAD
 1.1.4.1 11-Mar-2017  bouyer file t_rtable.sh was added on branch bouyer-socketcan on 2017-04-21 16:54:12 +0000
 1.1.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.1.2.1 11-Mar-2017  pgoyette file t_rtable.sh was added on branch pgoyette-localcount on 2017-03-20 06:58:01 +0000
 1.3.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.3.2.1 10-Jun-2019  christos Sync with HEAD
 1.1 03-Sep-2024  ozaki-r branches: 1.1.2; 1.1.6;
tests, bridge: add tests for interface protection

The original author of the test is k-goda@IIJ. ozaki-r improved
the test slightly.
 1.1.6.2 02-Aug-2025  perseant Sync with HEAD
 1.1.6.1 03-Sep-2024  perseant file unicast.pcap.uue was added on branch perseant-exfatfs on 2025-08-02 05:58:11 +0000
 1.1.2.2 05-Sep-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #818):

sys/net/if_bridgevar.h: revision 1.39
sbin/brconfig/brconfig.c: revision 1.18
tests/net/if_bridge/unicast.pcap.uue: revision 1.1
tests/net/if_bridge/t_bridge.sh: revision 1.20
sbin/brconfig/brconfig.8: revision 1.21
tests/net/if_bridge/t_bridge.sh: revision 1.21
sys/net/if_bridge.c: revision 1.194
tests/net/if_bridge/Makefile: revision 1.4
distrib/sets/lists/tests/mi: revision 1.1336
tests/net/if_bridge/broadcast.pcap.uue: revision 1.1

bridge: implement interface protection

It enables a feature similar to "protected-port" or "isolation" in some
router products by marking member interfaces protected; when a frame
arrives on a protected interface and is being forwarded to another
protected interface, the frame will be discarded.

The code is developed by the SEIL team at IIJ.

tests: dedup test scripts like others

brconfig: add protect/-protect commands

It marks/clears a specified interface "protected".
tests, bridge: add tests for interface protection

The original author of the test is k-goda@IIJ. ozaki-r improved
the test slightly.

distrib: install uuencoded pcap files for testing
 1.1.2.1 03-Sep-2024  martin file unicast.pcap.uue was added on branch netbsd-10 on 2024-09-05 09:27:12 +0000
 1.3 25-Nov-2022  knakahara Add ATF for unnumbered interfaces.
 1.2 24-Nov-2016  ozaki-r Move HIJACKING definition to net_common.sh
 1.1 05-Nov-2015  knakahara branches: 1.1.2;
add basic if_gif tests to ATF.
 1.1.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.13 19-Aug-2019  ozaki-r tests: use rump_server_add_iface to create interfaces
 1.12 01-Feb-2018  ozaki-r branches: 1.12.4;
Commonalize and add tests of creating/destroying interfaces
 1.11 03-Aug-2017  ozaki-r Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.10 26-May-2017  ozaki-r branches: 1.10.2;
Change the default value of DEBUG of stable tests to false
 1.9 21-Dec-2016  ozaki-r Enable DEBUG to see what happened on babylon5
 1.8 25-Nov-2016  ozaki-r Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.7 24-Nov-2016  ozaki-r Move HIJACKING definition to net_common.sh
 1.6 18-Oct-2016  ozaki-r Make test names self-descriptive
 1.5 10-Aug-2016  ozaki-r Add rumpdev library for config_cfdriver_attach
 1.4 16-Dec-2015  knakahara branches: 1.4.2;
Refactor. No functional change.
 1.3 08-Dec-2015  knakahara add gif test for set_tunnel/delete_tunnel and recursion calls check
 1.2 07-Dec-2015  knakahara remove extra shebang and fix a potentially bug
 1.1 05-Nov-2015  knakahara add basic if_gif tests to ATF.
 1.4.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.4.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.10.2.2 26-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #572):
sys/net/if_bridge.c: 1.138, 1.148
tests/net/if_bridge/t_bridge.sh: 1.18
tests/net/if_gif/t_gif.sh: 1.12
tests/net/if_ipsec/t_ipsec.sh: 1.3
tests/net/if_l2tp/t_l2tp.sh: 1.4
tests/net/if_loop/t_basic.sh: 1.2
tests/net/if_pppoe/t_pppoe.sh: 1.18
tests/net/if_tap/t_tap.sh: 1.7
tests/net/if_tun/Makefile: 1.2
tests/net/if_tun/t_tun.sh: 1.5
tests/net/if_vlan/t_vlan.sh: 1.8
tests/net/net_common.sh: 1.26
Remove unnecessary splsoftnet
--
If the bridge is not running, don't call bridge_stop. Otherwise the
following commands will crash the kernel:
ifconfig bridge0 create
ifconfig bridge0 destroy
--
Commonalize and add tests of creating/destroying interfaces
 1.10.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.12.4.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.1 25-Nov-2022  knakahara Add ATF for unnumbered interfaces.
 1.4 25-Nov-2022  knakahara Add ATF for unnumbered interfaces.
 1.3 17-Jan-2019  knakahara Add ATF for ipsecif(4) pfil.
 1.2 25-Dec-2018  knakahara Add ATF for NAT-T enabled ipsecif(4).
 1.1 10-Jan-2018  knakahara branches: 1.1.2; 1.1.4; 1.1.6;
add ipsec(4) interface ATF.
 1.1.6.1 10-Jun-2019  christos Sync with HEAD
 1.1.4.2 18-Jan-2019  pgoyette Synch with HEAD
 1.1.4.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.1.2.2 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.1.2.1 10-Jan-2018  snj file Makefile was added on branch netbsd-8 on 2018-02-11 21:17:35 +0000
 1.11 05-Aug-2020  knakahara Fix missing "-m tranport" options. Pointed out by k-goda@IIJ.

Using any mode SA causes unepected call path, that is,
ipsec4_common_input_cb() calls ip_input() directly instead of
ipsecif4_input().
 1.10 19-Aug-2019  ozaki-r tests: use rump_server_add_iface to create interfaces
 1.9 15-Jan-2019  knakahara branches: 1.9.2;
Fix PR kern/53848. Add missing "ifconfig -w".
 1.8 15-Jan-2019  knakahara revert t_ipsec.sh:r1.7
 1.7 11-Jan-2019  knakahara workaround for PR kern/53848
 1.6 10-Jan-2019  knakahara tests/net/if_ipsec/t_ipsec disable dad. This may fix PR kern/53848
 1.5 25-Dec-2018  knakahara reduce debug messages when $DEBUG is not true.
 1.4 13-Mar-2018  knakahara branches: 1.4.2;
Enhance assertion ipsecif(4) ATF to avoid confusing setkey(8) error message.

When setkey(8) says "syntax error at [-E]", it must mean get_if_ipsec_unique()
failed.
 1.3 01-Feb-2018  ozaki-r branches: 1.3.2; 1.3.4;
Commonalize and add tests of creating/destroying interfaces
 1.2 11-Jan-2018  ozaki-r Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
 1.1 10-Jan-2018  knakahara add ipsec(4) interface ATF.
 1.3.4.3 18-Jan-2019  pgoyette Synch with HEAD
 1.3.4.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.3.4.1 15-Mar-2018  pgoyette Synch with HEAD
 1.3.2.4 13-Mar-2018  martin Pull up following revision(s) (requested by knakahara in ticket #627):
sys/netipsec/ipsecif.c: revision 1.5
tests/net/if_ipsec/t_ipsec.sh: revision 1.4
sys/net/if_ipsec.c: revision 1.7
Fix IPv6 ipsecif(4) ATF regression, sorry.
There must *not* be padding between the src sockaddr and the dst sockaddr
after struct sadb_x_policy.

Comment out confusing (and incorrect) code and add comment. Pointed out by maxv@n.o, thanks.

Enhance assertion ipsecif(4) ATF to avoid confusing setkey(8) error message.

When setkey(8) says "syntax error at [-E]", it must mean get_if_ipsec_unique()
failed.
 1.3.2.3 26-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #572):
sys/net/if_bridge.c: 1.138, 1.148
tests/net/if_bridge/t_bridge.sh: 1.18
tests/net/if_gif/t_gif.sh: 1.12
tests/net/if_ipsec/t_ipsec.sh: 1.3
tests/net/if_l2tp/t_l2tp.sh: 1.4
tests/net/if_loop/t_basic.sh: 1.2
tests/net/if_pppoe/t_pppoe.sh: 1.18
tests/net/if_tap/t_tap.sh: 1.7
tests/net/if_tun/Makefile: 1.2
tests/net/if_tun/t_tun.sh: 1.5
tests/net/if_vlan/t_vlan.sh: 1.8
tests/net/net_common.sh: 1.26
Remove unnecessary splsoftnet
--
If the bridge is not running, don't call bridge_stop. Otherwise the
following commands will crash the kernel:
ifconfig bridge0 create
ifconfig bridge0 destroy
--
Commonalize and add tests of creating/destroying interfaces
 1.3.2.2 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.3.2.1 01-Feb-2018  snj file t_ipsec.sh was added on branch netbsd-8 on 2018-02-11 21:17:35 +0000
 1.4.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.4.2.1 10-Jun-2019  christos Sync with HEAD
 1.9.2.1 10-Nov-2020  martin Pull up following revision(s) (requested by knakahara in ticket #1129):

tests/net/if_ipsec/t_ipsec_pfil.sh: revision 1.3
tests/net/if_ipsec/t_ipsec.sh: revision 1.11
tests/net/if_ipsec/t_ipsec_natt.sh: revision 1.4
tests/net/if_ipsec/t_ipsec_natt.sh: revision 1.5
tests/net/ipsec/t_ipsec_natt.sh: revision 1.4
tests/net/ipsec/t_ipsec_natt.sh: revision 1.5
tests/net/ipsec/common.sh: revision 1.8

Typo in error message

Refactor a little and follow new format of "npfctl list".

Fix the below ATF failures.
- net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_null
- net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_rijndaelcbc
- net/ipsec/t_ipsec_natt:ipsec_natt_transport_ipv4_null
- net/ipsec/t_ipsec_natt:ipsec_natt_transport_ipv4_rijndaelcbc
ok'ed by ozaki-r@n.o, thanks.

Fix missing "-m tranport" options. Pointed out by k-goda@IIJ.

Using any mode SA causes unepected call path, that is,
ipsec4_common_input_cb() calls ip_input() directly instead of
ipsecif4_input().
 1.5 05-Jun-2020  knakahara Refactor a little and follow new format of "npfctl list".

Fix the below ATF failures.
- net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_null
- net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_rijndaelcbc
- net/ipsec/t_ipsec_natt:ipsec_natt_transport_ipv4_null
- net/ipsec/t_ipsec_natt:ipsec_natt_transport_ipv4_rijndaelcbc

ok'ed by ozaki-r@n.o, thanks.
 1.4 01-Jun-2020  martin Typo in error message
 1.3 19-Aug-2019  ozaki-r tests: use rump_server_add_iface to create interfaces
 1.2 26-Dec-2018  knakahara branches: 1.2.2; 1.2.4; 1.2.6;
Add ATF for ipsecif(4) which connect to two peers in the same NAPT.
 1.1 25-Dec-2018  knakahara Add ATF for NAT-T enabled ipsecif(4).
 1.2.6.1 10-Nov-2020  martin Pull up following revision(s) (requested by knakahara in ticket #1129):

tests/net/if_ipsec/t_ipsec_pfil.sh: revision 1.3
tests/net/if_ipsec/t_ipsec.sh: revision 1.11
tests/net/if_ipsec/t_ipsec_natt.sh: revision 1.4
tests/net/if_ipsec/t_ipsec_natt.sh: revision 1.5
tests/net/ipsec/t_ipsec_natt.sh: revision 1.4
tests/net/ipsec/t_ipsec_natt.sh: revision 1.5
tests/net/ipsec/common.sh: revision 1.8

Typo in error message

Refactor a little and follow new format of "npfctl list".

Fix the below ATF failures.
- net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_null
- net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_rijndaelcbc
- net/ipsec/t_ipsec_natt:ipsec_natt_transport_ipv4_null
- net/ipsec/t_ipsec_natt:ipsec_natt_transport_ipv4_rijndaelcbc
ok'ed by ozaki-r@n.o, thanks.

Fix missing "-m tranport" options. Pointed out by k-goda@IIJ.

Using any mode SA causes unepected call path, that is,
ipsec4_common_input_cb() calls ip_input() directly instead of
ipsecif4_input().
 1.2.4.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.2.4.2 10-Jun-2019  christos Sync with HEAD
 1.2.4.1 26-Dec-2018  christos file t_ipsec_natt.sh was added on branch phil-wifi on 2019-06-10 22:10:09 +0000
 1.2.2.3 18-Jan-2019  pgoyette Synch with HEAD
 1.2.2.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.2.2.1 26-Dec-2018  pgoyette file t_ipsec_natt.sh was added on branch pgoyette-compat on 2018-12-26 14:02:10 +0000
 1.3 05-Aug-2020  knakahara Fix missing "-m tranport" options. Pointed out by k-goda@IIJ.

Using any mode SA causes unepected call path, that is,
ipsec4_common_input_cb() calls ip_input() directly instead of
ipsecif4_input().
 1.2 19-Aug-2019  ozaki-r tests: use rump_server_add_iface to create interfaces
 1.1 17-Jan-2019  knakahara branches: 1.1.2; 1.1.4; 1.1.6;
Add ATF for ipsecif(4) pfil.
 1.1.6.1 10-Nov-2020  martin Pull up following revision(s) (requested by knakahara in ticket #1129):

tests/net/if_ipsec/t_ipsec_pfil.sh: revision 1.3
tests/net/if_ipsec/t_ipsec.sh: revision 1.11
tests/net/if_ipsec/t_ipsec_natt.sh: revision 1.4
tests/net/if_ipsec/t_ipsec_natt.sh: revision 1.5
tests/net/ipsec/t_ipsec_natt.sh: revision 1.4
tests/net/ipsec/t_ipsec_natt.sh: revision 1.5
tests/net/ipsec/common.sh: revision 1.8

Typo in error message

Refactor a little and follow new format of "npfctl list".

Fix the below ATF failures.
- net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_null
- net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_rijndaelcbc
- net/ipsec/t_ipsec_natt:ipsec_natt_transport_ipv4_null
- net/ipsec/t_ipsec_natt:ipsec_natt_transport_ipv4_rijndaelcbc
ok'ed by ozaki-r@n.o, thanks.

Fix missing "-m tranport" options. Pointed out by k-goda@IIJ.

Using any mode SA causes unepected call path, that is,
ipsec4_common_input_cb() calls ip_input() directly instead of
ipsecif4_input().
 1.1.4.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.1.4.2 10-Jun-2019  christos Sync with HEAD
 1.1.4.1 17-Jan-2019  christos file t_ipsec_pfil.sh was added on branch phil-wifi on 2019-06-10 22:10:09 +0000
 1.1.2.2 18-Jan-2019  pgoyette Synch with HEAD
 1.1.2.1 17-Jan-2019  pgoyette file t_ipsec_pfil.sh was added on branch pgoyette-compat on 2019-01-18 08:51:00 +0000
 1.2 27-Sep-2023  knakahara Update for sys/net/if_ipsec.c:r1.35
 1.1 25-Nov-2022  knakahara branches: 1.1.2;
Add ATF for unnumbered interfaces.
 1.1.2.1 02-Oct-2023  martin Pull up following revision(s) (requested by knakahara in ticket #378):

tests/net/if_ipsec/t_ipsec_unnumbered.sh: revision 1.2
sys/net/if_ipsec.c: revision 1.35
sys/netipsec/key.c: revision 1.281

Use kmem_free instead of kmem_intr_free, as key_freesaval() is not called in softint after key.c:r1.223.
E.g. key_freesaval() was called the following call path before SAD MP-ify.
esp_input_cb()
KEY_FREESAV()
key_freesav()
key_delsav()
key_freesaval()
ok'ed by ozaki-r@n.o.

Use unit id instead of if_index to reduce fixed_reqid space.

Update for sys/net/if_ipsec.c:r1.35
 1.1 16-Feb-2017  knakahara branches: 1.1.2; 1.1.4;
add l2tp(4) basic test.
 1.1.4.2 16-Feb-2017  knakahara 79023
 1.1.4.1 16-Feb-2017  knakahara file Makefile was added on branch bouyer-socketcan on 2017-02-16 08:44:48 +0000
 1.1.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.1.2.1 16-Feb-2017  pgoyette file Makefile was added on branch pgoyette-localcount on 2017-03-20 06:58:01 +0000
 1.5 19-Aug-2019  ozaki-r tests: use rump_server_add_iface to create interfaces
 1.4 01-Feb-2018  ozaki-r branches: 1.4.4;
Commonalize and add tests of creating/destroying interfaces
 1.3 03-Aug-2017  ozaki-r Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.2 17-Feb-2017  ozaki-r branches: 1.2.2; 1.2.4; 1.2.8;
Make test names self-descriptive
 1.1 16-Feb-2017  knakahara add l2tp(4) basic test.
 1.2.8.2 26-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #572):
sys/net/if_bridge.c: 1.138, 1.148
tests/net/if_bridge/t_bridge.sh: 1.18
tests/net/if_gif/t_gif.sh: 1.12
tests/net/if_ipsec/t_ipsec.sh: 1.3
tests/net/if_l2tp/t_l2tp.sh: 1.4
tests/net/if_loop/t_basic.sh: 1.2
tests/net/if_pppoe/t_pppoe.sh: 1.18
tests/net/if_tap/t_tap.sh: 1.7
tests/net/if_tun/Makefile: 1.2
tests/net/if_tun/t_tun.sh: 1.5
tests/net/if_vlan/t_vlan.sh: 1.8
tests/net/net_common.sh: 1.26
Remove unnecessary splsoftnet
--
If the bridge is not running, don't call bridge_stop. Otherwise the
following commands will crash the kernel:
ifconfig bridge0 create
ifconfig bridge0 destroy
--
Commonalize and add tests of creating/destroying interfaces
 1.2.8.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.2.4.2 17-Feb-2017  ozaki-r 1922358
 1.2.4.1 17-Feb-2017  ozaki-r file t_l2tp.sh was added on branch bouyer-socketcan on 2017-02-17 00:51:26 +0000
 1.2.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.2.2.1 17-Feb-2017  pgoyette file t_l2tp.sh was added on branch pgoyette-localcount on 2017-03-20 06:58:01 +0000
 1.4.4.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.1 17-May-2021  yamaguchi branches: 1.1.2;
Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.1.2.2 31-May-2021  cjep sync with head
 1.1.2.1 17-May-2021  cjep file Makefile was added on branch cjep_staticlib_x on 2021-05-31 22:15:23 +0000
 1.11 05-Apr-2024  yamaguchi lagg(4) test: Fix typo and old comment
 1.10 18-Oct-2023  yamaguchi Update the test case for MTU of lag to adapt new behavior
 1.9 16-Oct-2023  yamaguchi Make the lagg interface up before change its MTU

This change is related to PR kern/57650
 1.8 31-Mar-2022  yamaguchi branches: 1.8.2;
Add tests for MTU of lagg(4)
 1.7 31-Mar-2022  yamaguchi Added waiting for distributing state after attaching vlan

A lagg interface is reset on attaching vlan to enable
ETHERCAP_VLAN_MTU if the lagg I/F has it. Therefore,
it is necessary to wait for distributing.
 1.6 08-Nov-2021  yamaguchi Added tests for lagg(4) about MAC addresses
 1.5 02-Nov-2021  yamaguchi Added tests of combination of lagg(4), vlan(4), and l2tp(4)
 1.4 02-Nov-2021  yamaguchi Use IPv6 addresses, not IPv4, in combination test of IPv6, lagg and vlan
 1.3 19-Oct-2021  yamaguchi added test cases for lagg(4) on l2tp(4)
 1.2 25-May-2021  yamaguchi branches: 1.2.2;
Added missing cleanup option

Fixes PR/56206
 1.1 17-May-2021  yamaguchi Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.2.2.2 31-May-2021  cjep sync with head
 1.2.2.1 25-May-2021  cjep file t_lagg.sh was added on branch cjep_staticlib_x on 2021-05-31 22:15:23 +0000
 1.8.2.2 03-Oct-2024  martin Pull up following revision(s) (requested by rin in ticket #916):

sys/net/lagg/if_laggproto.c: revision 1.15
sys/net/lagg/if_lagg_lacp.c: revision 1.36
sys/net/lagg/if_laggproto.c: revision 1.16
sys/net/lagg/if_lagg_lacp.c: revision 1.37
sys/net/lagg/if_lagg_lacp.c: revision 1.38
sys/net/lagg/if_lagg_lacp.c: revision 1.39
sys/net/lagg/if_lagg.c: revision 1.54
sys/net/lagg/if_lagg.c: revision 1.55
sys/net/lagg/if_lagg.c: revision 1.59
sys/net/lagg/if_lagg.c: revision 1.70
sys/net/lagg/if_laggproto.h: revision 1.19
sys/net/lagg/if_lagg_lacp.c: revision 1.28
sys/net/lagg/if_lagg_lacp.c: revision 1.29
sys/net/lagg/if_laggproto.c: revision 1.7
sys/net/lagg/if_lagg_lacp.h: revision 1.5
sys/net/lagg/if_laggproto.c: revision 1.8
sys/net/lagg/if_laggproto.c: revision 1.9
sys/net/lagg/if_lagg_lacp.c: revision 1.40
sys/net/lagg/if_lagg_lacp.c: revision 1.41
sys/net/lagg/if_lagg_lacp.c: revision 1.42
sys/net/lagg/if_lagg_lacp.c: revision 1.43
tests/net/if_lagg/t_lagg.sh: revision 1.11
sys/net/lagg/if_lagg.c: revision 1.60
sys/net/lagg/if_lagg.c: revision 1.62
sys/net/lagg/if_lagg.c: revision 1.63
sys/net/lagg/if_lagg.c: revision 1.64
sys/net/lagg/if_laggproto.h: revision 1.20
sys/net/lagg/if_lagg.c: revision 1.65
sys/net/lagg/if_lagg.c: revision 1.66
sys/net/lagg/if_lagg.c: revision 1.67
sys/net/lagg/if_lagg_lacp.c: revision 1.30
sys/net/lagg/if_lagg.c: revision 1.68
sys/net/lagg/if_laggproto.c: revision 1.10
sys/net/lagg/if_lagg_lacp.c: revision 1.31
sys/net/lagg/if_lagg.c: revision 1.69
sys/net/lagg/if_laggproto.c: revision 1.11
sys/net/lagg/if_lagg_lacp.c: revision 1.32
sys/net/lagg/if_laggproto.c: revision 1.12
sys/net/lagg/if_lagg_lacp.c: revision 1.33
sys/net/lagg/if_laggproto.c: revision 1.13
sys/net/lagg/if_lagg_lacp.c: revision 1.34
sys/net/lagg/if_laggproto.c: revision 1.14
sys/net/lagg/if_lagg_lacp.c: revision 1.35

Set the fastest linkspeed in each physical interface to lagg(4)

lagg(4): Added logs about LACP processing

lagg(4): Fix missing IFNET_LOCK acquirement

lagg(4): update link speed when a physical interface is removed

lagg(4): fix missing update of the number of active ports

lagg(4): Added 0 length check

lagg(4): Added LACP_READY state for logging
when a port turns SELECTED or UNSELECTED

lagg(4): added log on detaching a port from SELECTED state to STANDBY
acquire LAGG_PROTO_LOCK instead of pserialize read section

lagg(4): Remove unnecessary LAGG_LOCK holding while lagg_proto_detach()
to avoid deadlock in workqueue_wait due to LAGG_LOCK holding
lagg_proto_detach dose not need to hold LAGG_LOCK because only one
context can access to a detaching protocol after sc->sc_var is updated.

But it was held without any reason. And it had caused a deadlock by
holding LAGG_LOCK in caller of workqueue_wait
and waiting for the lock in worker.
added missing LAGG_UNLOCK()

lagg(4): move comment about IFF_PROMISC
pointed out by ozaki-r@, thanks.

lagg(4): added NULL check for pfil_run_hooks
pointed out by ozaki-r@, thanks.

lagg(4): change errno
suggested by ozaki-r@, thanks.

lagg(4): increase output packets and bytes only if no error occurred
pointed out by ozaki-r@, thanks.

lagg(4): replace NULL check with KASSERT because lp_softc is always non-NULL

lagg(4): Use CTASSERT
Added KASSERT for LACP_LOCK

lagg(4): move allocate memory before ioctl
Added comments to lagg(4)

lagg(4): added __predict_true

lagg(4): added missing pserialize_read_enter
fix missing LACP_LOCK

lagg(4): added check of LACP running state for safety

When LACP stops, the handler of callout do nothing
because all port is already detached from lacp.

Therefore, the added checks are just for safety.
added missing workq_wait for lacp_tick_work()

lagg(4): set suppress at the same time with distribution state

lagg(4): remove unnecessary masking
pointed out by ozaki-r@, thanks.

lagg(4): move reply limitation to recive processing

lagg(4): release lock before pserialize_perform() if possible

lagg(4): Added vlan check

lagg(4): Fix missing destroy for list and entry

lagg(4) test: Fix typo and old comment

lagg: fill name of workqueue correctly
Found by KASSERT failure for DIAGNOSTIC kernel.
Authored by ozaki-r@.
 1.8.2.1 19-Oct-2023  martin Pull up following revision(s) (requested by yamaguchi in ticket #429):

sys/net/lagg/if_lagg.c: revision 1.50
sys/net/lagg/if_lagg.c: revision 1.51
tests/net/if_lagg/t_lagg.sh: revision 1.10
sys/net/lagg/if_lagg.c: revision 1.49
tests/net/if_lagg/t_lagg.sh: revision 1.9
share/man/man4/lagg.4: revision 1.5

lagg(4): release LAGG_LOCK before mtu changing
PR kern/57650

Make the lagg interface up before change its MTU
This change is related to PR kern/57650

Fix missing IFNET_LOCK holding while destroy the lagg interface
copy MTU of lagg to a interface added to lagg
even if the interface is the first member of the lagg

This change breaks ATF test case for lagg MTU

Update the test case for MTU of lag to adapt new behavior

Update lagg(4) manual
1. corrected the wrong example
- lagg(4) can not add multiple port and set its priority at once
- This is the restriction of ifconfig(8)
2. adapted to changed behavior related to MTU
- Changed not to copy MTU of the 1st physical interface
to lagg(4) to prevent locking against myself
 1.7 01-Mar-2020  christos Centralize the base rump libraries into a variable used by all the other
Makefiles so that we can make changes to it centrally as needed and have
less mess. Fixes the sun2 build that needs rumpvfs after librump after
the latest changes.
 1.6 01-Jun-2019  kre Deal with fallout from the addition of
KERN_PROC_CWD in sysctl(3)
That is kern.proc.$$.KERN_PROC_CWD (I think - not that it matters here)

The effect is that -lrump now requires -lrumpvfs

This set of changes fixes (I believe) regular dynamic builds,
more might be required for static builds (will be verified soon).
 1.5 13-May-2019  bad Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.4 28-Feb-2017  ozaki-r branches: 1.4.12;
Add tests for loopback interface
 1.3 08-Aug-2016  pgoyette branches: 1.3.2;
Need librumpdev and librumpvfs
 1.2 10-Jun-2014  he branches: 1.2.6;
Fix static linking for the tests: -lrump is also used by -lrumpuser,
so we also need -lrump after -lrumpuser. Fixes build for sun2.
 1.1 25-Jul-2010  pooka branches: 1.1.12; 1.1.22;
Add xfail test for kernel diagnostic panic described in PR kern/43664
 1.1.22.1 10-Aug-2014  tls Rebase.
 1.1.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.6.1 20-Mar-2017  pgoyette Sync with HEAD
 1.3.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.4.12.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.4.12.1 10-Jun-2019  christos Sync with HEAD
 1.2 01-Feb-2018  ozaki-r Commonalize and add tests of creating/destroying interfaces
 1.1 28-Feb-2017  ozaki-r branches: 1.1.2; 1.1.4; 1.1.8;
Add tests for loopback interface
 1.1.8.1 26-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #572):
sys/net/if_bridge.c: 1.138, 1.148
tests/net/if_bridge/t_bridge.sh: 1.18
tests/net/if_gif/t_gif.sh: 1.12
tests/net/if_ipsec/t_ipsec.sh: 1.3
tests/net/if_l2tp/t_l2tp.sh: 1.4
tests/net/if_loop/t_basic.sh: 1.2
tests/net/if_pppoe/t_pppoe.sh: 1.18
tests/net/if_tap/t_tap.sh: 1.7
tests/net/if_tun/Makefile: 1.2
tests/net/if_tun/t_tun.sh: 1.5
tests/net/if_vlan/t_vlan.sh: 1.8
tests/net/net_common.sh: 1.26
Remove unnecessary splsoftnet
--
If the bridge is not running, don't call bridge_stop. Otherwise the
following commands will crash the kernel:
ifconfig bridge0 create
ifconfig bridge0 destroy
--
Commonalize and add tests of creating/destroying interfaces
 1.1.4.2 21-Apr-2017  bouyer Sync with HEAD
 1.1.4.1 28-Feb-2017  bouyer file t_basic.sh was added on branch bouyer-socketcan on 2017-04-21 16:54:12 +0000
 1.1.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.1.2.1 28-Feb-2017  pgoyette file t_basic.sh was added on branch pgoyette-localcount on 2017-03-20 06:58:01 +0000
 1.8 13-Jan-2017  christos Don't play with "../.." in includes for h_macros.h; deal with it centrally.
Minor fixes.
 1.7 18-Mar-2012  jruoho branches: 1.7.14;
Move more PR references from comments to ATF's "descr".
 1.6 26-Apr-2011  martin branches: 1.6.4;
Minor simplification
 1.5 10-Apr-2011  martin Repeat the fragmentation on lo0 test, but with checksum on IFF_LOOPBACK
interfaces enabled.
 1.4 09-Apr-2011  martin Remove expected failure, PR has been fixed.
 1.3 03-Nov-2010  christos add Makefile.inc everywhere so that we can set WARNS=4 by default. Amazing
how many bugs this found :-)
 1.2 26-Jul-2010  pooka Calculate broadcast IP instead of requiring it as a config parameter.
 1.1 25-Jul-2010  pooka Add xfail test for kernel diagnostic panic described in PR kern/43664
 1.6.4.1 17-Apr-2012  yamt sync with head
 1.7.14.1 20-Mar-2017  pgoyette Sync with HEAD
 1.4 25-Nov-2022  knakahara Add ATF for unnumbered interfaces.
 1.3 24-Nov-2016  ozaki-r Move HIJACKING definition to net_common.sh
 1.2 24-Nov-2016  ozaki-r Add missing $NetBSD$ tag
 1.1 15-Apr-2016  ozaki-r branches: 1.1.2;
Add a new test case for PPPoE using PAP

From s-yamaguchi@IIJ (with some tweaks by me)
 1.1.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.33 01-Jun-2021  yamaguchi Update test case.
The state of IPCP and IPv6CP is Closed when those are disabled.
 1.32 13-May-2021  yamaguchi Fix the wrong state check

After disconnection from PPPoE server, the client waits for
reconnection in initial state or reconnects in PADI state.
 1.31 11-May-2021  yamaguchi Add tests for "pppoectl {no}[ipcp|ipv6cp]"
 1.30 11-May-2021  yamaguchi Added missing '$'
 1.29 06-May-2021  yamaguchi branches: 1.29.2;
Added missing waiting for DAD completion
 1.28 23-Apr-2021  yamaguchi Added a test case for MTU of pppoe(4)
 1.27 23-Apr-2021  yamaguchi Make IFF_DEBUG enabled if $DEBUG is true
 1.26 23-Apr-2021  yamaguchi functionalize rump.ifconfig and pppoectl for clearer test code
 1.25 23-Apr-2021  yamaguchi Added test cases for "pppoectl passiveauthproto"
 1.24 25-Nov-2020  yamaguchi Use a state of IPCP and IPv6CP to wait for connection established
 1.23 25-Sep-2020  yamaguchi update test cases for AC-Name and Service-Name
 1.22 25-Sep-2020  yamaguchi Add test cases for AC-Name and Service-Name
 1.21 23-Sep-2020  yamaguchi Add a limit for auth at a test for invalid account
 1.20 23-Sep-2020  yamaguchi Fix typo
 1.19 19-Aug-2019  ozaki-r tests: use rump_server_add_iface to create interfaces
 1.18 01-Feb-2018  ozaki-r branches: 1.18.4;
Commonalize and add tests of creating/destroying interfaces
 1.17 28-Mar-2017  ozaki-r branches: 1.17.4;
Use the utility functions for rump_server
 1.16 14-Dec-2016  knakahara branches: 1.16.2;
add wait_for_disconnected to run_test() as well as run_test6().

Before commited MP-safe patch, IPv4 test can run in time without
wait_for_disconnected. Currently, wait_for_disconnected is required
because of locking overhead.
 1.15 12-Dec-2016  knakahara fix accidentally if_pppoe atf failure depends on cpu workload.

advised by s-yamaguchi@IIJ, thanks.
 1.14 02-Dec-2016  knakahara fix typo. ping6 deadline option is not "-w" but "-X".
 1.13 02-Dec-2016  knakahara fix accidentally if_pppoe atf failure depends on cpu workload.
 1.12 24-Nov-2016  ozaki-r Move HIJACKING definition to net_common.sh
 1.11 10-Nov-2016  knakahara fix: extend waittime to avoid unintended fail at high cpu load.
 1.10 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.9 27-Oct-2016  knakahara fix rarely rump.ping6 failures by "UDP connect". and fix typo.
 1.8 26-Oct-2016  knakahara Fix error when wait_for_session_established() is called without argument.

From Shoichi YAMAGUCHI<s-yamaguchi@IIJ>, Thanks.
 1.7 26-Oct-2016  knakahara Add new test cases(PAP and CHAP) for IPv6 PPPoE.

From Shoichi YAMAGUCHI<s-yamaguchi@IIJ>, Thanks.
 1.6 19-Oct-2016  ozaki-r Make sure to run cleanup

Should fix "tests: did not complete" failures.
 1.5 18-Oct-2016  ozaki-r Make test names self-descriptive
 1.4 14-Sep-2016  knakahara Disable rechallenge for chap test case.

NetBSD's PPPoE client doesn't support chap rechallenge yet.

From Shoichi YAMAGUCHI<s-yamaguchi@IIJ>, Thanks.
 1.3 12-Sep-2016  christos add a chap test; need to investigate what's wrong with it...
 1.2 07-Aug-2016  pgoyette Add rumpdev library since we're now calling config_cfdriver_attach()

Should fix the newly-introduced test failure.
 1.1 15-Apr-2016  ozaki-r branches: 1.1.2;
Add a new test case for PPPoE using PAP

From s-yamaguchi@IIJ (with some tweaks by me)
 1.1.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.1.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.1.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.16.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.17.4.1 26-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #572):
sys/net/if_bridge.c: 1.138, 1.148
tests/net/if_bridge/t_bridge.sh: 1.18
tests/net/if_gif/t_gif.sh: 1.12
tests/net/if_ipsec/t_ipsec.sh: 1.3
tests/net/if_l2tp/t_l2tp.sh: 1.4
tests/net/if_loop/t_basic.sh: 1.2
tests/net/if_pppoe/t_pppoe.sh: 1.18
tests/net/if_tap/t_tap.sh: 1.7
tests/net/if_tun/Makefile: 1.2
tests/net/if_tun/t_tun.sh: 1.5
tests/net/if_vlan/t_vlan.sh: 1.8
tests/net/net_common.sh: 1.26
Remove unnecessary splsoftnet
--
If the bridge is not running, don't call bridge_stop. Otherwise the
following commands will crash the kernel:
ifconfig bridge0 create
ifconfig bridge0 destroy
--
Commonalize and add tests of creating/destroying interfaces
 1.18.4.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.29.2.1 31-May-2021  cjep sync with head
 1.1 25-Nov-2022  knakahara Add ATF for unnumbered interfaces.
 1.1 20-Aug-2024  ozaki-r branches: 1.1.2;
tests: add tests for shmif

The test file is placed under tests/net, not tests/rump/rumpnet,
to leverage utility functions provided for tests in there.
 1.1.2.2 24-Aug-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #811):

tests/net/if_shmif/t_shmif.sh: revision 1.1
sbin/ifconfig/ifconfig.c: revision 1.251
sbin/ifconfig/ifconfig.8: revision 1.130
sys/rump/net/lib/libshmif/if_shmem.c: revision 1.85
sys/rump/net/lib/libshmif/if_shmem.c: revision 1.86
sys/rump/net/lib/libshmif/if_shmem.c: revision 1.87
etc/mtree/NetBSD.dist.tests: revision 1.206
distrib/sets/lists/tests/mi: revision 1.1333
tests/net/if_shmif/Makefile: revision 1.1
tests/net/Makefile: revision 1.42

shmif: change behaviors about link states

- Change the link state to UP on ifconfig linkstr
- This behavior emulates physical devices
- Change the link state to UNKNOWN on ifconfig -linkstr just in case
- Reject sending/receiving packets if the link state is DOWN
- Permit to send/receive packets on UNKNOWN, which is required
to unbreak some ATF tests written in C

shmif: support media

It enables to link-down shmif by ifconfig media none and link-up
again by media auto.

ifconfig: show link state on -v

We could guess it through "media" or "status" output, however, we
sometimes want to know it directly for debugging or testing.

It is shown only if the -v option is specified.
tests: add tests for shmif

The test file is placed under tests/net, not tests/rump/rumpnet,
to leverage utility functions provided for tests in there.
shmem(4): Fix typo in comment: AFT -> ATF.

Also fix grammar (if I understood correctly what this meant: rump
servers written in C, rather than set up via shell scripts around
rump_server invoking ifconfig).

No functional change intended.
 1.1.2.1 20-Aug-2024  martin file Makefile was added on branch netbsd-10 on 2024-08-24 16:42:25 +0000
 1.1 20-Aug-2024  ozaki-r branches: 1.1.2;
tests: add tests for shmif

The test file is placed under tests/net, not tests/rump/rumpnet,
to leverage utility functions provided for tests in there.
 1.1.2.2 24-Aug-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #811):

tests/net/if_shmif/t_shmif.sh: revision 1.1
sbin/ifconfig/ifconfig.c: revision 1.251
sbin/ifconfig/ifconfig.8: revision 1.130
sys/rump/net/lib/libshmif/if_shmem.c: revision 1.85
sys/rump/net/lib/libshmif/if_shmem.c: revision 1.86
sys/rump/net/lib/libshmif/if_shmem.c: revision 1.87
etc/mtree/NetBSD.dist.tests: revision 1.206
distrib/sets/lists/tests/mi: revision 1.1333
tests/net/if_shmif/Makefile: revision 1.1
tests/net/Makefile: revision 1.42

shmif: change behaviors about link states

- Change the link state to UP on ifconfig linkstr
- This behavior emulates physical devices
- Change the link state to UNKNOWN on ifconfig -linkstr just in case
- Reject sending/receiving packets if the link state is DOWN
- Permit to send/receive packets on UNKNOWN, which is required
to unbreak some ATF tests written in C

shmif: support media

It enables to link-down shmif by ifconfig media none and link-up
again by media auto.

ifconfig: show link state on -v

We could guess it through "media" or "status" output, however, we
sometimes want to know it directly for debugging or testing.

It is shown only if the -v option is specified.
tests: add tests for shmif

The test file is placed under tests/net, not tests/rump/rumpnet,
to leverage utility functions provided for tests in there.
shmem(4): Fix typo in comment: AFT -> ATF.

Also fix grammar (if I understood correctly what this meant: rump
servers written in C, rather than set up via shell scripts around
rump_server invoking ifconfig).

No functional change intended.
 1.1.2.1 20-Aug-2024  martin file t_shmif.sh was added on branch netbsd-10 on 2024-08-24 16:42:25 +0000
 1.6 01-Oct-2020  rin Link librumpclient explicitly. Fix sun2, i.e., MKPIC=no build.
 1.5 30-Sep-2020  roy Be like other tests and speciy the binary name we install
 1.4 30-Sep-2020  roy Fix prior
 1.3 30-Sep-2020  roy tap(4): update the test so that we can open the tap to ping across a bridge

ping with tap closed to ensure it fails
ping with tap open to ensure it works
 1.2 25-Nov-2016  ozaki-r Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.1 04-Mar-2016  ozaki-r branches: 1.1.2;
Add tests for tap(4)
 1.1.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.1 30-Sep-2020  roy tap(4): update the test so that we can open the tap to ping across a bridge

ping with tap closed to ensure it fails
ping with tap open to ensure it works
 1.12 07-Aug-2024  rin t_tap: Fix previous to make `tap_bridged` functional again

We need to ping to IP[46]_TAP from REMOTE, not LOCAL (== server).
Otherwise, this does not make sense as a test for tap-bridge interaction.
 1.11 30-Sep-2020  roy branches: 1.11.6; 1.11.8;
tap(4): update the test so that we can open the tap to ping across a bridge

ping with tap closed to ensure it fails
ping with tap open to ensure it works
 1.10 19-Aug-2019  ozaki-r tests: use rump_server_add_iface to create interfaces
 1.9 13-May-2019  bad Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.8 22-Mar-2018  ozaki-r branches: 1.8.2;
Avoid setting IP addresses of the same subnet on different interface

If we do so, there will remain one route that is of a preceding address, but
that behavior is not documented and may be changed in the future. Tests
shouldn't rely on such a unstable behavior.
 1.7 01-Feb-2018  ozaki-r branches: 1.7.2;
Commonalize and add tests of creating/destroying interfaces
 1.6 25-Nov-2016  ozaki-r branches: 1.6.6;
Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.5 24-Nov-2016  ozaki-r Move HIJACKING definition to net_common.sh
 1.4 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.3 10-Aug-2016  kre Move -lrumpdev so it is effective.

This test still fails (as does another using tap interfaces) ...

tc-se:rump.ifconfig: clone_command: Device not configured
tc-se:rump.ifconfig: exec_matches: Device not configured

Something is wrong with rumpnet_tap ...
 1.2 21-Jun-2016  ozaki-r branches: 1.2.2;
Make a bunch of test names self-descriptive
 1.1 04-Mar-2016  ozaki-r Add tests for tap(4)
 1.2.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.6.6.2 02-Apr-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #688):

tests/net/ndp/t_ndp.sh: revision 1.31
tests/net/if_tap/t_tap.sh: revision 1.8

Avoid setting IP addresses of the same subnet on different interface

If we do so, there will remain one route that is of a preceding address, but
that behavior is not documented and may be changed in the future. Tests
shouldn't rely on such a unstable behavior.
 1.6.6.1 26-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #572):
sys/net/if_bridge.c: 1.138, 1.148
tests/net/if_bridge/t_bridge.sh: 1.18
tests/net/if_gif/t_gif.sh: 1.12
tests/net/if_ipsec/t_ipsec.sh: 1.3
tests/net/if_l2tp/t_l2tp.sh: 1.4
tests/net/if_loop/t_basic.sh: 1.2
tests/net/if_pppoe/t_pppoe.sh: 1.18
tests/net/if_tap/t_tap.sh: 1.7
tests/net/if_tun/Makefile: 1.2
tests/net/if_tun/t_tun.sh: 1.5
tests/net/if_vlan/t_vlan.sh: 1.8
tests/net/net_common.sh: 1.26
Remove unnecessary splsoftnet
--
If the bridge is not running, don't call bridge_stop. Otherwise the
following commands will crash the kernel:
ifconfig bridge0 create
ifconfig bridge0 destroy
--
Commonalize and add tests of creating/destroying interfaces
 1.7.2.1 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.8.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.8.2.1 10-Jun-2019  christos Sync with HEAD
 1.11.8.1 02-Aug-2025  perseant Sync with HEAD
 1.11.6.1 08-Aug-2024  martin Pull up following revision(s) (requested by rin in ticket #780):

tests/net/if_tap/t_tap.sh: revision 1.12

t_tap: Fix previous to make `tap_bridged` functional again

We need to ping to IP[46]_TAP from REMOTE, not LOCAL (== server).

Otherwise, this does not make sense as a test for tap-bridge interaction.
 1.2 05-Sep-2016  ozaki-r Remove a unexpectedly committed file
 1.1 05-Sep-2016  ozaki-r Add very basic tests for tun devices
 1.2 01-Feb-2018  ozaki-r Commonalize and add tests of creating/destroying interfaces
 1.1 05-Sep-2016  ozaki-r branches: 1.1.2; 1.1.8;
Add very basic tests for tun devices
 1.1.8.1 26-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #572):
sys/net/if_bridge.c: 1.138, 1.148
tests/net/if_bridge/t_bridge.sh: 1.18
tests/net/if_gif/t_gif.sh: 1.12
tests/net/if_ipsec/t_ipsec.sh: 1.3
tests/net/if_l2tp/t_l2tp.sh: 1.4
tests/net/if_loop/t_basic.sh: 1.2
tests/net/if_pppoe/t_pppoe.sh: 1.18
tests/net/if_tap/t_tap.sh: 1.7
tests/net/if_tun/Makefile: 1.2
tests/net/if_tun/t_tun.sh: 1.5
tests/net/if_vlan/t_vlan.sh: 1.8
tests/net/net_common.sh: 1.26
Remove unnecessary splsoftnet
--
If the bridge is not running, don't call bridge_stop. Otherwise the
following commands will crash the kernel:
ifconfig bridge0 create
ifconfig bridge0 destroy
--
Commonalize and add tests of creating/destroying interfaces
 1.1.2.2 14-Sep-2016  pgoyette Sync with HEAD
 1.1.2.1 05-Sep-2016  pgoyette file Makefile was added on branch pgoyette-localcount on 2016-09-14 03:04:19 +0000
 1.6 13-May-2019  bad Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.5 01-Feb-2018  ozaki-r branches: 1.5.4;
Commonalize and add tests of creating/destroying interfaces
 1.4 07-Nov-2016  ozaki-r branches: 1.4.6;
Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.3 01-Oct-2016  kre Compensate for the new world order of ifconfig output format,
and the new default netmask for point to point links.
 1.2 05-Sep-2016  ozaki-r branches: 1.2.2;
Add some tests

We need more realistic tests.
 1.1 05-Sep-2016  ozaki-r Add very basic tests for tun devices
 1.2.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.2.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.2.2.2 14-Sep-2016  pgoyette Sync with HEAD
 1.2.2.1 05-Sep-2016  pgoyette file t_tun.sh was added on branch pgoyette-localcount on 2016-09-14 03:04:19 +0000
 1.4.6.1 26-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #572):
sys/net/if_bridge.c: 1.138, 1.148
tests/net/if_bridge/t_bridge.sh: 1.18
tests/net/if_gif/t_gif.sh: 1.12
tests/net/if_ipsec/t_ipsec.sh: 1.3
tests/net/if_l2tp/t_l2tp.sh: 1.4
tests/net/if_loop/t_basic.sh: 1.2
tests/net/if_pppoe/t_pppoe.sh: 1.18
tests/net/if_tap/t_tap.sh: 1.7
tests/net/if_tun/Makefile: 1.2
tests/net/if_tun/t_tun.sh: 1.5
tests/net/if_vlan/t_vlan.sh: 1.8
tests/net/net_common.sh: 1.26
Remove unnecessary splsoftnet
--
If the bridge is not running, don't call bridge_stop. Otherwise the
following commands will crash the kernel:
ifconfig bridge0 create
ifconfig bridge0 destroy
--
Commonalize and add tests of creating/destroying interfaces
 1.5.4.1 10-Jun-2019  christos Sync with HEAD
 1.1 29-Sep-2020  roy vether(4): Add ATF tests based on the tap(4) tests.
 1.1 29-Sep-2020  roy vether(4): Add ATF tests based on the tap(4) tests.
 1.4 19-Aug-2021  yamaguchi Make the test program run in background after doing BIOCPROMISC

t_vlan has rarely failed by checking IFF_PROMISC before the
test program do BIOCPROMISC. To solve this, BIOCPROMISC is
done in the foreground.

fixes PR/56357
 1.3 09-Jul-2021  yamaguchi added tests for IFF_PROMISC of vlan(4)
 1.2 14-Jun-2018  yamaguchi Add test cases for multicast address handling of vlan(4)

ok ozaki-r@
 1.1 26-Nov-2016  ozaki-r branches: 1.1.2; 1.1.14;
Add basic tests for vlan(4)
 1.1.14.1 25-Jun-2018  pgoyette Sync with HEAD
 1.1.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.1.2.1 26-Nov-2016  pgoyette file Makefile was added on branch pgoyette-localcount on 2017-01-07 08:56:56 +0000
 1.2 19-Aug-2021  yamaguchi Make the test program run in background after doing BIOCPROMISC

t_vlan has rarely failed by checking IFF_PROMISC before the
test program do BIOCPROMISC. To solve this, BIOCPROMISC is
done in the foreground.

fixes PR/56357
 1.1 09-Jul-2021  yamaguchi added tests for IFF_PROMISC of vlan(4)
 1.3 19-Aug-2021  yamaguchi Added description of license
 1.2 13-Oct-2019  mrg ifr_name is nul terminated. make it so.
 1.1 14-Jun-2018  yamaguchi branches: 1.1.2; 1.1.4;
Add test cases for multicast address handling of vlan(4)

ok ozaki-r@
 1.1.4.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.1.2.2 25-Jun-2018  pgoyette Sync with HEAD
 1.1.2.1 14-Jun-2018  pgoyette file siocXmulti.c was added on branch pgoyette-compat on 2018-06-25 07:26:09 +0000
 1.27 18-Mar-2025  ozaki-r tests, vlan: add a test case for link state sync

A vlan interface synchronizes its link state with its parent's one.
 1.26 18-Mar-2025  ozaki-r tests: dedup t_vlan.sh like others (NFC)
 1.25 02-Nov-2023  yamaguchi branches: 1.25.2;
Added the test for vlan over l2tp
 1.24 19-Aug-2021  yamaguchi branches: 1.24.2;
Make the test program run in background after doing BIOCPROMISC

t_vlan has rarely failed by checking IFF_PROMISC before the
test program do BIOCPROMISC. To solve this, BIOCPROMISC is
done in the foreground.

fixes PR/56357
 1.23 14-Jul-2021  yamaguchi vlan: Added missing $HIJACKING before brconfig
 1.22 14-Jul-2021  yamaguchi Added tests for adding vlan(4) to a bridge and deleting from it

- add vlan(4) that has no parent interface to a bridge member
- the vlan(4) cannot be added to a bridge member
- detach the parent interface of vlan(4) that is in a bridge member
- vlan(4) is deleted from a bridge member at the detaching
 1.21 14-Jul-2021  yamaguchi Added a test about clearing IFF_PROMISC at vlan_unconfig

This test is related to PR/49196
 1.20 09-Jul-2021  yamaguchi added tests for IFF_PROMISC of vlan(4)
 1.19 06-Jul-2021  yamaguchi vlan: added checks of linkstate
 1.18 02-Jul-2021  yamaguchi Added tests for changing a MTU when the vlan(4) is added to bridge(4)

The tests is for PR kern/56292
 1.17 08-Mar-2020  nisimura address to ATF t_vlan failure. adapt ifmcstat(8) output format change.
 1.16 11-Nov-2019  yamaguchi atf: add test cases for MTU that is increased on SIOCSETVLAN

From t-kusaba@IIJ, thanks
 1.15 11-Dec-2018  ozaki-r branches: 1.15.2;
tests: add missing $af
 1.14 07-Dec-2018  ozaki-r tests: check error messages strictly
 1.13 07-Dec-2018  ozaki-r tests: reduce repeated phrases... (NFC)
 1.12 14-Nov-2018  knakahara let ATF detect a bug fixed by if_vla.c:r1.132.
 1.11 14-Jun-2018  yamaguchi branches: 1.11.2;
Update the error message in t_vlan (ENXIO => EINVAL)

ok ozaki-r@
 1.10 14-Jun-2018  yamaguchi Add test cases for multicast address handling of vlan(4)

ok ozaki-r@
 1.9 12-Jun-2018  ozaki-r Add tests of vlan with bridge

The tests trigger a panic reported in PR kern/53357.
 1.8 01-Feb-2018  ozaki-r branches: 1.8.2;
Commonalize and add tests of creating/destroying interfaces
 1.7 23-Nov-2017  kre Since there was already a test to verify that vlan tag 4096 is
detected as invalid, become the "someone" referred to in the
previous commit log, and add tests for 0 and 4095 as well, and
while here, throw in a few more that might elicit bugs.

And if the shell running the tests is able, add tests of a few
random vlan tags between 2 and 4093 (1 and 4094 are always tested)
to check that anything in range works (well, partially check...)
 1.6 23-Nov-2017  kre Don't attempt to test vlan tags 0 or 4095, which are now rejected
as invalid (perhaps someone could add a test to verify that they
continue to be rejected?)
 1.5 16-Nov-2017  msaitoh Add test case of vlan(4)'s re-configure without destroy
(see also if_vlan.c rev. 1.104). Written by s-yamaguchi@iij.
 1.4 11-Oct-2017  msaitoh Add a test case for duplicated VLAN ID.
 1.3 09-Aug-2017  knakahara Add counter check to vlan(4) ATF. Implemented by s-yamaguchi@IIJ, thanks.
 1.2 14-Jun-2017  ozaki-r Add test cases for vlan(4)

From s-yamaguchi@IIJ
 1.1 26-Nov-2016  ozaki-r branches: 1.1.2; 1.1.8;
Add basic tests for vlan(4)
 1.1.8.3 12-Jun-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #876):
sys/net/if_vlan.c: 1.126
tests/net/if_vlan/t_vlan.sh: 1.9
vlan: call ether_ifdetach without IFNET_LOCK
Fix PR kern/53357
--
Add tests of vlan with bridge
The tests trigger a panic reported in PR kern/53357.
 1.1.8.2 26-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #572):
sys/net/if_bridge.c: 1.138, 1.148
tests/net/if_bridge/t_bridge.sh: 1.18
tests/net/if_gif/t_gif.sh: 1.12
tests/net/if_ipsec/t_ipsec.sh: 1.3
tests/net/if_l2tp/t_l2tp.sh: 1.4
tests/net/if_loop/t_basic.sh: 1.2
tests/net/if_pppoe/t_pppoe.sh: 1.18
tests/net/if_tap/t_tap.sh: 1.7
tests/net/if_tun/Makefile: 1.2
tests/net/if_tun/t_tun.sh: 1.5
tests/net/if_vlan/t_vlan.sh: 1.8
tests/net/net_common.sh: 1.26
Remove unnecessary splsoftnet
--
If the bridge is not running, don't call bridge_stop. Otherwise the
following commands will crash the kernel:
ifconfig bridge0 create
ifconfig bridge0 destroy
--
Commonalize and add tests of creating/destroying interfaces
 1.1.8.1 22-Nov-2017  martin Pull up following revision(s) (requested by msaitoh in ticket #380):
tests/net/if_vlan/t_vlan.sh: revision 1.2
tests/net/if_vlan/t_vlan.sh: revision 1.3
tests/net/if_vlan/t_vlan.sh: revision 1.4
tests/net/if_vlan/t_vlan.sh: revision 1.5
Add test cases for vlan(4)
From s-yamaguchi@IIJ
Add counter check to vlan(4) ATF. Implemented by s-yamaguchi@IIJ, thanks.
Add a test case for duplicated VLAN ID.
Add test case of vlan(4)'s re-configure without destroy
(see also if_vlan.c rev. 1.104). Written by s-yamaguchi@iij.
 1.1.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.1.2.1 26-Nov-2016  pgoyette file t_vlan.sh was added on branch pgoyette-localcount on 2017-01-07 08:56:56 +0000
 1.8.2.3 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.8.2.2 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.8.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.11.2.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.11.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.11.2.1 10-Jun-2019  christos Sync with HEAD
 1.15.2.1 13-Nov-2019  martin Pull up following revision(s) (requested by yamaguchi in ticket #420):

sys/net/if_vlan.c: revision 1.148
tests/net/if_vlan/t_vlan.sh: revision 1.16

Fix a bug that vlan(4) fragments IPv6 packets
even the MTU > packet length.

The bug is appeared when the mtu is increased on SIOCSETVLAN.
From t-kusaba@IIJ

atf: add test cases for MTU that is increased on SIOCSETVLAN
From t-kusaba@IIJ, thanks
 1.24.2.2 29-Mar-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1075):

tests/net/if_vlan/t_vlan.sh: revision 1.26
tests/net/if_vlan/t_vlan.sh: revision 1.27

tests: dedup t_vlan.sh like others (NFC)

tests, vlan: add a test case for link state sync
A vlan interface synchronizes its link state with its parent's one.
 1.24.2.1 03-Nov-2023  martin Pull up following revision(s) (requested by yamaguchi in ticket #455):
sys/dev/pci/ixgbe/ixgbe.c: revision 1.347
sys/net/if_l2tp.c: revision 1.49
tests/net/if_vlan/t_vlan.sh: revision 1.25
sys/net/if_vlan.c: revision 1.171
sys/net/if_ethersubr.c: revision 1.326
sys/dev/pci/ixgbe/ixv.c: revision 1.194
Use ether_bpf_mtap only when the device supports vlan harware tagging
The function is bpf_mtap() for ethernet devices and *currently*
it is just handling VLAN tag stripped by the hardware.
l2tp(4): use ether_ifattach() to initialize ethercom
Support vlan(4) over l2tp(4)
Added the test for vlan over l2tp
 1.25.2.1 02-Aug-2025  perseant Sync with HEAD
 1.1 26-Aug-2020  riastradh Clarify wg(4)'s relation to WireGuard, pending further discussion.

Still planning to replace wgconfig(8) and wg-keygen(8) by one wg(8)
tool compatible with wireguard-tools; update wg(4) for the minor
changes from the 2018-06-30 spec to the 2020-06-01 spec; &c. This just
clarifies the current state of affairs as it exists in the development
tree for now.

Mark the man page EXPERIMENTAL for extra clarity.
 1.1 26-Aug-2020  riastradh Clarify wg(4)'s relation to WireGuard, pending further discussion.

Still planning to replace wgconfig(8) and wg-keygen(8) by one wg(8)
tool compatible with wireguard-tools; update wg(4) for the minor
changes from the 2018-06-30 spec to the 2020-06-01 spec; &c. This just
clarifies the current state of affairs as it exists in the development
tree for now.

Mark the man page EXPERIMENTAL for extra clarity.
 1.6 08-Oct-2024  riastradh wg(4): Fix wg_overudp_cb drop paths to null out *mp as caller needs.

PR kern/58688: userland panic of kernel via wg(4)
 1.5 08-Oct-2024  riastradh wg(4): Test truncated UDP input from the network.

This triggers double-free in the IPv6 udp6_input path -- but,
confusingly, not the IPv4 udp_input path, even though the overudp_cb
interface ought to be the same:

/* udp_input -- no further use of m if return is -1 */
if ((n = udp4_realinput(&src, &dst, &m, iphlen)) == -1) {
UDP_STATINC(UDP_STAT_HDROPS);
return;
}

/* udp6_input -- m_freem if return is not 0 */
if (udp6_realinput(AF_INET6, &src, &dst, &m, off) == 0) {
...
}

bad:
m_freem(m);
return IPPROTO_DONE;

The subroutines udp4_realinput and udp6_realinput pass through the
return value of overudp_cb in essentially the same way:

/* udp4_realinput */
if (inp->inp_overudp_cb != NULL) {
int ret;
ret = inp->inp_overudp_cb(mp, off, inp->inp_socket,
sintosa(src), inp->inp_overudp_arg);
switch (ret) {
case -1: /* Error, m was freed */
rcvcnt = -1;
goto bad;
...
bad:
return rcvcnt;

/* udp6_realinput */
if (inp->inp_overudp_cb != NULL) {
int ret;
ret = inp->inp_overudp_cb(mp, off, inp->inp_socket,
sin6tosa(src), inp->inp_overudp_arg);
switch (ret) {
case -1: /* Error, m was freed */
rcvcnt = -1;
goto bad;
...
bad:
return rcvcnt;

PR kern/58688: userland panic of kernel via wg(4)
 1.4 02-Mar-2021  simonb branches: 1.4.6; 1.4.8;
Revert previous, 11th time failed after commit. Problem must be elsewhere.
 1.3 02-Mar-2021  simonb Bump tentative flag timeout (-w) from 10 to 20 seconds. Changes this
test from failing roughly half the time to working 100% over 10 tests on
evbmips er4.
 1.2 16-Oct-2020  roy wg: Fix tests by sprinkling ifconfig -w 10

So protocols have time to finish setup.
 1.1 26-Aug-2020  riastradh Clarify wg(4)'s relation to WireGuard, pending further discussion.

Still planning to replace wgconfig(8) and wg-keygen(8) by one wg(8)
tool compatible with wireguard-tools; update wg(4) for the minor
changes from the 2018-06-30 spec to the 2020-06-01 spec; &c. This just
clarifies the current state of affairs as it exists in the development
tree for now.

Mark the man page EXPERIMENTAL for extra clarity.
 1.4.8.1 02-Aug-2025  perseant Sync with HEAD
 1.4.6.1 09-Oct-2024  martin Pull up following revision(s) (requested by riastradh in ticket #934):

sys/net/if_wg.c: revision 1.117
sys/net/if_wg.c: revision 1.118
sys/net/if_wg.c: revision 1.119
sys/net/if_wg.c: revision 1.80
sys/net/if_wg.c: revision 1.81
tests/net/if_wg/t_misc.sh: revision 1.13
sys/net/if_wg.c: revision 1.82
sys/net/if_wg.c: revision 1.130
tests/net/if_wg/t_misc.sh: revision 1.14
sys/net/if_wg.c: revision 1.83
sys/net/if_wg.c: revision 1.131
tests/net/if_wg/t_misc.sh: revision 1.15
sys/net/if_wg.c: revision 1.84
sys/net/if_wg.c: revision 1.132
tests/net/if_wg/t_misc.sh: revision 1.16
sys/net/if_wg.c: revision 1.85
sys/net/if_wg.c: revision 1.86
tests/net/if_wg/t_basic.sh: revision 1.5
sys/net/if_wg.c: revision 1.87
tests/net/if_wg/t_basic.sh: revision 1.6
sys/net/if_wg.c: revision 1.88
sys/net/if_wg.c: revision 1.89
sys/net/if_wg.c: revision 1.100
sys/net/if_wg.c: revision 1.101
sys/net/if_wg.c: revision 1.102
sys/net/if_wg.c: revision 1.103
sys/net/if_wg.c: revision 1.104
sys/net/if_wg.c: revision 1.105
sys/net/if_wg.c: revision 1.106
sys/net/if_wg.c: revision 1.107
sys/net/if_wg.c: revision 1.108
sys/net/if_wg.c: revision 1.109
sys/net/if_wg.c: revision 1.120
sys/net/if_wg.c: revision 1.121
sys/net/if_wg.c: revision 1.122
sys/net/if_wg.c: revision 1.123
sys/net/if_wg.c: revision 1.124
sys/net/if_wg.c: revision 1.75
sys/net/if_wg.c: revision 1.77
sys/net/if_wg.c: revision 1.125
sys/net/if_wg.c: revision 1.126
sys/net/if_wg.c: revision 1.79
sys/net/if_wg.c: revision 1.127
sys/net/if_wg.c: revision 1.128
sys/net/if_wg.c: revision 1.129
sys/net/if_wg.c: revision 1.90
sys/net/if_wg.c: revision 1.91
sys/net/if_wg.c: revision 1.92
sys/net/if_wg.c: revision 1.93
sys/net/if_wg.c: revision 1.94
sys/net/if_wg.c: revision 1.95
sys/net/if_wg.c: revision 1.96
sys/net/if_wg.c: revision 1.97
sys/net/if_wg.c: revision 1.98
sys/net/if_wg.c: revision 1.99
sys/net/if_wg.c: revision 1.110
sys/net/if_wg.c: revision 1.111
sys/net/if_wg.c: revision 1.112
sys/net/if_wg.c: revision 1.113
sys/net/if_wg.c: revision 1.114
sys/net/if_wg.c: revision 1.115
sys/net/if_wg.c: revision 1.116

fix simple mis-matched function prototype and definitions.
most of these are like, eg
void foo(int[2]);
with either of these
void foo(int*) { ... }
void foo(int[]) { ... }
in some cases (such as stat or utimes* calls found in our header files),
we now match standard definition from opengroup.
found by GCC 12.

sys: Drop redundant NULL check before m_freem(9)
m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c
Compile-tested on amd64/ALL.
Suggested by knakahara@

Add a wg_debug variable to split between debug/trace/dump messages

Add more debugging in packet validation

If any of the WG_DEBUG_XXX symbols happens to be defined (say, from a
stray rump Makefile...) then we now must have WG_DEBUG also defined, so
if it wasn't, make it so.

While the previous change fixed the broken build, it wasn't the best
way, as defining any of the WG_DEBUG_XXX symbols then effectively
defined all of them - making them as seperate entities, pointless.

So, rearrange the way things are done a little to avoid doing that.

Add packet dump debugging
fix size limit calculation in dump and NULL checks
use hexdump...

Fix 32 bit (32 bit size_t) WG_DEBUG builds - use %zu rather than %lu
to print size_t values.

There's a new WG_DEBUG_XXX ( XXX==PACKET ) to deal with now. That needs
WG_DEBUG defined as well, if set.

Make the debug (WG_DEBUG) func gethexdump() always return a valid
pointer, never NULL, so it doesn't need to be tested before being
printed, which was being done sometimes, but not always.

Add more debugging from Taylor

wg(4): Allow modunload before any interface creation.

The workqueue and pktq are both lazily created, for annoying module
initialization order reasons, so they may not have been created by
the time of modunload.
PR kern/58470

Limit the size of the packet, and print ... if it is bigger. (from kre@)
wg(4): Rework some details of internal session state machine.

This way:
- There is a clear transition between when a session is being set up,
and when it is exposed to the data rx path (wg_handle_msg_data):
atomic_store_release to set wgs->wgs_state to INIT_PASSIVE or
ESTABLISHED.
(The transition INIT_PASSIVE -> ESTABLISHED is immaterial to the
data rx path, so that's just atomic_store_relaxed. Similarly the
transition to DESTROYING.)
- There is a clear transition between when a session is being set up,
and when it is exposed to the data tx path (wg_output):
atomic_store_release to set wgp->wgp_session_stable to it.
- Every path that reinitializes a session must go through
wg_destroy_session via wg_put_index_session first. This avoids
races between session reuse and the data rx/tx paths.
- Add a log message at the time of every state transition.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Fix logic to ensure session initiation is underway.

Previously, wg_task_send_init_message would call
wg_send_handshake_msg_init if either:
(a) the stable session is UNKNOWN, meaning a session has not yet been
established, either by us or by the peer (but it could be in
progress); or
(b) the stable session is not UNKNOWN but the unstable session is
_not_ INIT_ACTIVE, meaning there is an established session and we
are not currently initiating a new session.

If wg_output (or wgintr) found no established session while there was
already a session being initiated, we may only enter
wg_task_send_init_message after the session is already established,
and trigger spurious reinitiation.

Instead, create a separate flag to indicate whether it is mandatory
to rekey because limits have passed. Then create a session only if:
(a) the stable session is not ESTABLISHED, or
(b) the mandatory rekey flag is not set,
and clear the mandatory rekey flag.

While here, arrange to do rekey-after-time on tx, not on callout. If
there's no data to tx, we shouldn't reinitiate a session -- we should
stay quiet on the network.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails

PR kern/56252: wg(4) state machine has race conditions

PR kern/58463: if_wg does not work when idle.

wg(4): Use callout_halt, not callout_stop.
It's possible that callout_stop might work here, but let's simplify
reasoning about it -- the timers in question only take the peer intr
lock, so it's safe to wait for them while holding the peer lock in
the handshake worker thread.

We may have to undo the task bit but that will take a bit more
analysis to determine.
Prompted by (but probably won't fix anything in):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Omit needless pserialize_perform on transition to DESTROYING.

A session can still be used when it is in the DESTROYING state, so
there's no need to wait for users to drain here -- that's the whole
point of a separate DESTROYING state.

It is only the transition from DESTROYING back to UNKNOWN, after the
session has been unpublished so no new users can begin, that requires
waiting for all users to drain, and we already do that in
wg_destroy_session.

Prompted by (but won't fix anything in, because this is just a
performance optimization):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Expand cookie secret to 32 bytes.
This is only relevant for denial of service mitigation, so it's not
that big a deal, and the spec doesn't say anything about the size,
but let's make it the standard key size.

PR kern/58479: experimental wg(4) uses 32-bit cookie secret, not
32-byte cookie secret

wg(4): Mark wgp_pending volatile to reflect its usage.
Prompted by (but won't fix any part of):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Fix session destruction.
Schedule destruction as soon as the session is created, to ensure key
erasure within 2*reject-after-time seconds. Previously, we would
schedule destruction of the previous session 1 second after the next
one has been established. Combined with a failure to update the
state machine on keepalive packets, this led to temporary deadlock
scenarios.

To keep it simple, there's just one callout which runs every
reject-after-time seconds and erases keys in sessions older than
reject-after-time, so if a session is established the moment after it
runs, the keys might not be erased until (2-eps)*reject-after-time
seconds.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Reject rx on sessions older than reject-after-time sec.
Prompted by (but won't fix anything in):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): On rx of valid ciphertext, make sure to update state machine.

Previously, we also required the plaintext to be a plausible-looking
IP packet before updating the state machine.

But keepalive packets are empty -- and if the peer initiated the
session to rekey after last tx but had no more data to tx, it will
send a keepalive to finish session initiation.
If we didn't update the state machine in that case, we would stay in
INIT_PASSIVE state unable to tx on the session, which would make
things hang.

So make sure to always update the state machine once we have accepted
a packet as genuine, even if it's genuine garbage on the inside.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Make sure to update endpoint on keepalive packets too.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

tests/net/if_wg/t_misc: Tweak timeouts in wg_handshake_timeout.

Most of the timers in wg(4) have only 1sec resolution, which might be
rounded in either direction, so make sure there's a 2sec buffer on
either side of the event we care about (the point at which wg(4)
decides to stop retrying handshake).

Won't fix any bugs, but might make the tests slightly less flaky.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions

tests/net/if_wg/t_misc: Elaborate in wg_rekey debug messages.

Helpful for following the test log when things go wrong.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
wg(4): Tests should pass now.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Use 32-bit for times handled in rx/tx paths.

The rx and tx paths require unlocked access to wgs_time_established
(to decide whether it's time to rekey) and wgs_time_last_data_sent
(to decide whether we need to reply to incoming data with a keepalive
packet), so do it with atomic_load/store_*.

On 32-bit platforms, we may not be able to do that on time_t.

However, since sessions only last for a few minutes before
reject-after-time kicks in and they are erased, 32 bits is plenty to
record the durations that we need to record here, so this shouldn't
introduce any new bugs even on hosts that exceed 136 years of uptime.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Make time_uptime32 work in netbsd<=10.

This is the low 32 bits of time_uptime.
Will simplify pullups to 10 for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Fix quotation in comment.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Process all altq'd packets when deleting peer.

Can't just drop them because we can only go through all packets on an
interface at a time, for all peers -- so we'd either have to drop all
peers' packets, or requeue the packets for other peers. Probably not
worth the trouble, so let's just wait for all the packets currently
queued up to go through first.

This requires reordering teardown so that we wg_destroy_all_peers,
and thus wg_purge_pending_packets, _before_ we wg_if_detach, because
wg_if_detach -> if_detach destroys the lock that IFQ_DEQUEUE uses.

PR kern/58477: experimental wg(4) ALTQ support is probably buggy

wg(4): Tidy up error branches.
No functional change intended, except to add some log messages in
failure cases.
Cleanup after:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Be more consistent about #ifdef INET/INET6.
PR kern/58478: experimental wg(4) probably doesn't build with
INET6-only

wg(4): Parenthesize macro expansions properly.

PR kern/58480: experimental wg(4) sliding window logic has oopsie

wg(4): Delete temporary hacks to dump keys and packets.
No longer useful for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Explain why gethexdump/puthexdump is there, and tidy.
This way I will not be tempted to replace it by in-line calls to
libkern hexdump.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Put force_rekey state in the session, not the peer.
That way, there is a time when one thread has exclusive access to the
state, in wg_destroy_session under the peer lock, when we can clear
the state without racing against the data tx path.
This will work more reliably than the atomic_swap_uint I used before.
Noted by kre@.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Sprinkle static on fixed-size array parameters.

Let's make the static size declarations useful.
No functional change intended.

wg(4): Queue pending packet in FIFO order, not LIFO order.

Sometimes the session takes a seconds to establish, for whatever
reason. It is better if the pending packet, which we queue up to
send as soon as we get the responder's handshake response, is the
most recent packet, rather than the first packet.

That way, we don't wind up with a weird multi-second-delayed ping,
followed by a bunch of dropped, followed by normal ping timings, or
wind up sending the first TCP SYN instead of the most recent, or what
have you. Senders need to be prepared to retransmit anyway if
packets are dropped.

PR kern/58508: experimental wg(4) queues LIFO, not FIFO, pending
first handshake
wg(4): Sprinkle comments into wg_swap_sessions.
No functional change intended.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): No need for atomic access to wgs_time_established in tx/rx.

This is stable while the session is visible to the tx/rx paths -- it
is initialized before the session is exposed to tx/rx, and doesn't
change until the session is no longer used by any tx/rx path and has
been recycled.

When I sprinkled atomic access to wgs_time_established in if_wg.c
rev. 1.104, it was a vestige of an uncommitted draft that did the
transition from INIT_PASSIVE to ESTABLISHED in the tx path itself, in
an attempt to enable prompter tx on the new session as soon as it is
established. This turned out to be unnecessary, so I reverted most
of it, but forgot that wgs_time_established no longer needed atomic
treatment.

We could go back to using time_t and time_uptime, now that there's no
need to do atomic loads and stores on these quantities. But there's
no point in 64-bit arithmetic when the time differences are all
guaranteed bounded by a few minutes, so keeping it 32-bit is probably
a slight performance improvement on 32-bit systems.
(In contrast, wgs_time_last_data_sent is both written and read in the
tx path, which may run in parallel on multiple CPUs, so it still
requires the atomic treatment.)
Tidying up for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Fix memory ordering in detach.
PR kern/58510: experimental wg(4) lacks memory ordering between
wg_count_dec and module unload

wg(4): Fix typo in comment recently added.
Comment added in the service of:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Omit needless atomic_load.
wgs_local_index is only ever written to while only one thread has
access to it and it is not in the thmap -- before it is published in
wg_get_session_index, and after it is unpublished in
wg_destroy_session. So no need for atomic_load -- it is stable if we
observe it in thmap_get result.
(Of course this is only for an assertion, which if tripped obviously
indicates a violation of our assumptions. But if that happens, well,
in the worst case we'll see a weird assertion message claiming that
the index is not equal to itself, which from which we can conclude
there must have been a concurrent update, which is good enough to
help diagnose that problem without any atomic_load.)

Tidying some of the changes for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Sprinkle comments on internal sliding window API.
Post-fix tidying for:
PR kern/58480: experimental wg(4) sliding window logic has oopsie

wg(4): Deduplicate session establishment actions.
The actions to
(a) record the last handshake time,
(b) clear some handshake state,
(c) transmit first data if queued, or (if initiator) keepalive, and
(d) begin destroying the old session,
were formerly duplicated between wg_handle_msg_resp (for when we're
the initiator) and wg_task_establish_session (for when we're the
responder).

Instead, let's factor this out into wg_swap_session so there's only
one copy of the logic.
This requires moving wg_update_endpoint_if_necessary a little earlier
in wg_handle_msg_resp -- which should be done anyway so that the
endpoint is updated _before_ the session is published for the data tx
path to use.

Other than moving wg_update_endpoint_if_necessary a little earlier,
no functional change intended.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Read wgs_state atomically in wg_get_stable_session.
As noted in the comment above, it may concurrently transition from
ESTABLISHED to DESTROYING.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Force rekey on tx if session is older than reject-after-time.
One more corner case for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Add missing barriers around wgp_pending access.
PR kern/58520: experimental wg(4) lacks barriers around access to
packet pending initiation
wg(4): Trigger session initiation in wgintr, not in wg_output.

We have to look up the session in wgintr anyway, for
wg_send_data_msg. By triggering session initiation in wgintr instead
of wg_output, we can skip the stable session lookup and reference in
wg_output -- simpler that way.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Queue packet for post-handshake retransmit if limits are hit.
PR kern/58521: experimental wg(4) may drop packet after minutes of quiet
wg(4): When a session is established, send first packet directly.

Like we would do with the keepalive packet, if we had to send that
instead -- no need to defer it to the pktq. Keep it simple.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Sprinkle volatile on variables requiring atomic access.
No functional change intended, since the relevant access is always
done with atomic_* when it might race with concurrent access -- and
really this should be _Atomic or something. But for now our
atomic_ops(9) API is still spelled with volatile, so we'll use that.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Make a rule for who wins when both peers send INIT at once.
The rule is that the peer with the numerically smaller public key
hash, in little-endian, takes priority iff the low order bit of
H(peer A pubkey) ^ H(peer B pubkey) ^ H(posix minutes as le64)
is 0, and the peer with the lexicographically larger public key takes
priority iff the low-order bit is 1.

Another case of:
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

This one is, as far as I can tell, simply a deadlock in the protocol
of the whitepaper -- until both sides give up on the handshake and
one of them (but not both) later decides to try sending data again.
(But not related to our t_misc:wg_rekey test, as far as I can tell,
and I haven't put enough thought into how to reliably trigger this
race to write a new automatic test for it.)
wg(4): Add Internet Archive links for the versions cited.
No functional change.

tests/net/if_wg/t_misc: Add some diagnostics.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails

wg(4): Test truncated UDP input from the network.
This triggers double-free in the IPv6 udp6_input path -- but,
confusingly, not the IPv4 udp_input path, even though the overudp_cb
interface ought to be the same:
/* udp_input -- no further use of m if return is -1 */
if ((n = udp4_realinput(&src, &dst, &m, iphlen)) == -1) {
UDP_STATINC(UDP_STAT_HDROPS);
return;
}
/* udp6_input -- m_freem if return is not 0 */
if (udp6_realinput(AF_INET6, &src, &dst, &m, off) == 0) {
...
}
bad:
m_freem(m);
return IPPROTO_DONE;

The subroutines udp4_realinput and udp6_realinput pass through the
return value of overudp_cb in essentially the same way:
/* udp4_realinput */
if (inp->inp_overudp_cb != NULL) {
int ret;
ret = inp->inp_overudp_cb(mp, off, inp->inp_socket,
sintosa(src), inp->inp_overudp_arg);
switch (ret) {
case -1: /* Error, m was freed */
rcvcnt = -1;
goto bad;
...
bad:
return rcvcnt;
/* udp6_realinput */
if (inp->inp_overudp_cb != NULL) {
int ret;
ret = inp->inp_overudp_cb(mp, off, inp->inp_socket,
sin6tosa(src), inp->inp_overudp_arg);
switch (ret) {
case -1: /* Error, m was freed */
rcvcnt = -1;
goto bad;
...
bad:
return rcvcnt;

PR kern/58688: userland panic of kernel via wg(4)

wg(4): Fix wg_overudp_cb drop paths to null out *mp as caller needs.
PR kern/58688: userland panic of kernel via wg(4)
 1.1 26-Aug-2020  riastradh Clarify wg(4)'s relation to WireGuard, pending further discussion.

Still planning to replace wgconfig(8) and wg-keygen(8) by one wg(8)
tool compatible with wireguard-tools; update wg(4) for the minor
changes from the 2018-06-30 spec to the 2020-06-01 spec; &c. This just
clarifies the current state of affairs as it exists in the development
tree for now.

Mark the man page EXPERIMENTAL for extra clarity.
 1.16 26-Aug-2024  riastradh tests/net/if_wg/t_misc: Add some diagnostics.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 1.15 28-Jul-2024  riastradh wg(4): Tests should pass now.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.14 28-Jul-2024  riastradh tests/net/if_wg/t_misc: Elaborate in wg_rekey debug messages.

Helpful for following the test log when things go wrong.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.13 28-Jul-2024  riastradh tests/net/if_wg/t_misc: Tweak timeouts in wg_handshake_timeout.

Most of the timers in wg(4) have only 1sec resolution, which might be
rounded in either direction, so make sure there's a 2sec buffer on
either side of the event we care about (the point at which wg(4)
decides to stop retrying handshake).

Won't fix any bugs, but might make the tests slightly less flaky.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
 1.12 13-Jun-2022  martin branches: 1.12.2; 1.12.4;
PR 56879: bump timeout for the wg_malformed test to 100 seconds,
as suggested by Tom Lane - the test takes ~32 seconds on my unloaded sh4
machine.
 1.11 26-Nov-2021  gson The wg_handshake_timeout test case was failing because it contained
atf_fail "failed to trigger PR kern/56252" without a corresponding
atf_expect_fail "PR kern/56252", which makes no sense. Since the
test case does occasionally fail on real hardware, fix this by adding
the atf_expect_fail rather than by removing the atf_fail.
 1.10 17-Jun-2021  riastradh tests/net/if_wg: Only expect this to fail once!

Not sure how that happened, weird artefact of applying fuzzy patch
twice or something.
 1.9 17-Jun-2021  riastradh tests/net/if_wg: Mark as flaky (PR kern/56252).
 1.8 16-Jun-2021  riastradh tests/net/if_wg: Fix typo: $ifconfig, not $ifconfg.
 1.7 05-Nov-2020  martin Fix typo
 1.6 16-Oct-2020  roy wg: Fix tests by sprinkling ifconfig -w 10

So protocols have time to finish setup.
 1.5 31-Aug-2020  riastradh tests/net/if_wg: Allow one second of leeway for rekey.
 1.4 29-Aug-2020  tih Update the if_wg tests for the human readable 'latest-handshake'
output of wgconfig.
 1.3 27-Aug-2020  riastradh wg: Check mbuf chain length before m_copydata.
 1.2 27-Aug-2020  riastradh Use wgconfig as intended to show diagnostics, not a usage message.
 1.1 26-Aug-2020  riastradh Clarify wg(4)'s relation to WireGuard, pending further discussion.

Still planning to replace wgconfig(8) and wg-keygen(8) by one wg(8)
tool compatible with wireguard-tools; update wg(4) for the minor
changes from the 2018-06-30 spec to the 2020-06-01 spec; &c. This just
clarifies the current state of affairs as it exists in the development
tree for now.

Mark the man page EXPERIMENTAL for extra clarity.
 1.12.4.1 02-Aug-2025  perseant Sync with HEAD
 1.12.2.1 09-Oct-2024  martin Pull up following revision(s) (requested by riastradh in ticket #934):

sys/net/if_wg.c: revision 1.117
sys/net/if_wg.c: revision 1.118
sys/net/if_wg.c: revision 1.119
sys/net/if_wg.c: revision 1.80
sys/net/if_wg.c: revision 1.81
tests/net/if_wg/t_misc.sh: revision 1.13
sys/net/if_wg.c: revision 1.82
sys/net/if_wg.c: revision 1.130
tests/net/if_wg/t_misc.sh: revision 1.14
sys/net/if_wg.c: revision 1.83
sys/net/if_wg.c: revision 1.131
tests/net/if_wg/t_misc.sh: revision 1.15
sys/net/if_wg.c: revision 1.84
sys/net/if_wg.c: revision 1.132
tests/net/if_wg/t_misc.sh: revision 1.16
sys/net/if_wg.c: revision 1.85
sys/net/if_wg.c: revision 1.86
tests/net/if_wg/t_basic.sh: revision 1.5
sys/net/if_wg.c: revision 1.87
tests/net/if_wg/t_basic.sh: revision 1.6
sys/net/if_wg.c: revision 1.88
sys/net/if_wg.c: revision 1.89
sys/net/if_wg.c: revision 1.100
sys/net/if_wg.c: revision 1.101
sys/net/if_wg.c: revision 1.102
sys/net/if_wg.c: revision 1.103
sys/net/if_wg.c: revision 1.104
sys/net/if_wg.c: revision 1.105
sys/net/if_wg.c: revision 1.106
sys/net/if_wg.c: revision 1.107
sys/net/if_wg.c: revision 1.108
sys/net/if_wg.c: revision 1.109
sys/net/if_wg.c: revision 1.120
sys/net/if_wg.c: revision 1.121
sys/net/if_wg.c: revision 1.122
sys/net/if_wg.c: revision 1.123
sys/net/if_wg.c: revision 1.124
sys/net/if_wg.c: revision 1.75
sys/net/if_wg.c: revision 1.77
sys/net/if_wg.c: revision 1.125
sys/net/if_wg.c: revision 1.126
sys/net/if_wg.c: revision 1.79
sys/net/if_wg.c: revision 1.127
sys/net/if_wg.c: revision 1.128
sys/net/if_wg.c: revision 1.129
sys/net/if_wg.c: revision 1.90
sys/net/if_wg.c: revision 1.91
sys/net/if_wg.c: revision 1.92
sys/net/if_wg.c: revision 1.93
sys/net/if_wg.c: revision 1.94
sys/net/if_wg.c: revision 1.95
sys/net/if_wg.c: revision 1.96
sys/net/if_wg.c: revision 1.97
sys/net/if_wg.c: revision 1.98
sys/net/if_wg.c: revision 1.99
sys/net/if_wg.c: revision 1.110
sys/net/if_wg.c: revision 1.111
sys/net/if_wg.c: revision 1.112
sys/net/if_wg.c: revision 1.113
sys/net/if_wg.c: revision 1.114
sys/net/if_wg.c: revision 1.115
sys/net/if_wg.c: revision 1.116

fix simple mis-matched function prototype and definitions.
most of these are like, eg
void foo(int[2]);
with either of these
void foo(int*) { ... }
void foo(int[]) { ... }
in some cases (such as stat or utimes* calls found in our header files),
we now match standard definition from opengroup.
found by GCC 12.

sys: Drop redundant NULL check before m_freem(9)
m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c
Compile-tested on amd64/ALL.
Suggested by knakahara@

Add a wg_debug variable to split between debug/trace/dump messages

Add more debugging in packet validation

If any of the WG_DEBUG_XXX symbols happens to be defined (say, from a
stray rump Makefile...) then we now must have WG_DEBUG also defined, so
if it wasn't, make it so.

While the previous change fixed the broken build, it wasn't the best
way, as defining any of the WG_DEBUG_XXX symbols then effectively
defined all of them - making them as seperate entities, pointless.

So, rearrange the way things are done a little to avoid doing that.

Add packet dump debugging
fix size limit calculation in dump and NULL checks
use hexdump...

Fix 32 bit (32 bit size_t) WG_DEBUG builds - use %zu rather than %lu
to print size_t values.

There's a new WG_DEBUG_XXX ( XXX==PACKET ) to deal with now. That needs
WG_DEBUG defined as well, if set.

Make the debug (WG_DEBUG) func gethexdump() always return a valid
pointer, never NULL, so it doesn't need to be tested before being
printed, which was being done sometimes, but not always.

Add more debugging from Taylor

wg(4): Allow modunload before any interface creation.

The workqueue and pktq are both lazily created, for annoying module
initialization order reasons, so they may not have been created by
the time of modunload.
PR kern/58470

Limit the size of the packet, and print ... if it is bigger. (from kre@)
wg(4): Rework some details of internal session state machine.

This way:
- There is a clear transition between when a session is being set up,
and when it is exposed to the data rx path (wg_handle_msg_data):
atomic_store_release to set wgs->wgs_state to INIT_PASSIVE or
ESTABLISHED.
(The transition INIT_PASSIVE -> ESTABLISHED is immaterial to the
data rx path, so that's just atomic_store_relaxed. Similarly the
transition to DESTROYING.)
- There is a clear transition between when a session is being set up,
and when it is exposed to the data tx path (wg_output):
atomic_store_release to set wgp->wgp_session_stable to it.
- Every path that reinitializes a session must go through
wg_destroy_session via wg_put_index_session first. This avoids
races between session reuse and the data rx/tx paths.
- Add a log message at the time of every state transition.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Fix logic to ensure session initiation is underway.

Previously, wg_task_send_init_message would call
wg_send_handshake_msg_init if either:
(a) the stable session is UNKNOWN, meaning a session has not yet been
established, either by us or by the peer (but it could be in
progress); or
(b) the stable session is not UNKNOWN but the unstable session is
_not_ INIT_ACTIVE, meaning there is an established session and we
are not currently initiating a new session.

If wg_output (or wgintr) found no established session while there was
already a session being initiated, we may only enter
wg_task_send_init_message after the session is already established,
and trigger spurious reinitiation.

Instead, create a separate flag to indicate whether it is mandatory
to rekey because limits have passed. Then create a session only if:
(a) the stable session is not ESTABLISHED, or
(b) the mandatory rekey flag is not set,
and clear the mandatory rekey flag.

While here, arrange to do rekey-after-time on tx, not on callout. If
there's no data to tx, we shouldn't reinitiate a session -- we should
stay quiet on the network.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails

PR kern/56252: wg(4) state machine has race conditions

PR kern/58463: if_wg does not work when idle.

wg(4): Use callout_halt, not callout_stop.
It's possible that callout_stop might work here, but let's simplify
reasoning about it -- the timers in question only take the peer intr
lock, so it's safe to wait for them while holding the peer lock in
the handshake worker thread.

We may have to undo the task bit but that will take a bit more
analysis to determine.
Prompted by (but probably won't fix anything in):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Omit needless pserialize_perform on transition to DESTROYING.

A session can still be used when it is in the DESTROYING state, so
there's no need to wait for users to drain here -- that's the whole
point of a separate DESTROYING state.

It is only the transition from DESTROYING back to UNKNOWN, after the
session has been unpublished so no new users can begin, that requires
waiting for all users to drain, and we already do that in
wg_destroy_session.

Prompted by (but won't fix anything in, because this is just a
performance optimization):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Expand cookie secret to 32 bytes.
This is only relevant for denial of service mitigation, so it's not
that big a deal, and the spec doesn't say anything about the size,
but let's make it the standard key size.

PR kern/58479: experimental wg(4) uses 32-bit cookie secret, not
32-byte cookie secret

wg(4): Mark wgp_pending volatile to reflect its usage.
Prompted by (but won't fix any part of):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Fix session destruction.
Schedule destruction as soon as the session is created, to ensure key
erasure within 2*reject-after-time seconds. Previously, we would
schedule destruction of the previous session 1 second after the next
one has been established. Combined with a failure to update the
state machine on keepalive packets, this led to temporary deadlock
scenarios.

To keep it simple, there's just one callout which runs every
reject-after-time seconds and erases keys in sessions older than
reject-after-time, so if a session is established the moment after it
runs, the keys might not be erased until (2-eps)*reject-after-time
seconds.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Reject rx on sessions older than reject-after-time sec.
Prompted by (but won't fix anything in):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): On rx of valid ciphertext, make sure to update state machine.

Previously, we also required the plaintext to be a plausible-looking
IP packet before updating the state machine.

But keepalive packets are empty -- and if the peer initiated the
session to rekey after last tx but had no more data to tx, it will
send a keepalive to finish session initiation.
If we didn't update the state machine in that case, we would stay in
INIT_PASSIVE state unable to tx on the session, which would make
things hang.

So make sure to always update the state machine once we have accepted
a packet as genuine, even if it's genuine garbage on the inside.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Make sure to update endpoint on keepalive packets too.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

tests/net/if_wg/t_misc: Tweak timeouts in wg_handshake_timeout.

Most of the timers in wg(4) have only 1sec resolution, which might be
rounded in either direction, so make sure there's a 2sec buffer on
either side of the event we care about (the point at which wg(4)
decides to stop retrying handshake).

Won't fix any bugs, but might make the tests slightly less flaky.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions

tests/net/if_wg/t_misc: Elaborate in wg_rekey debug messages.

Helpful for following the test log when things go wrong.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
wg(4): Tests should pass now.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Use 32-bit for times handled in rx/tx paths.

The rx and tx paths require unlocked access to wgs_time_established
(to decide whether it's time to rekey) and wgs_time_last_data_sent
(to decide whether we need to reply to incoming data with a keepalive
packet), so do it with atomic_load/store_*.

On 32-bit platforms, we may not be able to do that on time_t.

However, since sessions only last for a few minutes before
reject-after-time kicks in and they are erased, 32 bits is plenty to
record the durations that we need to record here, so this shouldn't
introduce any new bugs even on hosts that exceed 136 years of uptime.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Make time_uptime32 work in netbsd<=10.

This is the low 32 bits of time_uptime.
Will simplify pullups to 10 for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Fix quotation in comment.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Process all altq'd packets when deleting peer.

Can't just drop them because we can only go through all packets on an
interface at a time, for all peers -- so we'd either have to drop all
peers' packets, or requeue the packets for other peers. Probably not
worth the trouble, so let's just wait for all the packets currently
queued up to go through first.

This requires reordering teardown so that we wg_destroy_all_peers,
and thus wg_purge_pending_packets, _before_ we wg_if_detach, because
wg_if_detach -> if_detach destroys the lock that IFQ_DEQUEUE uses.

PR kern/58477: experimental wg(4) ALTQ support is probably buggy

wg(4): Tidy up error branches.
No functional change intended, except to add some log messages in
failure cases.
Cleanup after:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Be more consistent about #ifdef INET/INET6.
PR kern/58478: experimental wg(4) probably doesn't build with
INET6-only

wg(4): Parenthesize macro expansions properly.

PR kern/58480: experimental wg(4) sliding window logic has oopsie

wg(4): Delete temporary hacks to dump keys and packets.
No longer useful for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Explain why gethexdump/puthexdump is there, and tidy.
This way I will not be tempted to replace it by in-line calls to
libkern hexdump.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Put force_rekey state in the session, not the peer.
That way, there is a time when one thread has exclusive access to the
state, in wg_destroy_session under the peer lock, when we can clear
the state without racing against the data tx path.
This will work more reliably than the atomic_swap_uint I used before.
Noted by kre@.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Sprinkle static on fixed-size array parameters.

Let's make the static size declarations useful.
No functional change intended.

wg(4): Queue pending packet in FIFO order, not LIFO order.

Sometimes the session takes a seconds to establish, for whatever
reason. It is better if the pending packet, which we queue up to
send as soon as we get the responder's handshake response, is the
most recent packet, rather than the first packet.

That way, we don't wind up with a weird multi-second-delayed ping,
followed by a bunch of dropped, followed by normal ping timings, or
wind up sending the first TCP SYN instead of the most recent, or what
have you. Senders need to be prepared to retransmit anyway if
packets are dropped.

PR kern/58508: experimental wg(4) queues LIFO, not FIFO, pending
first handshake
wg(4): Sprinkle comments into wg_swap_sessions.
No functional change intended.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): No need for atomic access to wgs_time_established in tx/rx.

This is stable while the session is visible to the tx/rx paths -- it
is initialized before the session is exposed to tx/rx, and doesn't
change until the session is no longer used by any tx/rx path and has
been recycled.

When I sprinkled atomic access to wgs_time_established in if_wg.c
rev. 1.104, it was a vestige of an uncommitted draft that did the
transition from INIT_PASSIVE to ESTABLISHED in the tx path itself, in
an attempt to enable prompter tx on the new session as soon as it is
established. This turned out to be unnecessary, so I reverted most
of it, but forgot that wgs_time_established no longer needed atomic
treatment.

We could go back to using time_t and time_uptime, now that there's no
need to do atomic loads and stores on these quantities. But there's
no point in 64-bit arithmetic when the time differences are all
guaranteed bounded by a few minutes, so keeping it 32-bit is probably
a slight performance improvement on 32-bit systems.
(In contrast, wgs_time_last_data_sent is both written and read in the
tx path, which may run in parallel on multiple CPUs, so it still
requires the atomic treatment.)
Tidying up for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Fix memory ordering in detach.
PR kern/58510: experimental wg(4) lacks memory ordering between
wg_count_dec and module unload

wg(4): Fix typo in comment recently added.
Comment added in the service of:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Omit needless atomic_load.
wgs_local_index is only ever written to while only one thread has
access to it and it is not in the thmap -- before it is published in
wg_get_session_index, and after it is unpublished in
wg_destroy_session. So no need for atomic_load -- it is stable if we
observe it in thmap_get result.
(Of course this is only for an assertion, which if tripped obviously
indicates a violation of our assumptions. But if that happens, well,
in the worst case we'll see a weird assertion message claiming that
the index is not equal to itself, which from which we can conclude
there must have been a concurrent update, which is good enough to
help diagnose that problem without any atomic_load.)

Tidying some of the changes for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Sprinkle comments on internal sliding window API.
Post-fix tidying for:
PR kern/58480: experimental wg(4) sliding window logic has oopsie

wg(4): Deduplicate session establishment actions.
The actions to
(a) record the last handshake time,
(b) clear some handshake state,
(c) transmit first data if queued, or (if initiator) keepalive, and
(d) begin destroying the old session,
were formerly duplicated between wg_handle_msg_resp (for when we're
the initiator) and wg_task_establish_session (for when we're the
responder).

Instead, let's factor this out into wg_swap_session so there's only
one copy of the logic.
This requires moving wg_update_endpoint_if_necessary a little earlier
in wg_handle_msg_resp -- which should be done anyway so that the
endpoint is updated _before_ the session is published for the data tx
path to use.

Other than moving wg_update_endpoint_if_necessary a little earlier,
no functional change intended.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Read wgs_state atomically in wg_get_stable_session.
As noted in the comment above, it may concurrently transition from
ESTABLISHED to DESTROYING.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Force rekey on tx if session is older than reject-after-time.
One more corner case for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Add missing barriers around wgp_pending access.
PR kern/58520: experimental wg(4) lacks barriers around access to
packet pending initiation
wg(4): Trigger session initiation in wgintr, not in wg_output.

We have to look up the session in wgintr anyway, for
wg_send_data_msg. By triggering session initiation in wgintr instead
of wg_output, we can skip the stable session lookup and reference in
wg_output -- simpler that way.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Queue packet for post-handshake retransmit if limits are hit.
PR kern/58521: experimental wg(4) may drop packet after minutes of quiet
wg(4): When a session is established, send first packet directly.

Like we would do with the keepalive packet, if we had to send that
instead -- no need to defer it to the pktq. Keep it simple.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Sprinkle volatile on variables requiring atomic access.
No functional change intended, since the relevant access is always
done with atomic_* when it might race with concurrent access -- and
really this should be _Atomic or something. But for now our
atomic_ops(9) API is still spelled with volatile, so we'll use that.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Make a rule for who wins when both peers send INIT at once.
The rule is that the peer with the numerically smaller public key
hash, in little-endian, takes priority iff the low order bit of
H(peer A pubkey) ^ H(peer B pubkey) ^ H(posix minutes as le64)
is 0, and the peer with the lexicographically larger public key takes
priority iff the low-order bit is 1.

Another case of:
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

This one is, as far as I can tell, simply a deadlock in the protocol
of the whitepaper -- until both sides give up on the handshake and
one of them (but not both) later decides to try sending data again.
(But not related to our t_misc:wg_rekey test, as far as I can tell,
and I haven't put enough thought into how to reliably trigger this
race to write a new automatic test for it.)
wg(4): Add Internet Archive links for the versions cited.
No functional change.

tests/net/if_wg/t_misc: Add some diagnostics.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails

wg(4): Test truncated UDP input from the network.
This triggers double-free in the IPv6 udp6_input path -- but,
confusingly, not the IPv4 udp_input path, even though the overudp_cb
interface ought to be the same:
/* udp_input -- no further use of m if return is -1 */
if ((n = udp4_realinput(&src, &dst, &m, iphlen)) == -1) {
UDP_STATINC(UDP_STAT_HDROPS);
return;
}
/* udp6_input -- m_freem if return is not 0 */
if (udp6_realinput(AF_INET6, &src, &dst, &m, off) == 0) {
...
}
bad:
m_freem(m);
return IPPROTO_DONE;

The subroutines udp4_realinput and udp6_realinput pass through the
return value of overudp_cb in essentially the same way:
/* udp4_realinput */
if (inp->inp_overudp_cb != NULL) {
int ret;
ret = inp->inp_overudp_cb(mp, off, inp->inp_socket,
sintosa(src), inp->inp_overudp_arg);
switch (ret) {
case -1: /* Error, m was freed */
rcvcnt = -1;
goto bad;
...
bad:
return rcvcnt;
/* udp6_realinput */
if (inp->inp_overudp_cb != NULL) {
int ret;
ret = inp->inp_overudp_cb(mp, off, inp->inp_socket,
sin6tosa(src), inp->inp_overudp_arg);
switch (ret) {
case -1: /* Error, m was freed */
rcvcnt = -1;
goto bad;
...
bad:
return rcvcnt;

PR kern/58688: userland panic of kernel via wg(4)

wg(4): Fix wg_overudp_cb drop paths to null out *mp as caller needs.
PR kern/58688: userland panic of kernel via wg(4)
 1.2 29-Aug-2020  tih Update the if_wg tests for the human readable 'latest-handshake'
output of wgconfig.
 1.1 26-Aug-2020  riastradh Clarify wg(4)'s relation to WireGuard, pending further discussion.

Still planning to replace wgconfig(8) and wg-keygen(8) by one wg(8)
tool compatible with wireguard-tools; update wg(4) for the minor
changes from the 2018-06-30 spec to the 2020-06-01 spec; &c. This just
clarifies the current state of affairs as it exists in the development
tree for now.

Mark the man page EXPERIMENTAL for extra clarity.
 1.9 06-Jan-2015  christos need immediate assignment
 1.8 06-Jan-2015  christos simplify.
 1.7 06-Jan-2015  christos fix helper program installation
 1.6 06-Jan-2015  christos fix directory
 1.5 06-Jan-2015  christos assym.h moved.
 1.4 05-Jan-2015  christos Form the sources in a better way.
 1.3 05-Jan-2015  christos Too hard to cross-build mkassym.c correctly, use a standard assym.h
 1.2 05-Jan-2015  christos add a dummy mkassym
look in more places for cpu_in_cksum
 1.1 05-Jan-2015  christos Port the in_cksum test from regress.
 1.1 05-Jan-2015  christos Too hard to cross-build mkassym.c correctly, use a standard assym.h
 1.5 18-Oct-2015  christos remove debugging test
 1.4 18-Oct-2015  christos add code to dump the mbuf contents
 1.3 06-Jan-2015  joerg panic is printflike.
 1.2 05-Jan-2015  christos add a dummy mkassym
look in more places for cpu_in_cksum
 1.1 05-Jan-2015  christos Port the in_cksum test from regress.
 1.2 05-Jan-2015  christos Too hard to cross-build mkassym.c correctly, use a standard assym.h
 1.1 05-Jan-2015  christos add a dummy mkassym
look in more places for cpu_in_cksum
 1.2 06-Jan-2015  martin Invoke the helper program from the test installation directory (instead
of the current/temporary).
Properly register failure.
 1.1 05-Jan-2015  christos Port the in_cksum test from regress.
 1.1 17-Nov-2022  ozaki-r tests: build and install added test files
 1.2 17-Nov-2022  ozaki-r tests: tweak broadcast_bind.c for NetBSD
 1.1 17-Nov-2022  ozaki-r tests: import broadcast_bind.c from OpenBSD

As of $OpenBSD: broadcast_bind.c,v 1.2 2015/12/02 20:45:00 mpi Exp $
 1.2 17-Nov-2022  ozaki-r tests: make inpcb_bind.c buildable
 1.1 17-Nov-2022  ozaki-r tests: import in_pcbbind/runtest.c from OpenBSD as inpcb_bind.c

As of $OpenBSD: runtest.c,v 1.7 2022/04/10 14:08:35 claudio Exp $
 1.1 17-Nov-2022  ozaki-r tests: add t_broadcast_bind.sh
 1.2 05-Aug-2023  riastradh tests/net/inpcb: Tests require root.
 1.1 17-Nov-2022  ozaki-r tests: add t_inpcb_bind.sh
 1.12 09-Nov-2022  knakahara Add test for sys/netipsec/ipsec.c:r1.176.
 1.11 11-Oct-2022  knakahara Add test for sadb_x_policy->sadb_x_policy_flags.
 1.10 30-Oct-2017  ozaki-r Add test cases of NAT-T (transport mode)

A small C program is added to make a special socket (UDP_ENCAP_ESPINUDP)
and keep it to handle UDP-encapsulated ESP packets.
 1.9 02-Aug-2017  ozaki-r Add test cases for setsockopt(IP_IPSEC_POLICY)
 1.8 18-Jul-2017  ozaki-r branches: 1.8.2;
Separate test files
 1.7 03-Jul-2017  ozaki-r Add test cases for IPComp
 1.6 15-May-2017  ozaki-r branches: 1.6.2;
Add test cases for SA lifetime
 1.5 10-May-2017  ozaki-r Test tunnel mode with IPv4 over IPv6 and IPv6 over IPv4
 1.4 09-May-2017  ozaki-r Test flushing SAD/SPD entries
 1.3 27-Apr-2017  ozaki-r Add test cases for L2TP/IPsec
 1.2 27-Apr-2017  ozaki-r Add test cases for gif/IPsec
 1.1 14-Apr-2017  ozaki-r branches: 1.1.2; 1.1.4; 1.1.6;
Add tests for ipsec

- Check if setkey correctly handles algorithms for AH/ESP
- Check IPsec of transport mode with AH/ESP over IPv4/IPv6
- Check IPsec of tunnel mode with AH/ESP over IPv4/IPv6
 1.1.6.3 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.1.6.2 11-May-2017  pgoyette Sync with HEAD
 1.1.6.1 02-May-2017  pgoyette Sync with HEAD - tag prg-localcount2-base1
 1.1.4.2 26-Apr-2017  pgoyette Sync with HEAD
 1.1.4.1 14-Apr-2017  pgoyette file Makefile was added on branch pgoyette-localcount on 2017-04-26 02:53:34 +0000
 1.1.2.2 14-Apr-2017  ozaki-r 79006
 1.1.2.1 14-Apr-2017  ozaki-r file Makefile was added on branch bouyer-socketcan on 2017-04-14 02:56:50 +0000
 1.6.2.2 17-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #357):
distrib/sets/lists/debug/mi: 1.228
distrib/sets/lists/tests/mi: 1.765-1.766
etc/mtree/NetBSD.dist.tests: 1.149
sys/net/npf/npf_ctl.c: 1.49
tests/net/ipsec/Makefile: 1.10
tests/net/ipsec/algorithms.sh: 1.6
tests/net/ipsec/natt_terminator.c: 1.1
tests/net/ipsec/t_ipsec_natt.sh: 1.1
tests/net/net_common.sh: 1.23-1.24
usr.sbin/npf/npfctl/npfctl.c: 1.54
Handle esp-udp for NAT-T
--
Fix npfclt reload on rump kernels
It fails because npfctl cannot get an errno when it calls ioctl to the (rump)
kernel; npfctl (libnpf) expects that an errno is returned via proplib,
however, the rump library of npf doesn't so. It happens because of mishandlings
of complicate npf kernel options.
PR kern/52643
--
Fix showing translated port (ntohs-ed twice wrongly)
--
Add test cases of NAT-T (transport mode)
A small C program is added to make a special socket (UDP_ENCAP_ESPINUDP)
and keep it to handle UDP-encapsulated ESP packets.
--
Add net/ipsec debug lib directory
--
Add ./usr/libdata/debug/usr/tests/net/ipsec
--
Stop using bpfjit
Because most architectures don't support it and npf still works without it.
 1.6.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.8.2.2 18-Jul-2017  ozaki-r 301908
 1.8.2.1 18-Jul-2017  ozaki-r file Makefile was added on branch perseant-stdc-iso10646 on 2017-07-18 02:16:08 +0000
 1.7 05-Dec-2021  msaitoh s/encript/encrypt/ in comment.
 1.6 27-Oct-2017  ozaki-r Handle esp-udp for NAT-T
 1.5 03-Jul-2017  ozaki-r Add test cases for IPComp
 1.4 12-May-2017  ozaki-r branches: 1.4.2;
Dedup some routines
 1.3 27-Apr-2017  ozaki-r Prefer rijndael-cbc
 1.2 27-Apr-2017  ozaki-r Add minimum sets of algorithms for testing
 1.1 14-Apr-2017  ozaki-r branches: 1.1.2; 1.1.4; 1.1.6;
Add tests for ipsec

- Check if setkey correctly handles algorithms for AH/ESP
- Check IPsec of transport mode with AH/ESP over IPv4/IPv6
- Check IPsec of tunnel mode with AH/ESP over IPv4/IPv6
 1.1.6.2 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.1.6.1 02-May-2017  pgoyette Sync with HEAD - tag prg-localcount2-base1
 1.1.4.2 26-Apr-2017  pgoyette Sync with HEAD
 1.1.4.1 14-Apr-2017  pgoyette file algorithms.sh was added on branch pgoyette-localcount on 2017-04-26 02:53:34 +0000
 1.1.2.2 14-Apr-2017  ozaki-r 79006
 1.1.2.1 14-Apr-2017  ozaki-r file algorithms.sh was added on branch bouyer-socketcan on 2017-04-14 02:56:50 +0000
 1.4.2.2 17-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #357):
distrib/sets/lists/debug/mi: 1.228
distrib/sets/lists/tests/mi: 1.765-1.766
etc/mtree/NetBSD.dist.tests: 1.149
sys/net/npf/npf_ctl.c: 1.49
tests/net/ipsec/Makefile: 1.10
tests/net/ipsec/algorithms.sh: 1.6
tests/net/ipsec/natt_terminator.c: 1.1
tests/net/ipsec/t_ipsec_natt.sh: 1.1
tests/net/net_common.sh: 1.23-1.24
usr.sbin/npf/npfctl/npfctl.c: 1.54
Handle esp-udp for NAT-T
--
Fix npfclt reload on rump kernels
It fails because npfctl cannot get an errno when it calls ioctl to the (rump)
kernel; npfctl (libnpf) expects that an errno is returned via proplib,
however, the rump library of npf doesn't so. It happens because of mishandlings
of complicate npf kernel options.
PR kern/52643
--
Fix showing translated port (ntohs-ed twice wrongly)
--
Add test cases of NAT-T (transport mode)
A small C program is added to make a special socket (UDP_ENCAP_ESPINUDP)
and keep it to handle UDP-encapsulated ESP packets.
--
Add net/ipsec debug lib directory
--
Add ./usr/libdata/debug/usr/tests/net/ipsec
--
Stop using bpfjit
Because most architectures don't support it and npf still works without it.
 1.4.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.8 05-Jun-2020  knakahara Refactor a little and follow new format of "npfctl list".

Fix the below ATF failures.
- net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_null
- net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_rijndaelcbc
- net/ipsec/t_ipsec_natt:ipsec_natt_transport_ipv4_null
- net/ipsec/t_ipsec_natt:ipsec_natt_transport_ipv4_rijndaelcbc

ok'ed by ozaki-r@n.o, thanks.
 1.7 20-Oct-2017  ozaki-r branches: 1.7.6;
Fix incomplete SP setups
 1.6 08-Aug-2017  ozaki-r Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
 1.5 02-Aug-2017  ozaki-r Add test cases that there are SPs but no relevant SAs
 1.4 03-Jul-2017  ozaki-r Add test cases for IPComp
 1.3 15-May-2017  ozaki-r branches: 1.3.2;
Fix typo
 1.2 10-May-2017  ozaki-r branches: 1.2.2;
Introduce check_sa_entries to remove lots of duplicated codes
 1.1 09-May-2017  ozaki-r Test flushing SAD/SPD entries
 1.2.2.3 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.2.2.2 11-May-2017  pgoyette Sync with HEAD
 1.2.2.1 10-May-2017  pgoyette file common.sh was added on branch prg-localcount2 on 2017-05-11 02:58:42 +0000
 1.3.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.7.6.1 10-Nov-2020  martin Pull up following revision(s) (requested by knakahara in ticket #1129):

tests/net/if_ipsec/t_ipsec_pfil.sh: revision 1.3
tests/net/if_ipsec/t_ipsec.sh: revision 1.11
tests/net/if_ipsec/t_ipsec_natt.sh: revision 1.4
tests/net/if_ipsec/t_ipsec_natt.sh: revision 1.5
tests/net/ipsec/t_ipsec_natt.sh: revision 1.4
tests/net/ipsec/t_ipsec_natt.sh: revision 1.5
tests/net/ipsec/common.sh: revision 1.8

Typo in error message

Refactor a little and follow new format of "npfctl list".

Fix the below ATF failures.
- net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_null
- net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_rijndaelcbc
- net/ipsec/t_ipsec_natt:ipsec_natt_transport_ipv4_null
- net/ipsec/t_ipsec_natt:ipsec_natt_transport_ipv4_rijndaelcbc
ok'ed by ozaki-r@n.o, thanks.

Fix missing "-m tranport" options. Pointed out by k-goda@IIJ.

Using any mode SA causes unepected call path, that is,
ipsec4_common_input_cb() calls ip_input() directly instead of
ipsecif4_input().
 1.2 22-Nov-2018  knakahara Add ATF for IPv6 NAT-T.

We use IPv6 NAT-T to avoid IPsec slowing down caused by dropping ESP packets
by some Customer Premises Equipments (CPE). I implement ATF to test such
situation.

I think it can also work with nat66, but I have not tested to the fine details.
 1.1 30-Oct-2017  ozaki-r branches: 1.1.2; 1.1.4; 1.1.6;
Add test cases of NAT-T (transport mode)

A small C program is added to make a special socket (UDP_ENCAP_ESPINUDP)
and keep it to handle UDP-encapsulated ESP packets.
 1.1.6.1 10-Jun-2019  christos Sync with HEAD
 1.1.4.1 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.1.2.2 17-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #357):
distrib/sets/lists/debug/mi: 1.228
distrib/sets/lists/tests/mi: 1.765-1.766
etc/mtree/NetBSD.dist.tests: 1.149
sys/net/npf/npf_ctl.c: 1.49
tests/net/ipsec/Makefile: 1.10
tests/net/ipsec/algorithms.sh: 1.6
tests/net/ipsec/natt_terminator.c: 1.1
tests/net/ipsec/t_ipsec_natt.sh: 1.1
tests/net/net_common.sh: 1.23-1.24
usr.sbin/npf/npfctl/npfctl.c: 1.54
Handle esp-udp for NAT-T
--
Fix npfclt reload on rump kernels
It fails because npfctl cannot get an errno when it calls ioctl to the (rump)
kernel; npfctl (libnpf) expects that an errno is returned via proplib,
however, the rump library of npf doesn't so. It happens because of mishandlings
of complicate npf kernel options.
PR kern/52643
--
Fix showing translated port (ntohs-ed twice wrongly)
--
Add test cases of NAT-T (transport mode)
A small C program is added to make a special socket (UDP_ENCAP_ESPINUDP)
and keep it to handle UDP-encapsulated ESP packets.
--
Add net/ipsec debug lib directory
--
Add ./usr/libdata/debug/usr/tests/net/ipsec
--
Stop using bpfjit
Because most architectures don't support it and npf still works without it.
 1.1.2.1 30-Oct-2017  snj file natt_terminator.c was added on branch netbsd-8 on 2017-11-17 20:43:11 +0000
 1.4 19-Jun-2023  knakahara Repair test coverage. I revert by proxy as the committer seems too busy to even reply mail.

TODO:
Provide some way for small machines to run subset test so that they get
shorter run time at the expense of test coverage.
 1.3 04-Jun-2023  chs The ATF design is O(N^2) in the number of TCs in one TP, which on some
slower platforms causes the net/ipsec tests to take as much as 30% of
the total time to run all of the ATF tests. Reduce the number of TCs
in various net/ipsec TPs by iterating over *_ALGORITHMS_MINIMUM rather
than *_ALGORITHMS. Various of the net/ipsec tests already use the
smaller lists, so change the rest of them to do so as well.
 1.2 03-Aug-2017  ozaki-r Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.1 14-Apr-2017  ozaki-r branches: 1.1.2; 1.1.4; 1.1.8;
Add tests for ipsec

- Check if setkey correctly handles algorithms for AH/ESP
- Check IPsec of transport mode with AH/ESP over IPv4/IPv6
- Check IPsec of tunnel mode with AH/ESP over IPv4/IPv6
 1.1.8.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.1.4.2 26-Apr-2017  pgoyette Sync with HEAD
 1.1.4.1 14-Apr-2017  pgoyette file t_ipsec_ah_keys.sh was added on branch pgoyette-localcount on 2017-04-26 02:53:34 +0000
 1.1.2.2 14-Apr-2017  ozaki-r 79006
 1.1.2.1 14-Apr-2017  ozaki-r file t_ipsec_ah_keys.sh was added on branch bouyer-socketcan on 2017-04-14 02:56:50 +0000
 1.4 19-Jun-2023  knakahara Repair test coverage. I revert by proxy as the committer seems too busy to even reply mail.

TODO:
Provide some way for small machines to run subset test so that they get
shorter run time at the expense of test coverage.
 1.3 04-Jun-2023  chs The ATF design is O(N^2) in the number of TCs in one TP, which on some
slower platforms causes the net/ipsec tests to take as much as 30% of
the total time to run all of the ATF tests. Reduce the number of TCs
in various net/ipsec TPs by iterating over *_ALGORITHMS_MINIMUM rather
than *_ALGORITHMS. Various of the net/ipsec tests already use the
smaller lists, so change the rest of them to do so as well.
 1.2 03-Aug-2017  ozaki-r Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.1 14-Apr-2017  ozaki-r branches: 1.1.2; 1.1.4; 1.1.8;
Add tests for ipsec

- Check if setkey correctly handles algorithms for AH/ESP
- Check IPsec of transport mode with AH/ESP over IPv4/IPv6
- Check IPsec of tunnel mode with AH/ESP over IPv4/IPv6
 1.1.8.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.1.4.2 26-Apr-2017  pgoyette Sync with HEAD
 1.1.4.1 14-Apr-2017  pgoyette file t_ipsec_esp_keys.sh was added on branch pgoyette-localcount on 2017-04-26 02:53:34 +0000
 1.1.2.2 14-Apr-2017  ozaki-r 79006
 1.1.2.1 14-Apr-2017  ozaki-r file t_ipsec_esp_keys.sh was added on branch bouyer-socketcan on 2017-04-14 02:56:50 +0000
 1.2 24-Nov-2022  knakahara clean up
 1.1 09-Nov-2022  knakahara Add test for sys/netipsec/ipsec.c:r1.176.
 1.10 22-Aug-2023  rin t_ipsec_{gif,l2tp}: Adjust for tcpdump 4.99.4

It does not longer output redundant `` (ipip-proto-4)'':
https://github.com/the-tcpdump-group/tcpdump/commit/cba9b77a98e9dde764abde71a899ee8937ca56e8

Now, these tests become passing again.

Thanks mlelstv@ for finding out upstream commit.
OK ozaki-r@
 1.9 17-Feb-2020  ozaki-r tests: add missing ifconfig -w

This change mitigates PR kern/54897.
 1.8 19-Aug-2019  ozaki-r tests: use rump_server_add_iface to create interfaces
 1.7 03-Aug-2017  ozaki-r branches: 1.7.4;
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.6 14-Jun-2017  ozaki-r Enable DEBUG for babylon5
 1.5 12-May-2017  ozaki-r branches: 1.5.2;
Dedup some routines
 1.4 10-May-2017  ozaki-r Introduce check_sa_entries to remove lots of duplicated codes
 1.3 09-May-2017  ozaki-r Test flushing SAD/SPD entries
 1.2 27-Apr-2017  ozaki-r branches: 1.2.2;
Test transport mode as well as tunnel mode
 1.1 27-Apr-2017  ozaki-r Add test cases for gif/IPsec
 1.2.2.4 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.2.2.3 11-May-2017  pgoyette Sync with HEAD
 1.2.2.2 02-May-2017  pgoyette Sync with HEAD - tag prg-localcount2-base1
 1.2.2.1 27-Apr-2017  pgoyette file t_ipsec_gif.sh was added on branch prg-localcount2 on 2017-05-02 03:19:23 +0000
 1.5.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.7.4.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.7.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.10 22-Aug-2023  rin t_ipsec_{gif,l2tp}: Adjust for tcpdump 4.99.4

It does not longer output redundant `` (ipip-proto-4)'':
https://github.com/the-tcpdump-group/tcpdump/commit/cba9b77a98e9dde764abde71a899ee8937ca56e8

Now, these tests become passing again.

Thanks mlelstv@ for finding out upstream commit.
OK ozaki-r@
 1.9 17-Feb-2020  ozaki-r tests: add missing ifconfig -w

This change mitigates PR kern/54897.
 1.8 19-Aug-2019  ozaki-r tests: use rump_server_add_iface to create interfaces
 1.7 03-Aug-2017  ozaki-r branches: 1.7.4;
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.6 14-Jun-2017  ozaki-r Enable DEBUG for babylon5
 1.5 12-May-2017  ozaki-r branches: 1.5.2;
Dedup some routines
 1.4 10-May-2017  ozaki-r Introduce check_sa_entries to remove lots of duplicated codes
 1.3 09-May-2017  ozaki-r Test flushing SAD/SPD entries
 1.2 27-Apr-2017  ozaki-r branches: 1.2.2;
Test transport mode as well as tunnel mode
 1.1 27-Apr-2017  ozaki-r Add test cases for L2TP/IPsec
 1.2.2.4 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.2.2.3 11-May-2017  pgoyette Sync with HEAD
 1.2.2.2 02-May-2017  pgoyette Sync with HEAD - tag prg-localcount2-base1
 1.2.2.1 27-Apr-2017  pgoyette file t_ipsec_l2tp.sh was added on branch prg-localcount2 on 2017-05-02 03:19:23 +0000
 1.5.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.7.4.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.7.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.25 07-Jan-2022  andvar s/udpate/update/
 1.24 31-Aug-2020  martin Skip timeout tests, pointing to PR 55632.
 1.23 23-Jul-2019  ozaki-r tests: add tests for getspi and udpate
 1.22 09-Nov-2017  ozaki-r branches: 1.22.4;
Dedup some checks

And the change a bit optimizes checks of SA expirations, which
may shorten testing time.
 1.21 09-Nov-2017  ozaki-r "Mark key_timehandler_ch callout as MP-safe" change needs one more sec to make lifetime tests stable
 1.20 20-Oct-2017  ozaki-r Add test cases for one SP with multiple SAs

These are for a bug reported recently which modifies SPs accidentally.
 1.19 20-Oct-2017  ozaki-r Fix incomplete SP setups
 1.18 03-Aug-2017  ozaki-r Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.17 02-Aug-2017  ozaki-r Add test cases that there are SPs but no relevant SAs
 1.16 24-Jul-2017  ozaki-r Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
 1.15 21-Jul-2017  ozaki-r Stop setting isr->sav on looking up sav in key_checkrequest
 1.14 20-Jul-2017  ozaki-r Don't make SAs expired on tests that delete SAs explicitly
 1.13 19-Jul-2017  ozaki-r Add tests that explicitly delete SAs instead of waiting for expirations
 1.12 19-Jul-2017  ozaki-r Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
 1.11 18-Jul-2017  ozaki-r branches: 1.11.2;
Separate test files
 1.10 15-Jul-2017  ozaki-r Fix wrong argument handling
 1.9 14-Jul-2017  ozaki-r Add test cases for SAs with different SPIs
 1.8 05-Jul-2017  ozaki-r Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
 1.7 19-Jun-2017  ozaki-r Add test cases of TCP/IPsec on an IPv4-mapped IPv6 address

It reproduces the same panic reported in PR kern/52304
(but not sure that its cause is also same).
 1.6 01-Jun-2017  ozaki-r branches: 1.6.2;
Test TCP communications over IPsec transport mode with ESP or AH

This tests SP caches of PCB.
 1.5 01-Jun-2017  ozaki-r Remove a unused local variable
 1.4 24-May-2017  ozaki-r Enable DEBUG to know what is happening on anita/sparc
 1.3 18-May-2017  ozaki-r branches: 1.3.2;
Don't check the existence of SA entries eagerly

They can be expired at that point if their lifetime is very short.
This may fix unexpected failures of tests running on anita.
 1.2 17-May-2017  ozaki-r Add test cases of TCP communications with IPsec enabled

The test cases transfer data over TCP by using nc with IPsec just enabled
(no SA/SP is configured) and confirm the commit "Fix diagnostic assertion
failure in ipsec_init_policy" really fixes the issue.
 1.1 15-May-2017  ozaki-r Add test cases for SA lifetime
 1.3.2.2 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.3.2.1 18-May-2017  pgoyette file t_ipsec_misc.sh was added on branch prg-localcount2 on 2017-05-19 00:22:59 +0000
 1.6.2.4 25-Jul-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1306):

crypto/dist/ipsec-tools/src/setkey/parse.y: revision 1.23
sys/netipsec/key.c: revision 1.265
crypto/dist/ipsec-tools/src/setkey/token.l: revision 1.23
tests/net/ipsec/t_ipsec_misc.sh: revision 1.23

ipsec: fix a regression of the update API

The update API updates an SA by creating a new SA and removing an existing SA.
The previous change removed a newly added SA wrongly if an existing SA had been
created by the getspi API.

setkey: enable to use the getspi API

If a specified SPI is not zero, tell the kernel to use the SPI by using
SADB_EXT_SPIRANGE. Otherwise, the kernel picks a random SPI.

It enables to mimic racoon.

tests: add tests for getspi and udpate
 1.6.2.3 21-Nov-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #360):
tests/net/ipsec/t_ipsec_misc.sh: revision 1.21
tests/net/ipsec/t_ipsec_misc.sh: revision 1.22
sys/netipsec/key.c: revision 1.235
Mark key_timehandler_ch callout as MP-safe (just forgot to do so)
"Mark key_timehandler_ch callout as MP-safe" change needs one more sec to make lifetime tests stable
Dedup some checks
And the change a bit optimizes checks of SA expirations, which
may shorten testing time.
 1.6.2.2 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.6.2.1 21-Jun-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #51):
sys/netinet/tcp_input.c: revision 1.358
tests/net/ipsec/t_ipsec_misc.sh: revision 1.7
Fix KASSERT in tcp_input
inp can be NULL when receiving an IPv4 packet on an IPv4-mapped IPv6
address. In that case KASSERT(sotoinpcb(so) == inp) always fails.
Should fix PR kern/52304 (at least it fixes the same panic as the
report)
--
Add test cases of TCP/IPsec on an IPv4-mapped IPv6 address
It reproduces the same panic reported in PR kern/52304
(but not sure that its cause is also same).
 1.11.2.2 18-Jul-2017  ozaki-r 301908
 1.11.2.1 18-Jul-2017  ozaki-r file t_ipsec_misc.sh was added on branch perseant-stdc-iso10646 on 2017-07-18 02:16:08 +0000
 1.22.4.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.5 05-Jun-2020  knakahara Refactor a little and follow new format of "npfctl list".

Fix the below ATF failures.
- net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_null
- net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_rijndaelcbc
- net/ipsec/t_ipsec_natt:ipsec_natt_transport_ipv4_null
- net/ipsec/t_ipsec_natt:ipsec_natt_transport_ipv4_rijndaelcbc

ok'ed by ozaki-r@n.o, thanks.
 1.4 01-Jun-2020  martin Typo in error message
 1.3 19-Aug-2019  ozaki-r tests: use rump_server_add_iface to create interfaces
 1.2 22-Nov-2018  knakahara branches: 1.2.2;
Add ATF for IPv6 NAT-T.

We use IPv6 NAT-T to avoid IPsec slowing down caused by dropping ESP packets
by some Customer Premises Equipments (CPE). I implement ATF to test such
situation.

I think it can also work with nat66, but I have not tested to the fine details.
 1.1 30-Oct-2017  ozaki-r branches: 1.1.2; 1.1.4; 1.1.6;
Add test cases of NAT-T (transport mode)

A small C program is added to make a special socket (UDP_ENCAP_ESPINUDP)
and keep it to handle UDP-encapsulated ESP packets.
 1.1.6.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.1.6.1 10-Jun-2019  christos Sync with HEAD
 1.1.4.1 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.1.2.2 17-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #357):
distrib/sets/lists/debug/mi: 1.228
distrib/sets/lists/tests/mi: 1.765-1.766
etc/mtree/NetBSD.dist.tests: 1.149
sys/net/npf/npf_ctl.c: 1.49
tests/net/ipsec/Makefile: 1.10
tests/net/ipsec/algorithms.sh: 1.6
tests/net/ipsec/natt_terminator.c: 1.1
tests/net/ipsec/t_ipsec_natt.sh: 1.1
tests/net/net_common.sh: 1.23-1.24
usr.sbin/npf/npfctl/npfctl.c: 1.54
Handle esp-udp for NAT-T
--
Fix npfclt reload on rump kernels
It fails because npfctl cannot get an errno when it calls ioctl to the (rump)
kernel; npfctl (libnpf) expects that an errno is returned via proplib,
however, the rump library of npf doesn't so. It happens because of mishandlings
of complicate npf kernel options.
PR kern/52643
--
Fix showing translated port (ntohs-ed twice wrongly)
--
Add test cases of NAT-T (transport mode)
A small C program is added to make a special socket (UDP_ENCAP_ESPINUDP)
and keep it to handle UDP-encapsulated ESP packets.
--
Add net/ipsec debug lib directory
--
Add ./usr/libdata/debug/usr/tests/net/ipsec
--
Stop using bpfjit
Because most architectures don't support it and npf still works without it.
 1.1.2.1 30-Oct-2017  snj file t_ipsec_natt.sh was added on branch netbsd-8 on 2017-11-17 20:43:11 +0000
 1.2.2.1 10-Nov-2020  martin Pull up following revision(s) (requested by knakahara in ticket #1129):

tests/net/if_ipsec/t_ipsec_pfil.sh: revision 1.3
tests/net/if_ipsec/t_ipsec.sh: revision 1.11
tests/net/if_ipsec/t_ipsec_natt.sh: revision 1.4
tests/net/if_ipsec/t_ipsec_natt.sh: revision 1.5
tests/net/ipsec/t_ipsec_natt.sh: revision 1.4
tests/net/ipsec/t_ipsec_natt.sh: revision 1.5
tests/net/ipsec/common.sh: revision 1.8

Typo in error message

Refactor a little and follow new format of "npfctl list".

Fix the below ATF failures.
- net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_null
- net/if_ipsec/t_ipsec_natt:ipsecif_natt_transport_rijndaelcbc
- net/ipsec/t_ipsec_natt:ipsec_natt_transport_ipv4_null
- net/ipsec/t_ipsec_natt:ipsec_natt_transport_ipv4_rijndaelcbc
ok'ed by ozaki-r@n.o, thanks.

Fix missing "-m tranport" options. Pointed out by k-goda@IIJ.

Using any mode SA causes unepected call path, that is,
ipsec4_common_input_cb() calls ip_input() directly instead of
ipsecif4_input().
 1.2 03-Aug-2017  ozaki-r branches: 1.2.2;
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.1 02-Aug-2017  ozaki-r Add test cases for setsockopt(IP_IPSEC_POLICY)
 1.2.2.2 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.2.2.1 03-Aug-2017  snj file t_ipsec_sockopt.sh was added on branch netbsd-8 on 2017-10-21 19:43:55 +0000
 1.1 11-Oct-2022  knakahara Add test for sadb_x_policy->sadb_x_policy_flags.
 1.1 14-Apr-2017  ozaki-r branches: 1.1.2; 1.1.4;
Add tests for ipsec

- Check if setkey correctly handles algorithms for AH/ESP
- Check IPsec of transport mode with AH/ESP over IPv4/IPv6
- Check IPsec of tunnel mode with AH/ESP over IPv4/IPv6
 1.1.4.2 26-Apr-2017  pgoyette Sync with HEAD
 1.1.4.1 14-Apr-2017  pgoyette file t_ipsec_sysctl.sh was added on branch pgoyette-localcount on 2017-04-26 02:53:34 +0000
 1.1.2.2 14-Apr-2017  ozaki-r 79006
 1.1.2.1 14-Apr-2017  ozaki-r file t_ipsec_sysctl.sh was added on branch bouyer-socketcan on 2017-04-14 02:56:50 +0000
 1.2 03-Aug-2017  ozaki-r branches: 1.2.2;
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.1 18-Jul-2017  ozaki-r branches: 1.1.2;
Separate test files
 1.1.2.2 18-Jul-2017  ozaki-r 301908
 1.1.2.1 18-Jul-2017  ozaki-r file t_ipsec_tcp.sh was added on branch perseant-stdc-iso10646 on 2017-07-18 02:16:08 +0000
 1.2.2.2 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.2.2.1 03-Aug-2017  snj file t_ipsec_tcp.sh was added on branch netbsd-8 on 2017-10-21 19:43:55 +0000
 1.8 19-Jun-2023  knakahara Repair test coverage. I revert by proxy as the committer seems too busy to even reply mail.

TODO:
Provide some way for small machines to run subset test so that they get
shorter run time at the expense of test coverage.
 1.7 04-Jun-2023  chs The ATF design is O(N^2) in the number of TCs in one TP, which on some
slower platforms causes the net/ipsec tests to take as much as 30% of
the total time to run all of the ATF tests. Reduce the number of TCs
in various net/ipsec TPs by iterating over *_ALGORITHMS_MINIMUM rather
than *_ALGORITHMS. Various of the net/ipsec tests already use the
smaller lists, so change the rest of them to do so as well.
 1.6 03-Aug-2017  ozaki-r Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.5 03-Jul-2017  ozaki-r Add test cases for IPComp
 1.4 12-May-2017  ozaki-r branches: 1.4.2;
Dedup some routines
 1.3 10-May-2017  ozaki-r Introduce check_sa_entries to remove lots of duplicated codes
 1.2 09-May-2017  ozaki-r Test flushing SAD/SPD entries
 1.1 14-Apr-2017  ozaki-r branches: 1.1.2; 1.1.4; 1.1.6;
Add tests for ipsec

- Check if setkey correctly handles algorithms for AH/ESP
- Check IPsec of transport mode with AH/ESP over IPv4/IPv6
- Check IPsec of tunnel mode with AH/ESP over IPv4/IPv6
 1.1.6.2 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.1.6.1 11-May-2017  pgoyette Sync with HEAD
 1.1.4.2 26-Apr-2017  pgoyette Sync with HEAD
 1.1.4.1 14-Apr-2017  pgoyette file t_ipsec_transport.sh was added on branch pgoyette-localcount on 2017-04-26 02:53:34 +0000
 1.1.2.2 14-Apr-2017  ozaki-r 79006
 1.1.2.1 14-Apr-2017  ozaki-r file t_ipsec_transport.sh was added on branch bouyer-socketcan on 2017-04-14 02:56:50 +0000
 1.4.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.11 19-Jun-2023  knakahara Repair test coverage. I revert by proxy as the committer seems too busy to even reply mail.

TODO:
Provide some way for small machines to run subset test so that they get
shorter run time at the expense of test coverage.
 1.10 04-Jun-2023  chs The ATF design is O(N^2) in the number of TCs in one TP, which on some
slower platforms causes the net/ipsec tests to take as much as 30% of
the total time to run all of the ATF tests. Reduce the number of TCs
in various net/ipsec TPs by iterating over *_ALGORITHMS_MINIMUM rather
than *_ALGORITHMS. Various of the net/ipsec tests already use the
smaller lists, so change the rest of them to do so as well.
 1.9 03-Aug-2017  ozaki-r Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.8 12-May-2017  ozaki-r branches: 1.8.2;
Dedup some routines
 1.7 10-May-2017  ozaki-r Disable DAD rather than waiting its completion every time
 1.6 10-May-2017  ozaki-r Dedup some routines
 1.5 10-May-2017  ozaki-r Introduce check_sa_entries to remove lots of duplicated codes
 1.4 09-May-2017  ozaki-r Test flushing SAD/SPD entries
 1.3 16-Apr-2017  ozaki-r branches: 1.3.2; 1.3.4; 1.3.6;
Revert "Mark tests of tunnel/AH/IPv6 as expected failure (PR kern/52161)"

The issue was fixed by christos@
 1.2 14-Apr-2017  ozaki-r Mark tests of tunnel/AH/IPv6 as expected failure (PR kern/52161)
 1.1 14-Apr-2017  ozaki-r Add tests for ipsec

- Check if setkey correctly handles algorithms for AH/ESP
- Check IPsec of transport mode with AH/ESP over IPv4/IPv6
- Check IPsec of tunnel mode with AH/ESP over IPv4/IPv6
 1.3.6.2 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.3.6.1 11-May-2017  pgoyette Sync with HEAD
 1.3.4.2 26-Apr-2017  pgoyette Sync with HEAD
 1.3.4.1 16-Apr-2017  pgoyette file t_ipsec_tunnel.sh was added on branch pgoyette-localcount on 2017-04-26 02:53:34 +0000
 1.3.2.2 16-Apr-2017  ozaki-r 1922998
 1.3.2.1 16-Apr-2017  ozaki-r file t_ipsec_tunnel.sh was added on branch bouyer-socketcan on 2017-04-16 10:34:50 +0000
 1.8.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.4 19-Jun-2023  knakahara Repair test coverage. I revert by proxy as the committer seems too busy to even reply mail.

TODO:
Provide some way for small machines to run subset test so that they get
shorter run time at the expense of test coverage.
 1.3 04-Jun-2023  chs The ATF design is O(N^2) in the number of TCs in one TP, which on some
slower platforms causes the net/ipsec tests to take as much as 30% of
the total time to run all of the ATF tests. Reduce the number of TCs
in various net/ipsec TPs by iterating over *_ALGORITHMS_MINIMUM rather
than *_ALGORITHMS. Various of the net/ipsec tests already use the
smaller lists, so change the rest of them to do so as well.
 1.2 03-Aug-2017  ozaki-r branches: 1.2.2;
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.1 03-Jul-2017  ozaki-r Add test cases for IPComp
 1.2.2.2 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.2.2.1 03-Aug-2017  snj file t_ipsec_tunnel_ipcomp.sh was added on branch netbsd-8 on 2017-10-21 19:43:55 +0000
 1.5 19-Jun-2023  knakahara Repair test coverage. I revert by proxy as the committer seems too busy to even reply mail.

TODO:
Provide some way for small machines to run subset test so that they get
shorter run time at the expense of test coverage.
 1.4 04-Jun-2023  chs The ATF design is O(N^2) in the number of TCs in one TP, which on some
slower platforms causes the net/ipsec tests to take as much as 30% of
the total time to run all of the ATF tests. Reduce the number of TCs
in various net/ipsec TPs by iterating over *_ALGORITHMS_MINIMUM rather
than *_ALGORITHMS. Various of the net/ipsec tests already use the
smaller lists, so change the rest of them to do so as well.
 1.3 03-Aug-2017  ozaki-r Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.2 12-May-2017  ozaki-r branches: 1.2.2;
Dedup some routines
 1.1 10-May-2017  ozaki-r branches: 1.1.2;
Test tunnel mode with IPv4 over IPv6 and IPv6 over IPv4
 1.1.2.3 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.1.2.2 11-May-2017  pgoyette Sync with HEAD
 1.1.2.1 10-May-2017  pgoyette file t_ipsec_tunnel_odd.sh was added on branch prg-localcount2 on 2017-05-11 02:58:42 +0000
 1.2.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.3 25-Nov-2016  ozaki-r Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.2 26-May-2015  ozaki-r branches: 1.2.2;
Run mcast tests on rump kernels

The tests on anita qemus failed due to that the host network environment
didn't meet the tests.

The change makes the tests independent from host environments
and the tests will pass on any environments including anita qemus.

Discussed on tech-kern and tech-net.
 1.1 11-Oct-2014  christos add a multicast test (what to do with v6?)
 1.2.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.4 28-Feb-2017  ozaki-r Add tests that destroy an interface while the mcast program is running
 1.3 28-May-2015  ozaki-r branches: 1.3.2; 1.3.4;
Make the test stable under load or when running on a slow machine

Let sender and receiver synchronize explicitly via a socketpair
and don't rely on sleep.
 1.2 28-May-2015  ozaki-r Detail an error message
 1.1 26-May-2015  ozaki-r Run mcast tests on rump kernels

The tests on anita qemus failed due to that the host network environment
didn't meet the tests.

The change makes the tests independent from host environments
and the tests will pass on any environments including anita qemus.

Discussed on tech-kern and tech-net.
 1.3.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.3.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.17 26-May-2015  ozaki-r Run mcast tests on rump kernels

The tests on anita qemus failed due to that the host network environment
didn't meet the tests.

The change makes the tests independent from host environments
and the tests will pass on any environments including anita qemus.

Discussed on tech-kern and tech-net.
 1.16 25-May-2015  ozaki-r Fix specifying an interface for IPV6_JOIN_GROUP

Using always an interface of index=1 is not good idea; it varies
depending on runtime environments. We can use index=0 instead,
which allows the kernel to pick an appropriate interface for mcast.
 1.15 19-May-2015  ozaki-r Handle child's exit status precisely
 1.14 19-May-2015  ozaki-r Use EXIT_FAILURE instead of 1
 1.13 18-May-2015  ozaki-r KNF

Tweaks of whitespaces and tabs.
 1.12 17-May-2015  ozaki-r Save errno for errx
 1.11 27-Feb-2015  martin Bump timeout for a poll() call slightly, so the test has a chance to work
on slow machines.
 1.10 27-Oct-2014  christos fix typo, use different address
 1.9 26-Oct-2014  christos - deal with MacOS/X not having clock_*()
- change multicast address
- set the interface
XXX: Now the ipv6 code works on MacOS/X but does not work for us still.
 1.8 13-Oct-2014  martin timespec.tv_nsec is long, so use %ld instead of %jd as printf format.
 1.7 13-Oct-2014  christos typo
 1.6 13-Oct-2014  christos Oops need to bind, also make the message more interesting and check that
it arrives correctly.
 1.5 12-Oct-2014  christos Explain what works, what does not and why.
Provide compatible code so that it compiles on Linux and MacOS/X with -DTEST.
We should check more OS's and see if they are broken too.
 1.4 12-Oct-2014  christos now we support the v6 ioctls for mapped addresses too.
 1.3 12-Oct-2014  christos Explain a bit more what's going on with the multicast setsockopts.
 1.2 12-Oct-2014  christos Add the simple unconnected tests too.
 1.1 11-Oct-2014  christos add a multicast test (what to do with v6?)
 1.6 03-Aug-2017  ozaki-r Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.5 28-Feb-2017  ozaki-r branches: 1.5.4;
Add tests that destroy an interface while the mcast program is running
 1.4 25-Nov-2016  ozaki-r branches: 1.4.2;
Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.3 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.2 10-Aug-2016  kre + -lrumpdev
 1.1 26-May-2015  ozaki-r branches: 1.1.2;
Run mcast tests on rump kernels

The tests on anita qemus failed due to that the host network environment
didn't meet the tests.

The change makes the tests independent from host environments
and the tests will pass on any environments including anita qemus.

Discussed on tech-kern and tech-net.
 1.1.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.1.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.4.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.5.4.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.7 01-Apr-2020  christos factor out common code and set the path.
 1.6 27-May-2015  kefren branches: 1.6.16;
Add another simple MPLS test but using this time a mixed IPv4/IPv6 LSR
This test encapsulates IPv6 packets, pass them over MPLS to an IPv6
neighbour that switches label and passes forward to an IPv4
neighbour. There, the IPv6 packet is decapsulated and passed to IPv6 stack
For the return path we test both implicit and explicit null encapsulations
 1.5 27-May-2015  kefren Add a simple IPv6/MPLS test
 1.4 12-Nov-2013  kefren branches: 1.4.4; 1.4.8;
Retire t_ldp_static. It's too heavy weighted for releng's anita and mostly
unsuited for atf.
 1.3 25-Jul-2013  kefren add a couple of tests for dynamic MPLS routes generation using ldpd
 1.2 23-Jul-2013  kefren branches: 1.2.2;
Add a test for RFC4182
 1.1 19-Jul-2013  kefren Add a couple of basic IP/MPLS forwarding tests
 1.2.2.2 23-Jul-2013  riastradh sync with HEAD
 1.2.2.1 23-Jul-2013  riastradh file Makefile was added on branch riastradh-drm2 on 2013-07-23 21:07:38 +0000
 1.4.8.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.8.1 12-Nov-2013  tls file Makefile was added on branch tls-maxphys on 2014-08-20 00:04:52 +0000
 1.4.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.4.1 12-Nov-2013  yamt file Makefile was added on branch yamt-pagecache on 2014-05-22 11:42:22 +0000
 1.6.16.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.1 01-Apr-2020  christos branches: 1.1.2;
factor out common code and set the path.
 1.1.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.1.2.1 01-Apr-2020  martin file mpls_common.sh was added on branch phil-wifi on 2020-04-08 14:09:12 +0000
 1.11 25-Nov-2021  hannken Consistently use "drvctl -l qemufwcfg0" to check if
running under qemu in general.
 1.10 01-Apr-2020  christos more cleanup
 1.9 01-Apr-2020  christos factor out common code and set the path.
 1.8 13-May-2019  bad Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.7 10-Aug-2016  ozaki-r branches: 1.7.14;
Add rumpdev library for config_cfdriver_attach
 1.6 13-May-2015  martin Before pinging, wait for addresses to come out of tentative state
 1.5 04-May-2015  martin Cosmetics: hide an error message from sysctl (machdep.cpu_brand is not
available on most architectures)
 1.4 01-Sep-2014  gson The t_ldp_regen test frequently fails under qemu, but reliably passes
on real hardware. The failures are hardly surprising given that qemu
timing is off by a is off by a factor of two as reported in
PR kern/43997. Disabling the test on qemu for now; it should be
re-enabled once 43997 has been fixed to see if it still fails then.
 1.3 03-Jan-2014  pooka branches: 1.3.4; 1.3.8;
ldpd wants inet6
 1.2 27-Jul-2013  kefren don't expect failure anymore
 1.1 25-Jul-2013  kefren add a couple of tests for dynamic MPLS routes generation using ldpd
 1.3.8.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.8.1 03-Jan-2014  tls file t_ldp_regen.sh was added on branch tls-maxphys on 2014-08-20 00:04:52 +0000
 1.3.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.3.4.1 03-Jan-2014  yamt file t_ldp_regen.sh was added on branch yamt-pagecache on 2014-05-22 11:42:22 +0000
 1.7.14.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.7.14.1 10-Jun-2019  christos Sync with HEAD
 1.3 12-Nov-2013  kefren Retire t_ldp_static. It's too heavy weighted for releng's anita and mostly
unsuited for atf.
 1.2 27-Jul-2013  kefren Disable ip forwarding at penultimate hop
Raise the ping wait time a little bit in order for changes to propagate
over ldp
 1.1 25-Jul-2013  kefren add a couple of tests for dynamic MPLS routes generation using ldpd
 1.7 01-Apr-2020  christos factor out common code and set the path.
 1.6 13-May-2019  bad Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.5 10-Aug-2016  ozaki-r branches: 1.5.14;
Add rumpdev library for config_cfdriver_attach
 1.4 18-Mar-2014  riastradh branches: 1.4.4; 1.4.8;
Merge riastradh-drm2 to HEAD.
 1.3 23-Jul-2013  kefren branches: 1.3.2;
Use . for shm bus path and rump_server url instead of /tmp
Exit after first reply in mpls_fw
 1.2 23-Jul-2013  martin Move all shm files from /tmp into . so ATF can automatically do cleanup
for us. Ok: kefren
 1.1 19-Jul-2013  kefren Add a couple of basic IP/MPLS forwarding tests
 1.3.2.2 23-Jul-2013  riastradh sync with HEAD
 1.3.2.1 23-Jul-2013  riastradh file t_mpls_fw.sh was added on branch riastradh-drm2 on 2013-07-23 21:07:38 +0000
 1.4.8.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.8.1 18-Mar-2014  tls file t_mpls_fw.sh was added on branch tls-maxphys on 2014-08-20 00:04:52 +0000
 1.4.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.4.1 18-Mar-2014  yamt file t_mpls_fw.sh was added on branch yamt-pagecache on 2014-05-22 11:42:22 +0000
 1.5.14.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.5.14.1 10-Jun-2019  christos Sync with HEAD
 1.5 01-Apr-2020  christos factor out common code and set the path.
 1.4 13-May-2019  bad Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.3 10-Aug-2016  ozaki-r branches: 1.3.14;
Add rumpdev library for config_cfdriver_attach
 1.2 07-Aug-2015  ozaki-r Use rump.ping6 instead of ping6 with rumphijack(3)
 1.1 27-May-2015  kefren Add a simple IPv6/MPLS test
 1.3.14.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.3.14.1 10-Jun-2019  christos Sync with HEAD
 1.5 01-Apr-2020  christos factor out common code and set the path.
 1.4 13-May-2019  bad Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.3 10-Aug-2016  ozaki-r branches: 1.3.14;
Add rumpdev library for config_cfdriver_attach
 1.2 07-Aug-2015  ozaki-r Use rump.ping6 instead of ping6 with rumphijack(3)
 1.1 27-May-2015  kefren Add another simple MPLS test but using this time a mixed IPv4/IPv6 LSR
This test encapsulates IPv6 packets, pass them over MPLS to an IPv6
neighbour that switches label and passes forward to an IPv4
neighbour. There, the IPv6 packet is decapsulated and passed to IPv6 stack
For the return path we test both implicit and explicit null encapsulations
 1.3.14.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.3.14.1 10-Jun-2019  christos Sync with HEAD
 1.6 01-Apr-2020  christos factor out common code and set the path.
 1.5 13-May-2019  bad Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.4 10-Aug-2016  ozaki-r branches: 1.4.14;
Add rumpdev library for config_cfdriver_attach
 1.3 18-Mar-2014  riastradh branches: 1.3.4; 1.3.8;
Merge riastradh-drm2 to HEAD.
 1.2 23-Jul-2013  kefren branches: 1.2.2;
Use . for shm bus path and rump_server url instead of /tmp
Exit after first reply in mpls_fw
 1.1 23-Jul-2013  kefren Add a test for RFC4182
 1.2.2.2 23-Jul-2013  riastradh sync with HEAD
 1.2.2.1 23-Jul-2013  riastradh file t_rfc4182.sh was added on branch riastradh-drm2 on 2013-07-23 21:07:38 +0000
 1.3.8.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.8.1 18-Mar-2014  tls file t_rfc4182.sh was added on branch tls-maxphys on 2014-08-20 00:04:52 +0000
 1.3.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.3.4.1 18-Mar-2014  yamt file t_rfc4182.sh was added on branch yamt-pagecache on 2014-05-22 11:42:22 +0000
 1.4.14.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.4.14.1 10-Jun-2019  christos Sync with HEAD
 1.4 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.3 24-Nov-2016  ozaki-r Reduce duplicate codes

Introduce net_common.sh that is to share common functions used in tests
for networking. This commit commonizes extract_new_packets. Other duplicate
codes will be moved to the file in further commits.
 1.2 11-Nov-2015  ozaki-r branches: 1.2.2;
Add tests for RA

From s-yamaguchi@IIJ (with some tweaks by me)
 1.1 03-Aug-2015  ozaki-r Add tests for NDP
 1.2.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.14 07-Mar-2018  ozaki-r Tweak tests; increase the size of NS packets for the addition of a nonce
 1.13 07-Mar-2018  ozaki-r Provide more informative reports on failures
 1.12 25-Nov-2016  ozaki-r branches: 1.12.12;
Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.11 24-Nov-2016  ozaki-r Move HIJACKING definition to net_common.sh
 1.10 24-Nov-2016  ozaki-r Reduce duplicate codes

Introduce net_common.sh that is to share common functions used in tests
for networking. This commit commonizes extract_new_packets. Other duplicate
codes will be moved to the file in further commits.
 1.9 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.8 04-Oct-2016  ozaki-r Add tests for sysctl net.inet6.ip6.dad_count

From suzu-ken@IIJ
 1.7 16-Sep-2016  ozaki-r Ignore case in duplicated
 1.6 16-Sep-2016  ozaki-r Adjust for new ifconfig output

And use -o match to provide informative error messages.
 1.5 10-Aug-2016  kre + -lrumpdev
 1.4 24-Aug-2015  ozaki-r branches: 1.4.2;
Disable another tentative state check

It's too ephemeral to check robustly.
 1.3 17-Aug-2015  ozaki-r Improve test stability

- Take a diff between packet dumps and use it for packet checking
- it's resistant against packet reorder
- Seep 2 sec to make sure a NS message is sent
- Disable tentative state check for now
- it's too ephemeral to check robustly
 1.2 10-Aug-2015  ozaki-r Fix cleanup
 1.1 03-Aug-2015  ozaki-r Add tests for NDP
 1.4.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.4.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.12.12.1 15-Mar-2018  pgoyette Synch with HEAD
 1.40 07-Jan-2022  ozaki-r tests: skip ndp_cache_state on qemu
 1.39 17-Sep-2020  roy ndp_rtm: Only ping once

Pointless doing 3 pings.
On a slow system, it's possible that many RTM_MISS messages could
overflow into the next test.
 1.38 15-Mar-2020  roy tests: ndp_rtm: Check correct source address in RTA_AUTHOR

Fixes PR kern/55074.
 1.37 11-Mar-2020  roy tests: check RTA_AUTHOR in messages
 1.36 03-Sep-2019  roy tests: fix ARP and NDP tests for RTM_* messages

While here add tests for RTM_MISS.
 1.35 19-Aug-2019  ozaki-r tests: fix test header name
 1.34 13-Aug-2019  ozaki-r Make a permanet neighbor cache to avoid sending an NS packet disturbing the test

A receiver of an ICMPv6 request packet creates a stale cache entry and it turns
into the delay state on replying the packet. After 5 second, the receiver sends
an NS packet as a reachability confirmation, which disturbs the test and causes
a unexpected result.

Should fix PR misc/54451
 1.33 18-Jul-2019  ozaki-r branches: 1.33.2;
tests: shorten the expire time of neighbor caches to reduce the runtime of the tests
 1.32 28-Jun-2019  ozaki-r tests: test state transitions of neighbor caches
 1.31 22-Mar-2018  ozaki-r branches: 1.31.2;
Avoid setting IP addresses of the same subnet on different interface

If we do so, there will remain one route that is of a preceding address, but
that behavior is not documented and may be changed in the future. Tests
shouldn't rely on such a unstable behavior.
 1.30 24-Nov-2017  kre branches: 1.30.2;

Fix the ndp_rtm test the same way the arp_rtm test was fixed:
1. get pid of bg process with $! not $?
2. expect a single message from route monitor, not two, after ndp -d
3. run atf_check just once to verify correct output, not once for each string
 1.29 28-Jun-2017  ozaki-r Enable to remove multiple ARP/NDP entries for one destination

The kernel can have multiple ARP/NDP entries which have an indentical
destination on different interfaces. This is normal and can be
reproduce easily by ping -I or ping6 -S. We should be able to remove
such entries.

arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.

Related to PR 51179
 1.28 28-Jun-2017  ozaki-r Restore ARP/NDP entries to route show and netstat -r

Requested by dyoung@ some time ago
 1.27 26-Jun-2017  ozaki-r Drop RTF_UP from a routing message of a deleted ARP/NDP entry
 1.26 26-Jun-2017  ozaki-r Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry

A message originally included only DST and GATEWAY. Restore it.
 1.25 26-Jun-2017  ozaki-r Fix usage of routing messages on arp -d and ndp -d

It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
 1.24 22-Jun-2017  ozaki-r Test implicit removals of ARP/NDP entries

One test case reproudces PR 51179.
 1.23 22-Jun-2017  ozaki-r Fix typo
 1.22 21-Jun-2017  ozaki-r Don't create a permanent L2 cache entry on adding an address to an interface

It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
 1.21 19-Jun-2017  ozaki-r Add missing declarations for cleanup
 1.20 16-Jun-2017  ozaki-r Test routing messages emitted on operations of ARP/NDP entries
 1.19 26-May-2017  ozaki-r branches: 1.19.2;
Change the default value of DEBUG of stable tests to false
 1.18 03-Mar-2017  ozaki-r Provide a more robust regexp for time formats of 1day-ish
 1.17 25-Nov-2016  ozaki-r branches: 1.17.2;
Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.16 24-Nov-2016  ozaki-r Move HIJACKING definition to net_common.sh
 1.15 24-Nov-2016  ozaki-r Reduce duplicate codes

Introduce net_common.sh that is to share common functions used in tests
for networking. This commit commonizes extract_new_packets. Other duplicate
codes will be moved to the file in further commits.
 1.14 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.13 10-Aug-2016  kre + -lrumpdev
 1.12 21-Jun-2016  ozaki-r branches: 1.12.2;
Make a bunch of test names self-descriptive
 1.11 20-May-2016  ozaki-r Adjust the tests for temp option that works now

See PR kern/50127
 1.10 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.9 29-Feb-2016  ozaki-r Add tests on activating a new MAC address
 1.8 18-Nov-2015  ozaki-r Don't assign unused IP address

It sometimes creates an unexpected NDP cache.
 1.7 17-Nov-2015  ozaki-r Add tests for GC of neighbor caches
 1.6 18-Aug-2015  ozaki-r Make a test a bit easy

Accept just 24h of expiration time in addition to 24h - a few seconds.
 1.5 17-Aug-2015  ozaki-r Improve test stability

A test for ndp -c was sometimes failed because between the deletion
and the check NS/NA messages were exchanged and a NDP cache was
recreated unexpectedly. To provent this situation, we do ifconfig
shmif0 down of the peer before the test, so the test won't be
interfered by the messages.
 1.4 10-Aug-2015  ozaki-r Fix head and cleanup definitions for cache_expiration tests
 1.3 07-Aug-2015  ozaki-r Use rump.ping6 instead of ping6 with rumphijack(3)
 1.2 04-Aug-2015  ozaki-r Check the output of ndp -d strictly
 1.1 03-Aug-2015  ozaki-r Add tests for NDP
 1.12.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.12.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.17.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.19.2.3 08-Jul-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1285):

sys/netinet6/nd6.c: revision 1.255
tests/net/ndp/t_ndp.sh: revision 1.32

nd6: restore a missing reachability confirmation

On sending a packet over a STALE cache, the cache should be tried a reachability
confirmation, which is described in RFC 2461/4861 7.3.3. On the fast path in
nd6_resolve, however, the treatment for STALE caches has been skipped
accidentally. So STALE caches never be back to the REACHABLE state.

To fix the issue, branch to the fast path only when the cache entry is the
REACHABLE state and leave other caches to the slow path that includes the
treatment. To this end we need to allow to return a link-layer address if a
valid address is available on the slow path too, which is the same behavior as
FreeBSD and OpenBSD.

tests: test state transitions of neighbor caches
 1.19.2.2 02-Apr-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #688):

tests/net/ndp/t_ndp.sh: revision 1.31
tests/net/if_tap/t_tap.sh: revision 1.8

Avoid setting IP addresses of the same subnet on different interface

If we do so, there will remain one route that is of a preceding address, but
that behavior is not documented and may be changed in the future. Tests
shouldn't rely on such a unstable behavior.
 1.19.2.1 07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproudces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Enable to remove multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries which have an indentical
destination on different interfaces. This is normal and can be
reproduce easily by ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.30.2.1 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.31.2.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.33.2.2 05-Sep-2019  martin Pull up following revision(s) (requested by roy in ticket #175):

tests/net/arp/t_arp.sh: revision 1.39
tests/net/ndp/t_ndp.sh: revision 1.36

tests: fix ARP and NDP tests for RTM_* messages

While here add tests for RTM_MISS.
 1.33.2.1 26-Aug-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #117):

tests/net/ndp/t_ndp.sh: revision 1.34

Make a permanet neighbor cache to avoid sending an NS packet disturbing the test

A receiver of an ICMPv6 request packet creates a stale cache entry and it turns
into the delay state on replying the packet. After 5 second, the receiver sends
an NS packet as a reachability confirmation, which disturbs the test and causes
a unexpected result.

Should fix PR misc/54451
 1.34 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.33 16-Oct-2019  ozaki-r tests: add tests for the validateion of net.inet6.ip6.temppltime
 1.32 25-Nov-2017  kre branches: 1.32.4; 1.32.6;
Make this test somewhat deterministic - far fewer races, and most
of what are left are "race for the bus" type - if we lose, we just
wait for the next one ... slower but still reliable.

There are two exceptions ... when starting more than one rtadvd
(on different routers) we expect to receive an RA from each, but
all that we can check is that we received the (at least) right number
of RAs. It is possible (though unlikely) that one router sent two
before another sent any, in which case we will not have the data we
expect, and a sub-test will fail.

Second, there is no way to know for sure that we have waited long
enough when we're waiting for data to expire - in systems with
correctly working clocks that actually measure time, this should not
be an issue, if data is due to expire in < 5 seconds, and we wait
5 seconds, and the data is still there, then that indicates a
failure, which should be detected. Unfortunately with QEMU testing
time just isn't that reliable. But fortunately, it is generally the
sleep which takes longer, while other timers run correctly, which is
the way that makes us happy...

While here lots of cleanups - everything from white space and
line wrapping, to removing superfluous quotes and adding some
(but probably not enough) that are not (though given the data is
all known here, lack of quotes will rarely hurt.)

Also take note of the fact that current rtadvd *cannot* delete its
pidfile, so waiting for that file to be removed is doomed to failure.
Do things in a way that works, rather than simply resorting to assassination.

Because we do a lot less "sleep and hope it is long enough" and more
"wait until it is observed to happen" the tests generally run in less
elapsed time than before (20% less has been observed.) But because we
"wait until it is observed to happen" rather than just "sleep and hope
it is long enough" sometimes things take longer (and when that happens,
we no longer fail). Up to 7% slower (overall) has been observed.
(Observations on an amd64 DomU, no idea yet as to what QEMU might observe.)
 1.31 07-Nov-2017  ozaki-r Let rtadvd not use syslog for logging

Thanks to christos@ now rtadvd can log via stderr instead of syslog
by -D option.

Address PR bin/52701
 1.30 06-Nov-2017  ozaki-r Kill rtadvd surely even if the tests fail in the middle

It may help PR bin/52701.
 1.29 22-Jun-2017  ozaki-r Purge all related L2 caches on removing a route

The change addresses situations similar to PR 51179.
 1.28 21-Jun-2017  ozaki-r Don't create a permanent L2 cache entry on adding an address to an interface

It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
 1.27 26-May-2017  ozaki-r branches: 1.27.2;
Change the default value of DEBUG of stable tests to false
 1.26 03-Mar-2017  ozaki-r Provide a more robust regexp for time formats of 1day-ish
 1.25 22-Feb-2017  ozaki-r Add tests for expiration of default router and prefix entries
 1.24 13-Jan-2017  ozaki-r branches: 1.24.2;
Remove a check added wrongly
 1.23 13-Jan-2017  ozaki-r Add tests for net.inet6.ip6.prefer_tempaddr
 1.22 13-Jan-2017  ozaki-r Remove extra checks and cleanup
 1.21 11-Jan-2017  ozaki-r Cope with tentative state
 1.20 11-Jan-2017  ozaki-r Add a test case for IPv6 temporary address
 1.19 11-Jan-2017  ozaki-r Check autoconf flag
 1.18 26-Dec-2016  ozaki-r Fix typo
 1.17 21-Dec-2016  ozaki-r Restore multiple_routers_single_prefix_cleanup removed wrongly
 1.16 20-Dec-2016  ozaki-r Reduce unnecessary wait
 1.15 19-Dec-2016  ozaki-r Add a test case for exceeding the number of maximum prefixes

The test case pinpoints purge_detached.
 1.14 19-Dec-2016  ozaki-r Add tests for multiple routers with a single prefix
 1.13 19-Dec-2016  ozaki-r Fix the description of a test
 1.12 16-Dec-2016  ozaki-r Add tests for multiple routers
 1.11 16-Dec-2016  ozaki-r Unify common routines
 1.10 16-Dec-2016  ozaki-r Avoid using /var/run/rump.rtadvd.pid
 1.9 16-Dec-2016  ozaki-r Add a test case that deletes auto-configured addresses
 1.8 16-Dec-2016  ozaki-r Improve stability of the tests

- Do ifconfig -w 10 after ifconfig up
- Accept /1d0h0m..s/ in addition to /23h59m..s/ for expiration time
- Prevent new RA messages from coming after flushing entries

The changes should fix flapping of test results on babylon5.
 1.7 14-Dec-2016  ozaki-r Add tests for flushing prefix and default router entries
 1.6 25-Nov-2016  ozaki-r Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.5 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.4 20-Oct-2016  ozaki-r Make test names self-descriptive
 1.3 10-Aug-2016  kre + -lrumpdev
 1.2 12-Nov-2015  ozaki-r branches: 1.2.2;
Fix up the header

Remove unnecessary shebang and add missing keyword expansion,
copyright and license.
 1.1 11-Nov-2015  ozaki-r Add tests for RA

From s-yamaguchi@IIJ (with some tweaks by me)
 1.2.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.2.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.2.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.24.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.27.2.2 21-Nov-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #358):
usr.sbin/rtadvd/rtadvd.c: revision 1.54-1.58
usr.sbin/rtadvd/config.c: revision 1.37
usr.sbin/rtadvd/if.c: revision 1.25
usr.sbin/rtadvd/dump.c: revision 1.15
usr.sbin/rtadvd/rrenum.c: revision 1.20
usr.sbin/rtadvd/logit.h: revision 1.1
usr.sbin/rtadvd/rtadvd.8: revision 1.26
tests/net/ndp/t_ra.sh: revision 1.30
usr.sbin/rtadvd/timer.c: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.31
usr.sbin/rtadvd/advcap.c: revision 1.17

KNF, fix time printing formats.

Don't start another copy of rtadvd if one is running.

PR/52618: Shinichi Doyashiki: Don't exit if adding interface to multicast
group fails. This happens with empty vlan interfaces
- make syscalls checks against -1
- fix errors to print %s: instead of <%s>
XXX: if_vlan is the only pseudo interface in net/ that returns such an
error..

Kill rtadvd surely even if the tests fail in the middle
It may help PR bin/52701.

Change the meaning of the D flag to print errors to stderr instead of
syslog(3) and exit if poll(2) fails (intended to be used with unit-tests).

Mark expandm as preserving format strings.

Let rtadvd not use syslog for logging
Thanks to christos@ now rtadvd can log via stderr instead of syslog
by -D option.
Address PR bin/52701
 1.27.2.1 07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproudces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Enable to remove multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries which have an indentical
destination on different interfaces. This is normal and can be
reproduce easily by ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.32.6.1 23-Oct-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #368):

sys/netinet6/in6_ifattach.h: revision 1.14
sys/netinet6/ip6_input.c: revision 1.212
sys/netinet6/ip6_input.c: revision 1.213
sys/netinet6/ip6_input.c: revision 1.214
sys/netinet6/in6_var.h: revision 1.101
sys/netinet6/in6_var.h: revision 1.102
sys/netinet6/in6_ifattach.c: revision 1.116
sys/netinet6/in6_ifattach.c: revision 1.117
tests/net/ndp/t_ra.sh: revision 1.33

Reorganize in6_tmpaddrtimer stuffs
- Move the related functions to where in6_tmpaddrtimer_ch exists
- Hide global variable in6_tmpaddrtimer_ch
- Rename ip6_init2 to in6_tmpaddrtimer_init
- Reduce callers of callout_reset
- Use callout_schedule

Validate ip6_temp_preferred_lifetime (net.inet6.ip6.temppltime) on a change
ip6_temp_preferred_lifetime is used to calculate an interval period to
regenerate temporary addresse by
TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE - DESYNC_FACTOR
as per RFC 3041 3.5. So it must be greater than (REGEN_ADVANCE +
DESYNC_FACTOR), otherwise it will be negative and go wrong, for example
KASSERT(to_ticks >= 0) in callout_schedule_locked fails.

tests: add tests for the validateion of net.inet6.ip6.temppltime

in6: reset the temporary address timer on a change of the interval period
 1.32.4.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.29 01-Dec-2022  ozaki-r tests: fix Makefile and lists for MKRUMP=no

Pointed out by Michael Scholz, thanks.
 1.28 30-Nov-2022  ozaki-r tests: restore a line removed accidentally
 1.27 30-Nov-2022  ozaki-r tests: build and install t_ip_reass.c
 1.26 17-Nov-2022  ozaki-r tests: build and install added test files
 1.25 08-Sep-2020  christos Add tests for IP_BINDANY, IPV6_BINDANY
 1.24 06-Jul-2020  christos add a test for v4 mapped addresses
 1.23 01-Mar-2020  christos Centralize the base rump libraries into a variable used by all the other
Makefiles so that we can make changes to it centrally as needed and have
less mess. Fixes the sun2 build that needs rumpvfs after librump after
the latest changes.
 1.22 13-May-2019  bad Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.21 10-Aug-2017  ryo branches: 1.21.4;
Add support IP_PKTINFO for sendmsg(2).

The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.

Reviewed by ozaki-r@ and christos@. thanks.
 1.20 31-Mar-2017  ozaki-r branches: 1.20.4;
Add test cases for ping options
 1.19 24-Nov-2016  ozaki-r branches: 1.19.2;
Reduce duplicate codes

Introduce net_common.sh that is to share common functions used in tests
for networking. This commit commonizes extract_new_packets. Other duplicate
codes will be moved to the file in further commits.
 1.18 07-Nov-2016  ozaki-r Add basic tests for IPv6 Path MTU Discovery
 1.17 31-Oct-2016  ozaki-r Add tests for ping6 options

- -S <sourceaddr>
- -I <interface>
- -g <gateway>
 1.16 04-Oct-2016  ozaki-r Add tests for sysctl net.inet.ip.mtudisc

From suzu-ken@IIJ
 1.15 08-Aug-2016  pgoyette Another place where we need librumpdev
 1.14 12-Nov-2015  ozaki-r branches: 1.14.2;
Add tests of IPv6 link local address

From s-yamaguchi@IIJ
 1.13 07-Oct-2015  ozaki-r Add tests for assigining/deleting IP addresses

The tests help to find defects related to creation/deletion
of routes/llentries and assigning/deleting of IP addresses.
 1.12 06-Aug-2015  ozaki-r Add basic tests for IPv6 Address Lifetime Expiry
 1.11 22-Jun-2015  matt Don't build tests that depend on RUMP if BSD_MK_COMPAT_FILE is defined.
 1.10 20-May-2015  christos MKRUMP=no fixes (Robert Swindells)
 1.9 13-May-2015  ozaki-r Add basic tests for IP forwarding
 1.8 10-Jun-2014  he Fix static linking for the tests: -lrump is also used by -lrumpuser,
so we also need -lrump after -lrumpuser. Fixes build for sun2.
 1.7 12-Oct-2013  christos branches: 1.7.2;
new test to check if non-blocking sockets are reset to blocking on the
accepted file descriptor.
 1.6 03-Jul-2013  nakayama Enable tests which does not require rump if MKRUMP=no.
Pointed out by christos on source-changes-d.
 1.5 27-Jun-2013  christos add a pktinfo test
 1.4 06-Jan-2013  christos new udp test
 1.3 28-Sep-2011  christos branches: 1.3.2; 1.3.8;
add back the raw test and fix typo in the libraries.
 1.2 28-Sep-2011  christos Add a unix socket pathname size limit test.
 1.1 11-Jan-2011  pooka add test for PR kern/44369
 1.3.8.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.8.1 25-Feb-2013  tls resync with head
 1.3.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.3.2.1 23-Jan-2013  yamt sync with head
 1.7.2.1 10-Aug-2014  tls Rebase.
 1.14.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.14.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.14.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.19.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.20.4.1 21-Dec-2017  snj Pull up following revision(s) (requested by ryo in ticket #445):
distrib/sets/lists/debug/mi: revision 1.222
distrib/sets/lists/tests/mi: revision 1.760
share/man/man4/ip.4: revision 1.38
sys/netinet/in.c: revision 1.207
sys/netinet/in.h: revision 1.101
sys/netinet/in_pcb.c: revision 1.179
sys/netinet/in_pcb.h: revision 1.64
sys/netinet/ip_output.c: revision 1.284, 1.286
sys/netinet/ip_var.h: revision 1.120-1.121
sys/netinet/raw_ip.c: revision 1.166-1.167
sys/netinet/udp_usrreq.c: revision 1.235-1.236
sys/netinet/udp_var.h: revision 1.42
tests/net/net/Makefile: revision 1.21
tests/net/net/t_pktinfo_send.c: revision 1.1-1.2
Add support IP_PKTINFO for sendmsg(2).
The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.
Reviewed by ozaki-r@ and christos@. thanks.
--
As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
 1.21.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.21.4.1 10-Jun-2019  christos Sync with HEAD
 1.2 05-Aug-2023  riastradh tests/net/net/t_bind: IP_BINDANY and IPV6_BINDANY require root.
 1.1 08-Sep-2020  christos Add tests for IP_BINDANY, IPV6_BINDANY
 1.21 27-Jun-2025  andvar Grammar and spelling fixes, mainly in comments. A few in documentation,
logging, test description, and SCSI ASC/ASCQ assignment descriptions.
 1.20 20-Feb-2017  ozaki-r branches: 1.20.24;
Add basic tests for forwarding fragmented packets
 1.19 25-Nov-2016  ozaki-r branches: 1.19.2;
Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.18 25-Nov-2016  ozaki-r Add missing head functions
 1.17 24-Nov-2016  ozaki-r Share httpd start/stop code
 1.16 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.15 10-Aug-2016  kre + -lrumpdev
 1.14 29-Jun-2016  ozaki-r branches: 1.14.2;
Destroy interfaces at the end of tests

It's useful to know if interface destructions work correctly or not
with complex internal states (e.g., caches).
 1.13 21-Jun-2016  ozaki-r Make a bunch of test names self-descriptive
 1.12 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.11 25-Dec-2015  ozaki-r Add some tests for sysctl net.inet.ip.*

- net.inet.ip.redirect
- net.inet.ip.directed-broadcast (and net.inet.icmp.bmcastecho)
- net.inet.ip.ttl

From suzu-ken@IIJ (with tweaks by me)
 1.10 24-Nov-2015  ozaki-r Add fastforward6 test
 1.9 29-Sep-2015  ozaki-r Let ftp use a different output file from httpd's one

Previously the target file served by httpd and the output file of ftp
were identical (both index.html) on the filesystem.
 1.8 28-Sep-2015  ozaki-r Add simple tests for fastforward

The tests just do TCP communication via HTTP GET.
 1.7 04-Sep-2015  ozaki-r Add tests to check if nexthop route lookup works

These tests reproduce a panic on assertion "ro->_ro_rt ==NULL ||
ro->_ro_rt->rt_refcnt > 0" failure that had been fixed.
 1.6 07-Aug-2015  ozaki-r Use rump.ping6 instead of ping6 with rumphijack(3)
 1.5 02-Jun-2015  ozaki-r Check if tests surely failed with TTL exceeded
 1.4 29-May-2015  ozaki-r Bump timeout for ping and ping6 to 5 sec

Hope the wait is enough for slow machines, e.g., qemu/anita/i386.
 1.3 27-May-2015  ozaki-r Add timeout to ping6 positive tests too

For when they fail.
 1.2 16-May-2015  ozaki-r Enable IPv6 negative tests

As ping6 timeout feature (-X option) is added, we can do negative
tests without wasting time.

1 sec delay is added after network setup to avoid false positives.
 1.1 13-May-2015  ozaki-r Add basic tests for IP forwarding
 1.14.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.14.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.19.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.20.24.1 02-Aug-2025  perseant Sync with HEAD
 1.2 30-Nov-2022  ozaki-r tests: tweak t_ip_reass.c for NetBSD

The test is modified to run on rump kernels.
 1.1 30-Nov-2022  ozaki-r tests: import ip_reass_test.c from FreeBSD as t_ip_reass.c

As of:
commit 9ed1e4ecd4e9eb3bde16f52a937a6fa86a971638
Author: Mark Johnston <markj@FreeBSD.org>
Date: Tue Nov 20 18:13:18 2018 +0000

Plug a trivial memory leak.

CID: 1396911
MFC with: r340485
 1.11 03-Aug-2017  ozaki-r Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.10 20-Feb-2017  ozaki-r branches: 1.10.4;
Add simple tests of behaviors of alias addresses
 1.9 15-Dec-2016  ozaki-r branches: 1.9.2;
Fix that cleanup doesn't run when DEBUG=false
 1.8 25-Nov-2016  ozaki-r Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.7 24-Nov-2016  ozaki-r Move route check functions to net_common.sh
 1.6 24-Nov-2016  ozaki-r Make tests strict

Connected routes have 'C' flag.
 1.5 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.4 04-Oct-2016  ozaki-r Add tests for sysctl net.inet6.ip6.auto_linklocal

From suzu-ken@IIJ
 1.3 10-Aug-2016  kre + -lrumpdev
 1.2 04-Apr-2016  ozaki-r branches: 1.2.2;
Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.1 07-Oct-2015  ozaki-r Add tests for assigining/deleting IP addresses

The tests help to find defects related to creation/deletion
of routes/llentries and assigning/deleting of IP addresses.
 1.2.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.2.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.2.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.9.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.10.4.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.7 25-Nov-2021  hannken Consistently use "drvctl -l qemufwcfg0" to check if
running under qemu in general.
 1.6 25-Nov-2016  ozaki-r Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.5 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.4 02-Oct-2016  kre This test works fine on real hardware, but due to PR kern/43997 (qemu
timing problems) fails when run under qemu. Attempt to compensate
for that (by skipping the problematic test case) when running in qemu.

This should be reverted when the PR gets fixed (either in qemu or in
the NetBSD kernel).
 1.3 16-Sep-2016  ozaki-r Ignore case in deprecated
 1.2 10-Aug-2016  kre + -lrumpdev
 1.1 06-Aug-2015  ozaki-r branches: 1.1.2;
Add basic tests for IPv6 Address Lifetime Expiry
 1.1.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.1.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.16 26-Aug-2019  ozaki-r tests: add tests for IPv6 link-local addresses with a scope ID

Setting an address with a scope ID doesn't work for rump.ifconfig for some
reasons and needs $HIJACKING for now. The bug should be fixed someday.
 1.15 13-May-2019  bad Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.14 21-Jun-2017  ozaki-r branches: 1.14.6;
Don't create a permanent L2 cache entry on adding an address to an interface

It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
 1.13 26-May-2017  ozaki-r branches: 1.13.2;
Change the default value of DEBUG of stable tests to false
 1.12 14-Dec-2016  ozaki-r Rename dump because it's used in net_common.sh
 1.11 24-Nov-2016  ozaki-r Move get_lladdr to net_common.sh
 1.10 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.9 03-Oct-2016  kre 80 column violation fixes, hopefully minor readability improvements.
No intended functional change.
 1.8 02-Oct-2016  kre More adaptation to new ifconfig output format - prefix length is now
appended to the address, rather than a second parameter, so needs to be
deleted if just the bare address is what we want (which it is here).
 1.7 10-Aug-2016  kre + -lrumpdev
 1.6 13-Mar-2016  ozaki-r branches: 1.6.2;
Fix test
 1.5 13-Mar-2016  ozaki-r Add more debugging code
 1.4 12-Mar-2016  ozaki-r Add debugging code and enable it by default
to know what is happening on anita (qemu)
 1.3 15-Dec-2015  ozaki-r Add more tests for IPv6 link-local addresses

The tests include a test for PR 50549.
 1.2 19-Nov-2015  ozaki-r Set timeout of ping6 to reduce execution time
 1.1 12-Nov-2015  ozaki-r Add tests of IPv6 link local address

From s-yamaguchi@IIJ
 1.6.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.6.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.13.2.1 07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproudces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Enable to remove multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries which have an indentical
destination on different interfaces. This is normal and can be
reproduce easily by ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.14.6.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.14.6.1 10-Jun-2019  christos Sync with HEAD
 1.1 06-Jul-2020  christos add a test for v4 mapped addresses
 1.11 21-May-2023  andvar s/thar/that/ in comments.
 1.10 06-Mar-2017  ozaki-r Add a test case for net.inet.ip.mtudisctimeout

The test case just reproduces PR kern/52029 and needs more tests.
 1.9 16-Feb-2017  ozaki-r Use nc instead of ftp/httpd

ftp with rumphijack is unstable probably because ftp uses siglongjmp from
a signal hander. So stop using ftp and use nc instead. This fixes test
failures of t_mtudisc on some environments such as my development machine
(amd64) and anita on sparc64.
 1.8 21-Dec-2016  ozaki-r branches: 1.8.2;
Suppress harmless warning message

rump.netstat: sysctlnametomib: net.inet6.udp6.pcblist: No such file or directory
 1.7 25-Nov-2016  ozaki-r Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.6 24-Nov-2016  ozaki-r Share httpd start/stop code
 1.5 24-Nov-2016  ozaki-r Move HIJACKING definition to net_common.sh
 1.4 24-Nov-2016  ozaki-r Reduce duplicate codes

Introduce net_common.sh that is to share common functions used in tests
for networking. This commit commonizes extract_new_packets. Other duplicate
codes will be moved to the file in further commits.
 1.3 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.2 06-Oct-2016  kre branches: 1.2.2;

PR bin/51532 - kill the test http server before terminating
 1.1 04-Oct-2016  ozaki-r Add tests for sysctl net.inet.ip.mtudisc

From suzu-ken@IIJ
 1.2.2.4 20-Mar-2017  pgoyette Sync with HEAD
 1.2.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.2.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.2.2.1 06-Oct-2016  pgoyette file t_mtudisc.sh was added on branch pgoyette-localcount on 2016-11-04 14:49:24 +0000
 1.8.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.5 25-Nov-2016  ozaki-r branches: 1.5.2;
Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.4 24-Nov-2016  ozaki-r Share httpd start/stop code
 1.3 24-Nov-2016  ozaki-r Move HIJACKING definition to net_common.sh
 1.2 24-Nov-2016  ozaki-r Reduce duplicate codes

Introduce net_common.sh that is to share common functions used in tests
for networking. This commit commonizes extract_new_packets. Other duplicate
codes will be moved to the file in further commits.
 1.1 07-Nov-2016  ozaki-r Add basic tests for IPv6 Path MTU Discovery
 1.5.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.5.2.1 25-Nov-2016  pgoyette file t_mtudisc6.sh was added on branch pgoyette-localcount on 2017-01-07 08:56:56 +0000
 1.9 26-Apr-2018  maxv Remove ping6_opts_hops, "-g" does not exist anymore (RH0 removed).
 1.8 25-Nov-2016  ozaki-r branches: 1.8.12;
Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.7 24-Nov-2016  ozaki-r Move get_macaddr to net_common.sh
 1.6 24-Nov-2016  ozaki-r Move get_lladdr to net_common.sh
 1.5 24-Nov-2016  ozaki-r Reduce duplicate codes

Introduce net_common.sh that is to share common functions used in tests
for networking. This commit commonizes extract_new_packets. Other duplicate
codes will be moved to the file in further commits.
 1.4 07-Nov-2016  ozaki-r Add tests for combination use of -g option and hops optional argument
 1.3 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.2 07-Nov-2016  ozaki-r Add tests of ping6 hops optional argument (Type 0 Routing Headers)

Note that ping6 with the argument can send packets with routing headers
but the kernel doesn't support receiving the packets so that ping6 fails.
Nevertheless, we can test whether sent packets are correct or not.
 1.1 31-Oct-2016  ozaki-r branches: 1.1.2;
Add tests for ping6 options

- -S <sourceaddr>
- -I <interface>
- -g <gateway>
 1.1.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.1.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.1.2.1 31-Oct-2016  pgoyette file t_ping6_opts.sh was added on branch pgoyette-localcount on 2016-11-04 14:49:24 +0000
 1.8.12.1 02-May-2018  pgoyette Synch with HEAD
 1.3 09-Feb-2018  ozaki-r Fix ping_opts_gateway and ping_opts_recordroute

We need to enable the options of source routing on all rump kernels.
 1.2 08-Feb-2018  maxv Now that we don't allow source-routed packets by default, set allowsrcrt=1
and forwsrcrt=1. Should fix the ATF failure.
 1.1 31-Mar-2017  ozaki-r branches: 1.1.2; 1.1.4; 1.1.8;
Add test cases for ping options
 1.1.8.1 26-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #571):
tests/net/net/t_ping_opts.sh: 1.2-1.3
Now that we don't allow source-routed packets by default, set allowsrcrt=1
and forwsrcrt=1. Should fix the ATF failure.
--
Fix ping_opts_gateway and ping_opts_recordroute
We need to enable the options of source routing on all rump kernels.
 1.1.4.2 26-Apr-2017  pgoyette Sync with HEAD
 1.1.4.1 31-Mar-2017  pgoyette file t_ping_opts.sh was added on branch pgoyette-localcount on 2017-04-26 02:53:34 +0000
 1.1.2.2 21-Apr-2017  bouyer Sync with HEAD
 1.1.2.1 31-Mar-2017  bouyer file t_ping_opts.sh was added on branch bouyer-socketcan on 2017-04-21 16:54:13 +0000
 1.2 19-Oct-2013  christos branches: 1.2.4; 1.2.8;
fix unused variable warnings
 1.1 27-Jun-2013  christos add a pktinfo test
 1.2.8.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.8.1 19-Oct-2013  tls file t_pktinfo.c was added on branch tls-maxphys on 2014-08-20 00:04:52 +0000
 1.2.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.4.1 19-Oct-2013  yamt file t_pktinfo.c was added on branch yamt-pagecache on 2014-05-22 11:42:22 +0000
 1.3 30-Dec-2017  gson Use the default ATF timeout instead of a reduced one of 5 seconds,
which isn't enough for pmax under gxemul on babylon5.netbsd.org.
 1.2 11-Dec-2017  ryo branches: 1.2.2;
As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
 1.1 10-Aug-2017  ryo Add support IP_PKTINFO for sendmsg(2).

The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.

Reviewed by ozaki-r@ and christos@. thanks.
 1.2.2.2 21-Dec-2017  snj Pull up following revision(s) (requested by ryo in ticket #445):
distrib/sets/lists/debug/mi: revision 1.222
distrib/sets/lists/tests/mi: revision 1.760
share/man/man4/ip.4: revision 1.38
sys/netinet/in.c: revision 1.207
sys/netinet/in.h: revision 1.101
sys/netinet/in_pcb.c: revision 1.179
sys/netinet/in_pcb.h: revision 1.64
sys/netinet/ip_output.c: revision 1.284, 1.286
sys/netinet/ip_var.h: revision 1.120-1.121
sys/netinet/raw_ip.c: revision 1.166-1.167
sys/netinet/udp_usrreq.c: revision 1.235-1.236
sys/netinet/udp_var.h: revision 1.42
tests/net/net/Makefile: revision 1.21
tests/net/net/t_pktinfo_send.c: revision 1.1-1.2
Add support IP_PKTINFO for sendmsg(2).
The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.
Reviewed by ozaki-r@ and christos@. thanks.
--
As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
 1.2.2.1 11-Dec-2017  snj file t_pktinfo_send.c was added on branch netbsd-8 on 2017-12-21 21:08:13 +0000
 1.2 13-Jan-2017  christos Don't play with "../.." in includes for h_macros.h; deal with it centrally.
Minor fixes.
 1.1 11-Jan-2011  pooka branches: 1.1.28;
add test for PR kern/44369
 1.1.28.1 20-Mar-2017  pgoyette Sync with HEAD
 1.2 17-Nov-2022  ozaki-r tests: make t_socket_afinet.c run on rump kernel
 1.1 17-Nov-2022  ozaki-r tests: import socket_afinet.c from FreeBSD as t_socket_afinet.c

As of:
commit 3aaaa2efde896e19d229ee2cf09fe7e6ab0fbf6e
Author: Thomas Munro <tmunro@FreeBSD.org>
Date: Wed Apr 28 21:31:38 2021 +1200

poll(2): Add POLLRDHUP.
 1.13 23-Aug-2024  rin tests: Fix false positives due to race b/w connect(2) and accept(2)

For kernel/kqueue/t_empty and net/net/t_tcp, there were no sync ops
b/w connect(2) and accept(2) for non-blocking socket pair on host
(rump is not used).

As a result, accept(2) can fail immediately with EAGAIN, when
kernel-side routines for connect(2) and accept(2) are processed in
different CPU cores.

1-sec sleep(3) between two syscalls seems to mitigate this problem
as far as I can see, although this should not be a perfect solution...

Thanks ozaki-r@ for discussion.
 1.12 08-Nov-2021  rin branches: 1.12.2; 1.12.4;
Fix (a kind of) violation of strict aliasing rule.

Due to the rule, "sin" and "sin6" can be treated as restrict pointers.
Compilers seem to be confused by structure copy for those pointed by
them before assignments.

For aarch64eb, GCC 9 and 10 compile t_tcp.c rev 1.11 into a code, where
fetch for "sin6->sin6_port" is preceding the structure copy "ss = bs".
This results in failure of connect(2) with EADDRNOOTAVAIL.
 1.11 26-Oct-2019  christos - use accept4 instead of paccept for everyone.
- add test for accept preserving non-block
- comment on FreeBSD and Linux behavior.
 1.10 16-Feb-2018  christos branches: 1.10.4;
Use the same variable name for the accepted socket as with the AF_LOCAL test.
Call getpeereid on the accepted socket.
 1.9 16-Feb-2018  christos ensure that getpeereid does not succeed on tcp sockets.
 1.8 16-Feb-2018  christos explain what's going on before we fix it.
 1.7 16-Feb-2018  christos add getpeereid tests for non-unix sockets, returns garbage...
 1.6 28-Aug-2017  christos add tests for 4->6 connections.
 1.5 28-Aug-2017  christos add v6 tests
 1.4 04-Mar-2016  christos make it work on linux, be pickier about errors, and correct variable type.
 1.3 17-Oct-2013  christos branches: 1.3.4; 1.3.8;
CID 1107548: resource leak
 1.2 12-Oct-2013  christos more tests
 1.1 12-Oct-2013  christos new test to check if non-blocking sockets are reset to blocking on the
accepted file descriptor.
 1.3.8.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.8.1 17-Oct-2013  tls file t_tcp.c was added on branch tls-maxphys on 2014-08-20 00:04:52 +0000
 1.3.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.3.4.1 17-Oct-2013  yamt file t_tcp.c was added on branch yamt-pagecache on 2014-05-22 11:42:22 +0000
 1.10.4.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.12.4.1 02-Aug-2025  perseant Sync with HEAD
 1.12.2.1 24-Aug-2024  martin Pull up following revision(s) (requested by rin in ticket #813):

tests/kernel/kqueue/t_empty.c: revision 1.2
tests/net/net/t_tcp.c: revision 1.13

tests: Fix false positives due to race b/w connect(2) and accept(2)

For kernel/kqueue/t_empty and net/net/t_tcp, there were no sync ops
b/w connect(2) and accept(2) for non-blocking socket pair on host
(rump is not used).

As a result, accept(2) can fail immediately with EAGAIN, when
kernel-side routines for connect(2) and accept(2) are processed in
different CPU cores.
1-sec sleep(3) between two syscalls seems to mitigate this problem
as far as I can see, although this should not be a perfect solution...

Thanks ozaki-r@ for discussion.
 1.2 06-Jan-2013  christos branches: 1.2.2; 1.2.6;
fix messages.
 1.1 06-Jan-2013  christos new udp test
 1.2.6.2 25-Feb-2013  tls resync with head
 1.2.6.1 06-Jan-2013  tls file t_udp.c was added on branch tls-maxphys on 2013-02-25 00:30:24 +0000
 1.2.2.2 23-Jan-2013  yamt sync with head
 1.2.2.1 06-Jan-2013  yamt file t_udp.c was added on branch yamt-pagecache on 2013-01-23 00:06:36 +0000
 1.28 27-Mar-2025  riastradh t_unix: Test LOCAL_CONNWAIT.

PR kern/59220: accept(2): null pointer deref
 1.27 27-Mar-2025  riastradh t_unix: Make existing tests more reliable by exiting in child.

Returning into atf in the child is not helpful.

Preparation for adding a test for:

PR kern/59220: accept(2): null pointer deref
 1.26 27-Mar-2025  riastradh t_unix: Sort includes.

No functional change intended.

Preparation for:

PR kern/59220: accept(2): null pointer deref
 1.25 08-Aug-2021  nia branches: 1.25.4;
introduce a SOL_LOCAL for unix-domain socket level socket options
as an alias of the current 0 used for these options, as in FreeBSD.

reviewed by many.
 1.24 28-Aug-2020  riastradh Nix trailing whitespace.
 1.23 28-Aug-2020  christos When running the tests with atf-run the directory we are running in is
drwx------ so when we change to a different user, we can't find the socket
we created.

Make a directory and put the socket in there. Of course now atf can't
record its results as a different user, but that is not fatal.

tc-se:FATAL ERROR: Cannot create results file '/tmp/atf-run.9vOjFd/tcr': \
Permission denied
 1.22 28-Aug-2020  christos Remove unneeded sete{u,g}id pointed out by kre.
Remove dup unlink.
 1.21 27-Aug-2020  christos - when running as root, create the socket under a different uid/gid to verify
that it works properly with different users opening the socket.
- verify that linux works the same for both getpeereid() and fstat()
 1.20 26-Aug-2020  christos Check that fstat returns the correct socket owner
 1.19 06-Jul-2020  christos don't open the socket twice.
 1.18 14-Apr-2019  christos ifix uninialized pid
 1.17 17-Feb-2018  christos branches: 1.17.4;
make it compile again for those who don't have LOCAL_PEERCRED
 1.16 17-Feb-2018  christos Add a test demonstrating thst LOCAL_PEEREID is busted.
 1.15 16-Feb-2018  christos make sure we call getpeername on the accepted socket!
 1.14 16-Feb-2018  christos explain what's going on before we fix it.
 1.13 16-Feb-2018  christos add getpeereid tests for non-unix sockets, returns garbage...
 1.12 16-Feb-2018  christos add a getpeeeid test.
 1.11 13-Nov-2013  christos CID 1107543, 230676, 1107543, 976795, 230676, 976795, 1125885, etc.
 1.10 20-Oct-2013  christos write fail as a proper macro
 1.9 17-Oct-2013  christos CID 1107550: resource leak
 1.8 10-Oct-2013  christos make this work on linux
 1.7 08-Oct-2013  christos Improve tests so that they check the sockaddr's returned by accept(2) and
getsockname(2). Test for accept success after closed client socket.
 1.6 04-Oct-2011  christos branches: 1.6.2; 1.6.8;
Fixed reversed/child parent and check the right variable for failure from
gson@
 1.5 03-Oct-2011  christos Fix the exceed test.
 1.4 30-Sep-2011  christos use ATF_CHECK_MSG instead of err() in atf.
 1.3 28-Sep-2011  christos revert part of previous that was wrong.
 1.2 28-Sep-2011  christos fix error message
 1.1 28-Sep-2011  christos Add a unix socket pathname size limit test.
 1.6.8.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.17.4.1 10-Jun-2019  christos Sync with HEAD
 1.25.4.1 02-Aug-2025  perseant Sync with HEAD
 1.2 17-Feb-2018  christos remove clause 3, 4.
 1.1 16-Feb-2018  christos add getpeereid tests for non-unix sockets, returns garbage...
 1.2 20-Jul-2025  joe l2 only tests

for this test suite, we test to ensure that all frames
are passed by default when no layer 2 rules are set in the config

reviewed by christos@
 1.1 12-Sep-2012  martin branches: 1.1.2; 1.1.6; 1.1.46;
ATF wrapping of the npf tests
 1.1.46.1 02-Aug-2025  perseant Sync with HEAD
 1.1.6.2 12-Sep-2012  martin ATF wrapping of the npf tests
 1.1.6.1 12-Sep-2012  martin file Makefile was added on branch tls-maxphys on 2012-09-12 14:06:32 +0000
 1.1.2.2 30-Oct-2012  yamt sync with head
 1.1.2.1 12-Sep-2012  yamt file Makefile was added on branch yamt-pagecache on 2012-10-30 19:00:06 +0000
 1.9 20-Jul-2025  joe l2 only tests

for this test suite, we test to ensure that all frames
are passed by default when no layer 2 rules are set in the config

reviewed by christos@
 1.8 18-Apr-2025  riastradh t_npf: Print a stack trace with gdb if it dumps core.

Based on net_common.sh.

PR port-sparc/59321: t_npf tests are failing
 1.7 30-Oct-2024  riastradh npfctl(8): Fix compiling multiword comparisons, i.e., IPv6 addrs.

PR bin/55403: npfctl miscompiles IPv6 rules
 1.6 30-Oct-2024  riastradh npftest: Fix newly added test.

- Adapt new test to actually exercise new rules.
- Mark the right test xfail.

PR bin/55403: npfctl miscompiles IPv6 rules
 1.5 29-Oct-2024  riastradh npftest: Add a test to match groups of IPv6 addresses.

The npf_rule test group is now an xfail. (npftest doesn't have a way
to mark individual cases in a test group as xfail, so this will have
to do for now.)

PR bin/55403: npfctl miscompiles IPv6 rules
 1.4 01-Jun-2020  martin branches: 1.4.8;
Adjust to "npfctl debug" command line changes, from rmind@.
 1.3 03-Aug-2017  ozaki-r branches: 1.3.6;
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.2 18-Sep-2012  martin branches: 1.2.2; 1.2.4; 1.2.8; 1.2.28;
Try to make this test gracefully fail when npftest is not available
 1.1 12-Sep-2012  martin ATF wrapping of the npf tests
 1.2.28.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.2.8.2 18-Sep-2012  martin Try to make this test gracefully fail when npftest is not available
 1.2.8.1 18-Sep-2012  martin file t_npf.sh was added on branch tls-maxphys on 2012-09-18 08:28:16 +0000
 1.2.4.2 18-Sep-2012  martin Try to make this test gracefully fail when npftest is not available
 1.2.4.1 18-Sep-2012  martin file t_npf.sh was added on branch tls-maxphys on 2012-09-18 08:28:16 +0000
 1.2.2.2 30-Oct-2012  yamt sync with head
 1.2.2.1 18-Sep-2012  yamt file t_npf.sh was added on branch yamt-pagecache on 2012-10-30 19:00:07 +0000
 1.3.6.1 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.4.8.1 02-Aug-2025  perseant Sync with HEAD
 1.6 20-Sep-2017  ozaki-r Add tests of rtcache invalidation
 1.5 24-Nov-2016  ozaki-r branches: 1.5.6;
Move route check functions to net_common.sh
 1.4 21-Apr-2016  ozaki-r branches: 1.4.2;
Add tests of route flags using IPv6 addresses
 1.3 29-Jan-2016  ozaki-r Add tests for a gateway not on the local subnet

The tests are derived from the example at
http://www.netbsd.org/docs/network/#nonsubnetgateway ,
which has come up in PR 50717.
 1.2 18-May-2015  ozaki-r Add tests for route flags
 1.1 08-Feb-2011  pooka branches: 1.1.2;
Time to start adding tests for the routing code to make that part
of the kernel more approachable.

Begin the task with an xfail test for PR kern/40455.
 1.1.2.2 08-Feb-2011  bouyer Sync with HEAD
 1.1.2.1 08-Feb-2011  bouyer file Makefile was added on branch bouyer-quota2 on 2011-02-08 19:01:37 +0000
 1.4.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.5.6.1 24-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #305):
distrib/sets/lists/tests/mi: revision 1.762
sys/net/route.c: revision 1.198-1.201
sys/net/route.h: revision 1.114
sys/netatalk/at_proto.c: revision 1.22
sys/netinet/in_proto.c: revision 1.124
sys/netinet6/in6_proto.c: revision 1.118
sys/netmpls/mpls_proto.c: revision 1.31
sys/netnatm/natm_proto.c: revision 1.18
sys/rump/net/lib/libsockin/sockin.c: revision 1.65
sys/sys/domain.h: revision 1.33
tests/net/route/Makefile: revision 1.6
tests/net/route/t_rtcache.sh: revision 1.1
Add tests of rtcache invalidation
Remove unnecessary NULL check of rt_ifp
It's always non-NULL.
Invalidate rtcache based on a global generation counter
The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals to
the global counter, the cache is still valid, otherwise invalidated.
One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.
This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.
Remove the global lock for rtcache
Thanks to removal of LIST_ENTRY of struct route, rtcaches are accessed only by
their users. And in existing usages a rtcache is guranteed to be not accessed
simultaneously. So the rtcache framework doesn't need any exclusion controls
in itself.
Synchronize on rtcache_generation with rtlock
It's racy if NET_MPSAFE is enabled.
Pointed out by joerg@
 1.14 13-May-2019  bad Get rid of all the -lrumpdev and -lrumpvfs that are no longer needed
after moving rump's mainbus from rumpdev to rumpkern.

Produces the same atf-run results as before.
 1.13 18-Apr-2019  ozaki-r tests: dump kernel stats on cleanup
 1.12 18-Dec-2017  ozaki-r branches: 1.12.4;
Adjust outputs of route's flags to include a numeric output
 1.11 24-Mar-2017  ozaki-r Fix typo
 1.10 22-Mar-2017  ozaki-r Add some tests to change flags of routes
 1.9 07-Nov-2016  ozaki-r branches: 1.9.2;
Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.8 10-Aug-2016  roy Add -lrumpdev so that tests work again.
 1.7 21-Jul-2016  ozaki-r Add some tests for route change
 1.6 21-Jul-2016  ozaki-r Remove extra grep
 1.5 21-Jul-2016  ozaki-r Make the test name self-descriptive
 1.4 19-Feb-2013  joerg branches: 1.4.12;
Check for RUMP programs before using them.
 1.3 14-May-2011  jmmv branches: 1.3.4; 1.3.10;
Instead of doing 'atf_check ... sh -c foo', just do 'atf_check ... -x foo'.
 1.2 10-Feb-2011  kefren Problem was fixed, don't expect to fail anymore
 1.1 08-Feb-2011  pooka branches: 1.1.2;
Time to start adding tests for the routing code to make that part
of the kernel more approachable.

Begin the task with an xfail test for PR kern/40455.
 1.1.2.3 17-Feb-2011  bouyer Sync with HEAD
 1.1.2.2 08-Feb-2011  bouyer Sync with HEAD
 1.1.2.1 08-Feb-2011  bouyer file t_change.sh was added on branch bouyer-quota2 on 2011-02-08 19:01:37 +0000
 1.3.10.1 25-Feb-2013  tls resync with head
 1.3.4.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.12.3 26-Apr-2017  pgoyette Sync with HEAD
 1.4.12.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.4.12.1 26-Jul-2016  pgoyette Sync with HEAD
 1.9.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.12.4.1 10-Jun-2019  christos Sync with HEAD
 1.20 03-Aug-2017  ozaki-r Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.19 30-Jun-2017  ozaki-r Check if ARP/NDP entries are purged when a related route is deleted
 1.18 28-Jun-2017  ozaki-r Restore ARP/NDP entries to route show and netstat -r

Requested by dyoung@ some time ago
 1.17 27-Jun-2017  ozaki-r Fix wrong comment
 1.16 27-Jun-2017  ozaki-r Check existence of ARP/NDP entries

Checking ARP/NDP entries is valid rather than checking routes.
 1.15 21-Dec-2016  ozaki-r branches: 1.15.6;
Add ifconfig -w to improve test stability
 1.14 25-Nov-2016  ozaki-r Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.13 24-Nov-2016  ozaki-r Move route check functions to net_common.sh
 1.12 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.11 10-Aug-2016  roy Add -lrumpdev so that tests work again.
 1.10 08-Jul-2016  ozaki-r branches: 1.10.2;
Fix test names
 1.9 21-Jun-2016  ozaki-r Make a bunch of test names self-descriptive
 1.8 23-Apr-2016  ozaki-r Return 0 for $DEBUG=false case
 1.7 23-Apr-2016  ozaki-r Add more tests of RTF_REJECT
 1.6 22-Apr-2016  ozaki-r Add more tests of RTF_REJECT
 1.5 21-Apr-2016  ozaki-r Fix tests for blackhole routes

The gateway of a blackhole route must be a loopback interface.
 1.4 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.3 01-Jun-2015  ozaki-r Improve stability of route_flags_xresolve tests

Insert delays to give route monitor a chance to complete its work.
 1.2 20-May-2015  ozaki-r Add tests for XRESOLVE flag
 1.1 18-May-2015  ozaki-r Add tests for route flags
 1.10.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.15.6.2 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.15.6.1 07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproudces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Enable to remove multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries which have an indentical
destination on different interfaces. This is normal and can be
reproduce easily by ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.16 03-Aug-2017  ozaki-r Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
 1.15 30-Jun-2017  ozaki-r Check if ARP/NDP entries are purged when a related route is deleted
 1.14 28-Jun-2017  ozaki-r Restore ARP/NDP entries to route show and netstat -r

Requested by dyoung@ some time ago
 1.13 27-Jun-2017  ozaki-r Check existence of ARP/NDP entries

Checking ARP/NDP entries is valid rather than checking routes.
 1.12 21-Dec-2016  ozaki-r branches: 1.12.6;
Add ifconfig -w to improve test stability
 1.11 25-Nov-2016  ozaki-r Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.10 24-Nov-2016  ozaki-r Move route check functions to net_common.sh
 1.9 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.8 16-Aug-2016  roy Fix test_lo6 because ::1 now has RTF_LOCAL assigned to it.
 1.7 10-Aug-2016  roy Add -lrumpdev so that tests work again.
 1.6 08-Jul-2016  ozaki-r branches: 1.6.2;
Fix test names
 1.5 23-Apr-2016  ozaki-r Return 0 for $DEBUG=false case
 1.4 23-Apr-2016  ozaki-r Add more tests of RTF_REJECT
 1.3 22-Apr-2016  ozaki-r Add more tests of RTF_REJECT
 1.2 21-Apr-2016  ozaki-r Fix tests for blackhole routes

The gateway of a blackhole route must be a loopback interface.
 1.1 21-Apr-2016  ozaki-r Add tests of route flags using IPv6 addresses
 1.6.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.12.6.2 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.12.6.1 07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproudces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Enable to remove multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries which have an indentical
destination on different interfaces. This is normal and can be
reproduce easily by ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.15 20-Sep-2022  knakahara tests: add tests for automatic route deletions on an address removal
 1.14 18-Dec-2017  ozaki-r Adjust outputs of route's flags to include a numeric output
 1.13 28-Jun-2017  ozaki-r Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes

They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
 1.12 24-Mar-2017  ozaki-r branches: 1.12.4;
Add test cases for PR kern/52077

From s-yamaguchi@IIJ
 1.11 21-Mar-2017  ozaki-r Add a test case for PR kern/52083
 1.10 21-Dec-2016  ozaki-r branches: 1.10.2;
Add ifconfig -w to improve test stability
 1.9 25-Nov-2016  ozaki-r Share rump_server start/stop and interface creation/destruction functions

The common functions store socks of rump_servers, interfaces of rump_servers
and buses that intefaces connect and allow to destroy them with common
functions without specifying which socks, interfaces and buses we should
destroy.

This change reduces lots of similar setup/cleanup codes.
 1.8 07-Nov-2016  ozaki-r Accept DEBUG environment variable

By doing so, we can easily turn DEBUG on/off without modifying
the ATF scripts.
 1.7 10-Aug-2016  roy Add -lrumpdev so that tests work again.
 1.6 21-Jun-2016  ozaki-r branches: 1.6.2;
Tweak route get outputs to make tests work

"expire" value of route get output is unexpectedly a negative value
on rump kernel for some reasons and the tests almost always fail
on babylon5. So just ignore it to make tests work for now. Should
fix it in the future.
 1.5 21-Jun-2016  ozaki-r Make a bunch of test names self-descriptive
 1.4 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.3 28-Mar-2016  ozaki-r Make outputs informative on failure
 1.2 28-Mar-2016  ozaki-r Add tests for "route get"
 1.1 29-Jan-2016  ozaki-r Add tests for a gateway not on the local subnet

The tests are derived from the example at
http://www.netbsd.org/docs/network/#nonsubnetgateway ,
which has come up in PR 50717.
 1.6.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.6.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.10.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.12.4.1 07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproudces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Enable to remove multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries which have an indentical
destination on different interfaces. This is normal and can be
reproduce easily by ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.1 20-Sep-2017  ozaki-r branches: 1.1.2;
Add tests of rtcache invalidation
 1.1.2.2 24-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #305):
distrib/sets/lists/tests/mi: revision 1.762
sys/net/route.c: revision 1.198-1.201
sys/net/route.h: revision 1.114
sys/netatalk/at_proto.c: revision 1.22
sys/netinet/in_proto.c: revision 1.124
sys/netinet6/in6_proto.c: revision 1.118
sys/netmpls/mpls_proto.c: revision 1.31
sys/netnatm/natm_proto.c: revision 1.18
sys/rump/net/lib/libsockin/sockin.c: revision 1.65
sys/sys/domain.h: revision 1.33
tests/net/route/Makefile: revision 1.6
tests/net/route/t_rtcache.sh: revision 1.1
Add tests of rtcache invalidation
Remove unnecessary NULL check of rt_ifp
It's always non-NULL.
Invalidate rtcache based on a global generation counter
The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals to
the global counter, the cache is still valid, otherwise invalidated.
One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.
This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.
Remove the global lock for rtcache
Thanks to removal of LIST_ENTRY of struct route, rtcaches are accessed only by
their users. And in existing usages a rtcache is guranteed to be not accessed
simultaneously. So the rtcache framework doesn't need any exclusion controls
in itself.
Synchronize on rtcache_generation with rtlock
It's racy if NET_MPSAFE is enabled.
Pointed out by joerg@
 1.1.2.1 20-Sep-2017  snj file t_rtcache.sh was added on branch netbsd-8 on 2017-10-24 08:55:56 +0000
 1.6 13-Jul-2010  jmmv Get rid of static Atffiles and let bsd.test.mk generate them on the fly.
 1.5 12-Jun-2010  pooka tp-glob t_* instead of * (so that atf-run works in the source directory)
 1.4 30-Dec-2007  jmmv branches: 1.4.2;
Re-add the NetBSD CVS Id tag to the header. It just had to be quoted to
be accepted by the parser; i.e. no bug in the code :-)

Note to self: do not try to "fix" stuff the last minute before going to
sleep.
 1.3 29-Dec-2007  jmmv Back out the change to introduce the X-NetBSD-Id header entry. For some
reason the parser does not accept its contents... You know, one should
always test even trivial changes!

Will review this in more depth tomorrow to find the real root cause of the
problem and rule out a fix for ATF.
 1.2 26-Dec-2007  jmmv Add the NetBSD Id tag to the Atffiles. Issue raised by pooka@ a while ago.
 1.1 23-Dec-2007  jmmv Add regression tests for low-port allocation in connect and listen, which
was broken and fixed recently in:
http://mail-index.netbsd.org/source-changes/2007/12/16/0011.html

Test-case code provided by elad@.
 1.4.2.2 09-Jan-2008  matt sync with HEAD
 1.4.2.1 30-Dec-2007  matt file Atffile was added on branch matt-armv6 on 2008-01-09 01:59:29 +0000
 1.5 05-Nov-2011  jruoho Move connect(2), listen(2) and socketpair(2) tests to the right place.
 1.4 04-Nov-2011  christos socketpair test.
 1.3 24-Sep-2011  christos branches: 1.3.2;
Add an rfc6056 test.
 1.2 01-May-2008  jmmv Convert NetBSD-specific tests that were previously written in C++ to C now
that ATF provides a C-only binding (comes with 0.5).
 1.1 23-Dec-2007  jmmv branches: 1.1.2; 1.1.6;
Add regression tests for low-port allocation in connect and listen, which
was broken and fixed recently in:
http://mail-index.netbsd.org/source-changes/2007/12/16/0011.html

Test-case code provided by elad@.
 1.1.6.1 18-May-2008  yamt sync with head.
 1.1.2.2 09-Jan-2008  matt sync with HEAD
 1.1.2.1 23-Dec-2007  matt file Makefile was added on branch matt-armv6 on 2008-01-09 01:59:29 +0000
 1.3.2.1 10-Nov-2011  yamt sync with head
 1.5 05-Nov-2011  jruoho Move connect(2), listen(2) and socketpair(2) tests to the right place.
 1.4 03-Nov-2010  christos branches: 1.4.6;
add Makefile.inc everywhere so that we can set WARNS=4 by default. Amazing
how many bugs this found :-)
 1.3 12-Jun-2010  wiz Fix typo in comment.
 1.2 12-Jun-2010  pooka Connect to localhost instead of www.netbsd.org to a) make this work
in non-routed setups b) avoid the dubious security implications.
Also, to make the test actually do what it claims to do, call
getsockname() and verify we got a low port.
 1.1 01-May-2008  jmmv branches: 1.1.4;
Convert NetBSD-specific tests that were previously written in C++ to C now
that ATF provides a C-only binding (comes with 0.5).
 1.1.4.2 18-May-2008  yamt sync with head.
 1.1.4.1 01-May-2008  yamt file t_connect.c was added on branch yamt-pf42 on 2008-05-18 12:36:01 +0000
 1.4.6.1 10-Nov-2011  yamt sync with head
 1.4 01-May-2008  jmmv Convert NetBSD-specific tests that were previously written in C++ to C now
that ATF provides a C-only binding (comes with 0.5).
 1.3 30-Apr-2008  martin Convert TNF licenses to new 2 clause variant
 1.2 04-Jan-2008  jmmv branches: 1.2.2; 1.2.6;
Fix headers: add NetBSD CVS id tag and drop ATF title.
 1.1 23-Dec-2007  jmmv Add regression tests for low-port allocation in connect and listen, which
was broken and fixed recently in:
http://mail-index.netbsd.org/source-changes/2007/12/16/0011.html

Test-case code provided by elad@.
 1.2.6.1 17-Jun-2008  yamt fix merge botches
 1.2.2.2 09-Jan-2008  matt sync with HEAD
 1.2.2.1 04-Jan-2008  matt file t_connect.cpp was added on branch matt-armv6 on 2008-01-09 01:59:29 +0000
 1.3 05-Nov-2011  jruoho Move connect(2), listen(2) and socketpair(2) tests to the right place.
 1.2 03-Nov-2010  christos branches: 1.2.6;
add Makefile.inc everywhere so that we can set WARNS=4 by default. Amazing
how many bugs this found :-)
 1.1 01-May-2008  jmmv branches: 1.1.4;
Convert NetBSD-specific tests that were previously written in C++ to C now
that ATF provides a C-only binding (comes with 0.5).
 1.1.4.2 18-May-2008  yamt sync with head.
 1.1.4.1 01-May-2008  yamt file t_listen.c was added on branch yamt-pf42 on 2008-05-18 12:36:01 +0000
 1.2.6.1 10-Nov-2011  yamt sync with head
 1.4 01-May-2008  jmmv Convert NetBSD-specific tests that were previously written in C++ to C now
that ATF provides a C-only binding (comes with 0.5).
 1.3 30-Apr-2008  martin Convert TNF licenses to new 2 clause variant
 1.2 04-Jan-2008  jmmv branches: 1.2.2; 1.2.6;
Fix headers: add NetBSD CVS id tag and drop ATF title.
 1.1 23-Dec-2007  jmmv Add regression tests for low-port allocation in connect and listen, which
was broken and fixed recently in:
http://mail-index.netbsd.org/source-changes/2007/12/16/0011.html

Test-case code provided by elad@.
 1.2.6.1 17-Jun-2008  yamt fix merge botches
 1.2.2.2 09-Jan-2008  matt sync with HEAD
 1.2.2.1 04-Jan-2008  matt file t_listen.cpp was added on branch matt-armv6 on 2008-01-09 01:59:30 +0000
 1.3 22-Jun-2012  christos PR/46602: Move the rfc6056 port randomization to the IP layer.
 1.2 05-Nov-2011  jruoho Add missing copyright ((c) @christos).
 1.1 24-Sep-2011  christos branches: 1.1.2;
Add an rfc6056 test.
 1.1.2.2 30-Oct-2012  yamt sync with head
 1.1.2.1 10-Nov-2011  yamt sync with head
 1.2 05-Nov-2011  jruoho Move connect(2), listen(2) and socketpair(2) tests to the right place.
 1.1 04-Nov-2011  christos socketpair test.
 1.3 17-Nov-2022  ozaki-r tests: build and install added test files
 1.2 04-Nov-2022  ozaki-r tests: add tests for invalid extra operations on a shutdown socket

The tests cover some error paths that normally happen.
 1.1 02-Nov-2022  ozaki-r tests: add tests for TCP with nc
 1.2 17-Nov-2022  ozaki-r tests: make t_tcp_connect_port.c run on rump kernel
 1.1 17-Nov-2022  ozaki-r tests: import tcp_connect_port_test.c from FreeBSD as t_tcp_connect_port.c

As of:
commit 36c52a52eecf1ed0232f9e138564009a85de76c2
Author: Jonathan T. Looney <jtl@FreeBSD.org>
Date: Sat Nov 14 15:44:28 2020 +0000

Add a regression test for the port-selection behavior fixed in r367680.
 1.1 02-Nov-2022  ozaki-r tests: add tests for TCP with nc
 1.1 04-Nov-2022  ozaki-r tests: add tests for invalid extra operations on a shutdown socket

The tests cover some error paths that normally happen.
 1.1 04-Nov-2022  ozaki-r tests: add tests for invalid extra operations on a shutdown socket

The tests cover some error paths that normally happen.

RSS XML Feed