Home | History | Annotate | only in /src/sys/netipsec
History log of /src/sys/netipsec
RevisionDateAuthorComments
 1.6 10-Jan-2018  knakahara add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.5 06-Jan-2012  drochner branches: 1.5.40;
more IPSEC header cleanup: don't install unneeded headers to userland,
and remove some differences berween KAME and FAST_IPSEC
 1.4 04-Jan-2012  drochner include <netipsec/ipsec.h> rather than <netinet6/ipsec.h> from userland
where possible, for consistency and compatibility to FreeBSD
(exception: KAME specific statistics gathering in netstat(1) and systat(1))
 1.3 04-Jan-2012  drochner -consistently use "char *" for the compiled policy buffer in the
ipsec_*_policy() functions, as it was documented and used by clients
-remove "ipsec_policy_t" which was undocumented and only present
in the KAME version of the ipsec.h header
-misc cleanup of historical artefacts, and to remove unnecessary
differences between KAME ans FAST_IPSEC
 1.2 11-Dec-2005  christos branches: 1.2.110; 1.2.114;
merge ktrace-lwp.
 1.1 07-May-2004  jonathan branches: 1.1.2; 1.1.4;
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.

New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)

Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)

sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)

sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)

sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)

Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":

New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)

Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.4.1 07-May-2004  skrll file Makefile was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.1.2.2 10-May-2004  tron Pull up revision 1.1 (requested by jonathan in ticket #280):
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.
New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)
Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)
sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)
sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)
sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)
Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":
New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)
Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.1.2.1 07-May-2004  tron file Makefile was added on branch netbsd-2-0 on 2004-05-10 15:00:38 +0000
 1.2.114.1 18-Feb-2012  mrg merge to -current.
 1.2.110.1 17-Apr-2012  yamt sync with head
 1.5.40.1 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.3 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.2 10-Dec-2005  elad branches: 1.2.162;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.1 13-Aug-2003  jonathan branches: 1.1.4; 1.1.18;
Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.1.18.1 21-Jun-2006  yamt sync with head.
 1.1.4.5 11-Dec-2005  christos Sync with head.
 1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.4.1 13-Aug-2003  skrll file ah.h was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.2.162.1 22-Apr-2018  pgoyette Sync with HEAD
 1.7 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.6 16-Feb-2018  maxv branches: 1.6.2;
Remove unused.
 1.5 13-Apr-2017  christos Redo the statistics through an indirection array and put the definitions
of the arrays in pfkeyv2.h so that they are next to the index definitions.
Remove "bogus" comment about compressing the statistics which is now fixed.
 1.4 23-Apr-2008  thorpej branches: 1.4.46; 1.4.66; 1.4.70; 1.4.74;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.3 10-Dec-2005  elad branches: 1.3.70; 1.3.72;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.2 07-May-2004  jonathan branches: 1.2.2; 1.2.14;
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.

New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)

Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)

sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)

sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)

sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)

Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":

New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)

Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.1 13-Aug-2003  jonathan branches: 1.1.2;
Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.1.2.1 10-May-2004  tron Pull up revision 1.2 (requested by jonathan in ticket #280):
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.
New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)
Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)
sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)
sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)
sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)
Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":
New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)
Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.2.14.1 21-Jun-2006  yamt sync with head.
 1.2.2.5 11-Dec-2005  christos Sync with head.
 1.2.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.2.2.2 03-Aug-2004  skrll Sync with HEAD
 1.2.2.1 07-May-2004  skrll file ah_var.h was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.3.72.1 18-May-2008  yamt sync with head.
 1.3.70.1 02-Jun-2008  mjf Sync with HEAD.
 1.4.74.1 21-Apr-2017  bouyer Sync with HEAD
 1.4.70.1 26-Apr-2017  pgoyette Sync with HEAD
 1.4.66.1 28-Aug-2017  skrll Sync with HEAD
 1.4.46.1 03-Dec-2017  jdolecek update from HEAD
 1.6.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.4 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.3 10-Dec-2005  elad branches: 1.3.162;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.2 26-Feb-2005  perry branches: 1.2.4;
nuke trailing whitespace
 1.1 13-Aug-2003  jonathan branches: 1.1.4; 1.1.10; 1.1.12;
Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.1.12.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.10.1 29-Apr-2005  kent sync with -current
 1.1.4.6 11-Dec-2005  christos Sync with head.
 1.1.4.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.4.1 13-Aug-2003  skrll file esp.h was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.2.4.1 21-Jun-2006  yamt sync with head.
 1.3.162.1 22-Apr-2018  pgoyette Sync with HEAD
 1.6 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.5 13-Apr-2017  christos branches: 1.5.10;
Redo the statistics through an indirection array and put the definitions
of the arrays in pfkeyv2.h so that they are next to the index definitions.
Remove "bogus" comment about compressing the statistics which is now fixed.
 1.4 23-Apr-2008  thorpej branches: 1.4.46; 1.4.66; 1.4.70; 1.4.74;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.3 10-Dec-2005  elad branches: 1.3.70; 1.3.72;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.2 07-May-2004  jonathan branches: 1.2.2; 1.2.14;
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.

New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)

Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)

sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)

sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)

sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)

Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":

New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)

Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.1 13-Aug-2003  jonathan branches: 1.1.2;
Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.1.2.1 10-May-2004  tron Pull up revision 1.2 (requested by jonathan in ticket #280):
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.
New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)
Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)
sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)
sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)
sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)
Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":
New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)
Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.2.14.1 21-Jun-2006  yamt sync with head.
 1.2.2.5 11-Dec-2005  christos Sync with head.
 1.2.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.2.2.2 03-Aug-2004  skrll Sync with HEAD
 1.2.2.1 07-May-2004  skrll file esp_var.h was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.3.72.1 18-May-2008  yamt sync with head.
 1.3.70.1 02-Jun-2008  mjf Sync with HEAD.
 1.4.74.1 21-Apr-2017  bouyer Sync with HEAD
 1.4.70.1 26-Apr-2017  pgoyette Sync with HEAD
 1.4.66.1 28-Aug-2017  skrll Sync with HEAD
 1.4.46.1 03-Dec-2017  jdolecek update from HEAD
 1.5.10.1 22-Apr-2018  pgoyette Sync with HEAD
 1.4 12-Sep-2003  itojun merge netipsec/key* into netkey/key*. no need for both.
change confusing filename
 1.3 12-Sep-2003  itojun no need for netipsec/key*, they are almost identical to netkey/key*
 1.2 06-Aug-2003  jonathan Move the preprocessor/config feature-test macro (FAST_IPSEC) into opt_ipsec.h,
to simplify changes elsehere.

Add dependency on new file netipec/ipsec_netbsd.c, for some NetBSD-specific
required functionality (e.g., differences in ctl-input keydb handling).
 1.1 25-Jul-2003  jonathan Commit initial NetBSD port of the OpenCrypto Framework (OCF). This
code is derived from Sam Leffler's FreeBSD port of OCF, which is in
turn a port of Angelos Keromytis's OpenBSD work.
Credit to Sam and Angelos, any blame for the NetBSD port to me.
 1.15 30-Jun-2020  riastradh Rename enc_xform_rijndael128 -> enc_xform_aes.

Update netipsec dependency.
 1.14 22-Apr-2020  rin Make crypto/rijindael optional again as cprng_strong does no longer
depend on it. Dependency is explicitly declared in files.foo if a
component requires it.
 1.13 10-Jan-2018  knakahara branches: 1.13.14;
add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.12 05-Jun-2013  christos branches: 1.12.26;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.11 04-Jun-2013  christos PR/47886: Dr. Wolfgang Stukenbrock: IPSEC_NAT_T enabled kernels may access
outdated pointers and pass ESP data to UPD-sockets.
While here, simplify the code and remove the IPSEC_NAT_T option; always
compile nat-traversal in so that it does not bitrot.
 1.10 22-Mar-2012  drochner branches: 1.10.2;
remove KAME IPSEC, replaced by FAST_IPSEC
 1.9 09-Jan-2012  drochner Make FAST_IPSEC the default IPSEC implementation which is built
into the kernel if the "IPSEC" kernel option is given.
The old implementation is still available as KAME_IPSEC.
Do some minimal manpage adjustment -- kame_ipsec(4) is a copy
of the old ipsec(4) and the latter is now a copy of fast_ipsec(4).
 1.8 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.7 16-Nov-2007  christos branches: 1.7.52; 1.7.56;
defflag IPSEC_DEBUG
 1.6 11-Dec-2005  christos branches: 1.6.44; 1.6.46; 1.6.50; 1.6.52;
merge ktrace-lwp.
 1.5 26-Apr-2004  itojun branches: 1.5.2; 1.5.14;
xform_tcp.c is needed only with FAST_IPSEC
 1.4 25-Apr-2004  jonathan Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do see any code
that verifies recieved TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c
,and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
 1.3 31-Dec-2003  jonathan Split opencrypto configuration into an attribute, usable by inkernel
clients, and a pseudo-device for userspace access.

The attribute is named `opencrypto'. The pseudo-device is renamed to
"crypto", which has a dependency on "opencrypto". The sys/conf/majors
entry and pseudo-device attach entrypoint are updated to match the
new pseudo-device name.

Fast IPsec (sys/netipsec/files.ipsec) now lists a dependency on the
"opencrypto" attribute. Drivers for crypto accelerators (ubsec,
hifn775x) also pull in opencrypto, as providers of opencrypto transforms.
 1.2 20-Sep-2003  itojun separate netkey/key* and netipsec/key*
 1.1 12-Sep-2003  itojun merge netipsec/key* into netkey/key*. no need for both.
change confusing filename
 1.5.14.1 07-Dec-2007  yamt sync with head
 1.5.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.5.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.5.2.2 03-Aug-2004  skrll Sync with HEAD
 1.5.2.1 26-Apr-2004  skrll file files.netipsec was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.6.52.1 19-Nov-2007  mjf Sync with HEAD.
 1.6.50.1 18-Nov-2007  bouyer Sync with HEAD
 1.6.46.1 09-Jan-2008  matt sync with HEAD
 1.6.44.1 21-Nov-2007  joerg Sync with HEAD.
 1.7.56.2 05-Apr-2012  mrg sync to latest -current.
 1.7.56.1 18-Feb-2012  mrg merge to -current.
 1.7.52.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.7.52.1 17-Apr-2012  yamt sync with head
 1.10.2.1 23-Jun-2013  tls resync from head
 1.12.26.1 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.13.14.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.3 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.2 10-Dec-2005  elad branches: 1.2.162;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.1 13-Aug-2003  jonathan branches: 1.1.4; 1.1.18;
Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.1.18.1 21-Jun-2006  yamt sync with head.
 1.1.4.5 11-Dec-2005  christos Sync with head.
 1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.4.1 13-Aug-2003  skrll file ipcomp.h was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.2.162.1 22-Apr-2018  pgoyette Sync with HEAD
 1.8 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.7 13-Apr-2017  christos branches: 1.7.10;
Redo the statistics through an indirection array and put the definitions
of the arrays in pfkeyv2.h so that they are next to the index definitions.
Remove "bogus" comment about compressing the statistics which is now fixed.
 1.6 23-Apr-2008  thorpej branches: 1.6.46; 1.6.66; 1.6.70; 1.6.74;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.5 29-Dec-2007  degroote branches: 1.5.6; 1.5.8;
Add some statistics for case where compression is not useful
(when len(compressed packet) > len(initial packet))
 1.4 10-Feb-2007  degroote branches: 1.4.20; 1.4.26; 1.4.32;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic
 1.3 10-Dec-2005  elad branches: 1.3.24; 1.3.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.2 07-May-2004  jonathan branches: 1.2.2; 1.2.14;
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.

New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)

Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)

sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)

sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)

sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)

Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":

New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)

Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.1 13-Aug-2003  jonathan branches: 1.1.2;
Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.1.2.1 10-May-2004  tron Pull up revision 1.2 (requested by jonathan in ticket #280):
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.
New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)
Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)
sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)
sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)
sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)
Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":
New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)
Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.2.14.3 21-Jan-2008  yamt sync with head
 1.2.14.2 26-Feb-2007  yamt sync with head.
 1.2.14.1 21-Jun-2006  yamt sync with head.
 1.2.2.5 11-Dec-2005  christos Sync with head.
 1.2.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.2.2.2 03-Aug-2004  skrll Sync with HEAD
 1.2.2.1 07-May-2004  skrll file ipcomp_var.h was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.3.26.1 04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.3.24.1 24-May-2007  pavel Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the good default policy, depending of the adress family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us to choose the right default policy (based on the adress family
requested).
While here, fix an error message

Use dynamic array instead of an static array to decompress. It lets us to
decompress any data, whatever is the radio decompressed data / compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.4.32.1 02-Jan-2008  bouyer Sync with HEAD
 1.4.26.1 18-Feb-2008  mjf Sync with HEAD.
 1.4.20.1 09-Jan-2008  matt sync with HEAD
 1.5.8.1 18-May-2008  yamt sync with head.
 1.5.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.6.74.1 21-Apr-2017  bouyer Sync with HEAD
 1.6.70.1 26-Apr-2017  pgoyette Sync with HEAD
 1.6.66.1 28-Aug-2017  skrll Sync with HEAD
 1.6.46.1 03-Dec-2017  jdolecek update from HEAD
 1.7.10.1 22-Apr-2018  pgoyette Sync with HEAD
 1.6 22-Apr-2018  maxv Rename ipip_allow->ipip_spoofcheck, and add net.inet.ipsec.ipip_spoofcheck.
Makes it simpler, and also fixes PR/39919.
 1.5 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.4 23-Apr-2008  thorpej branches: 1.4.88;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.3 10-Dec-2005  elad branches: 1.3.70; 1.3.72;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.2 07-May-2004  jonathan branches: 1.2.2; 1.2.14;
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.

New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)

Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)

sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)

sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)

sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)

Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":

New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)

Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.1 13-Aug-2003  jonathan branches: 1.1.2;
Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.1.2.1 10-May-2004  tron Pull up revision 1.2 (requested by jonathan in ticket #280):
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.
New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)
Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)
sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)
sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)
sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)
Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":
New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)
Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.2.14.1 21-Jun-2006  yamt sync with head.
 1.2.2.5 11-Dec-2005  christos Sync with head.
 1.2.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.2.2.2 03-Aug-2004  skrll Sync with HEAD
 1.2.2.1 07-May-2004  skrll file ipip_var.h was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.3.72.1 18-May-2008  yamt sync with head.
 1.3.70.1 02-Jun-2008  mjf Sync with HEAD.
 1.4.88.2 02-May-2018  pgoyette Synch with HEAD
 1.4.88.1 22-Apr-2018  pgoyette Sync with HEAD
 1.179 13-May-2024  msaitoh s/priviliged/privileged/
 1.178 27-Jan-2023  ozaki-r ipsec: remove unnecessary splsoftnet

Because the code of IPsec itself is already MP-safe.
 1.177 08-Dec-2022  knakahara branches: 1.177.2;
Fix: sp->lastused should be updated by time_uptime, and refactor a little.
 1.176 09-Nov-2022  knakahara Fix IPv4 security policy with port number does not work for forwarding packets.
 1.175 04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.174 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.173 08-Dec-2021  andvar s/speficication/specification/
 1.172 28-Aug-2020  ozaki-r ipsec: rename ipsec_ip_input to ipsec_ip_input_checkpolicy

Because it just checks if a packet passes security policies.
 1.171 28-Aug-2020  ozaki-r inet, inet6: count packets dropped by IPsec

The counters count packets dropped due to security policy checks.
 1.170 07-Aug-2019  knakahara ipsec_getpolicybysock() should also call key_havesp() like ipsec_getpolicybyaddr().

That can reduce KEYDEBUG messages.
 1.169 09-Jul-2019  maxv Fix uninitialized variable: in ipsec_checkpcbcache(), spidx.dir is not
initialized, and the padding of the spidx structure is not initialized
either. This causes the memcmp() to wrongfully fail.

Change ipsec_setspidx() to always initialize spdix.dir and zero out the
padding.

ok ozaki-r@
 1.168 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.167 22-Nov-2018  knakahara Support IPv6 NAT-T. Implemented by hsuenaga@IIJ and ohishi@IIJ.

Add ATF later.
 1.166 27-Oct-2018  maxv Localify one function, and switch to C99 types while here.
 1.165 11-Jul-2018  maxv Rename

ip_undefer_csum -> in_undefer_cksum
in_delayed_cksum -> in_undefer_cksum_tcpudp

The two previous names were inconsistent and misleading.

Put the two functions into in_offload.c. Add comments to explain what
we're doing.

The same could be done for IPv6.
 1.164 14-May-2018  maxv branches: 1.164.2;
Merge ipsec4_input and ipsec6_input into ipsec_ip_input. Make the argument
a bool for clarity. Optimize the function: if M_CANFASTFWD is not there
(because already removed by the firewall) leave now.

Makes it easier to see that M_CANFASTFWD is not removed on IPv6.
 1.163 10-May-2018  maxv Replace dumb code by M_VERIFY_PACKET. In fact, perhaps we should not even
call M_VERIFY_PACKET here, there is no particular reason for this place to
be more wrong than the rest.
 1.162 10-May-2018  maxv Rename ipsec4_forward -> ipsec_mtu, and switch to void.
 1.161 29-Apr-2018  maxv Remove unused and misleading argument from ipsec_set_policy.
 1.160 28-Apr-2018  maxv Remove IPSEC_SPLASSERT_SOFTNET, it has always been a no-op.
 1.159 28-Apr-2018  maxv Stop using a macro, rename the function to ipsec_init_pcbpolicy directly.
 1.158 28-Apr-2018  maxv Style and remove unused stuff.
 1.157 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.156 18-Apr-2018  maxv Remove dead code.

ok ozaki-r@
 1.155 17-Apr-2018  maxv Add XXX. If this code really does something, it should use MCHTYPE.
 1.154 17-Apr-2018  maxv Style, add XXX (about the mtu that goes negative), and remove #ifdef inet.
 1.153 03-Apr-2018  maxv Remove ipsec_copy_policy and ipsec_copy_pcbpolicy. No functional change,
since we used only ipsec_copy_pcbpolicy, and it was a no-op.

Originally we were using ipsec_copy_policy to optimize the IPsec-PCB
cache: when an ACK was received in response to a SYN, we used to copy the
SP cached in the SYN's PCB into the ACK's PCB, so that
ipsec_getpolicybysock could use the cached SP instead of requerying it.

Then we switched to ipsec_copy_pcbpolicy which has always been a no-op. As
a result the SP cached in the SYN was/is not copied in the ACK, and the
first call to ipsec_getpolicybysock had to query the SP and cache it
itself. It's not totally clear to me why this change was made.

But it has been this way for years, and after a conversation with Ryota
Ozaki it turns out the optimization is not valid anymore due to
MP-ification, so it won't be re-enabled.

ok ozaki-r@
 1.152 31-Mar-2018  maxv typo in comments
 1.151 03-Mar-2018  maxv branches: 1.151.2;
Reduce the diff between ipsec4_output and ipsec6_check_policy. While here
style.
 1.150 03-Mar-2018  maxv Dedup.
 1.149 28-Feb-2018  maxv add missing static
 1.148 28-Feb-2018  maxv Dedup: merge ipsec4_setspidx_inpcb and ipsec6_setspidx_in6pcb.
 1.147 28-Feb-2018  maxv ipsec6_setspidx_in6pcb: call ipsec_setspidx() only once, just like the
IPv4 code. While here put the correct variable in sizeof.

ok ozaki-r@
 1.146 27-Feb-2018  maxv Dedup: merge ipsec4_set_policy and ipsec6_set_policy. The content of the
original ipsec_set_policy function is inlined into the new one.
 1.145 27-Feb-2018  maxv Remove duplicate checks, and no need to initialize 'newsp' in
ipsec_set_policy.
 1.144 27-Feb-2018  maxv Dedup: merge

ipsec4_get_policy and ipsec6_get_policy
ipsec4_delete_pcbpolicy and ipsec6_delete_pcbpolicy

The already-existing ipsec_get_policy() function is inlined in the new
one.
 1.143 27-Feb-2018  maxv Use inpcb_hdr to reduce the diff between

ipsec4_set_policy and ipsec6_set_policy
ipsec4_get_policy and ipsec6_get_policy
ipsec4_delete_pcbpolicy and ipsec6_delete_pcbpolicy

No real functional change.
 1.142 27-Feb-2018  maxv Optimize: use ipsec_sp_hdrsiz instead of ipsec_hdrsiz, not to re-query
the SP.

ok ozaki-r@
 1.141 26-Feb-2018  maxv Dedup: call ipsec_in_reject directly. IPSEC_STAT_IN_POLVIO also gets
increased now.
 1.140 26-Feb-2018  maxv Reduce the diff between ipsec6_input and ipsec4_input.
 1.139 26-Feb-2018  maxv Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@
 1.138 26-Feb-2018  maxv Dedup: merge ipsec4_hdrsiz and ipsec6_hdrsiz into ipsec_hdrsiz.

ok ozaki-r@
 1.137 26-Feb-2018  maxv Dedup: merge ipsec4_checkpolicy and ipsec6_checkpolicy into
ipsec_checkpolicy.

ok ozaki-r@
 1.136 26-Feb-2018  maxv Fix nonsensical checks, neither in6p nor request is allowed to be NULL,
and the former is already dereferenced in a kassert. This code should be
the same as ipsec4_set_policy.
 1.135 26-Feb-2018  maxv Merge some minor (mostly stylistic) changes from last week.
 1.134 21-Feb-2018  maxv Fix ipsec4_get_ulp(). We should do "goto done" instead of "return",
otherwise the port fields of spidx are uninitialized.

ok mlelstv@
 1.133 21-Feb-2018  maxv Use inpcb_hdr to reduce the diff between:

ipsec4_hdrsiz and ipsec6_hdrsiz
ipsec4_in_reject and ipsec6_in_reject
ipsec4_checkpolicy and ipsec4_checkpolicy

The members of these couples are now identical, and could be merged,
giving only three functions instead of six...
 1.132 21-Feb-2018  maxv Rename:

ipsec_in_reject -> ipsec_sp_reject
ipsec_hdrsiz -> ipsec_sp_hdrsiz

localify the former, and do some cleanup while here.
 1.131 16-Feb-2018  maxv Style, remove unused and misleading macros and comments, localify, and
reduce the diff between similar functions. No functional change.
 1.130 16-Feb-2018  maxv Fix inverted logic, otherwise the kernel crashes when receiving a 1-byte
AH packet. Triggerable before authentication when IPsec and forwarding
are both enabled.
 1.129 16-Feb-2018  maxv Style a bit, no functional change.
 1.128 16-Feb-2018  maxv Remove some more FreeBSD sysctl declarations that already have NetBSD
counterparts. Discussed with ozaki-r@.
 1.127 16-Feb-2018  maxv Remove ipsec_replay and ipsec_integrity from this place, they are already
declared as sysctls. Discussed with ozaki-r@.
 1.126 16-Feb-2018  maxv Remove ip4_esp_randpad and ip6_esp_randpad, unused. Discussed with
ozaki-r@.
 1.125 08-Feb-2018  maxv Remove unused net_osdep.h include.
 1.124 23-Jan-2018  ozaki-r Fix late NULL-checking (CID 1427782: Null pointer dereferences (REVERSE_INULL))
 1.123 21-Nov-2017  ozaki-r Use M_WAITOK to allocate mbufs wherever sleepable

Further changes will get rid of unnecessary NULL checks then.
 1.122 17-Oct-2017  ozaki-r Fix buffer length for ipsec_logsastr
 1.121 03-Oct-2017  ozaki-r Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
 1.120 28-Sep-2017  christos - sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
 1.119 19-Sep-2017  ozaki-r Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
 1.118 10-Aug-2017  ozaki-r Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
 1.117 07-Aug-2017  ozaki-r Remove out-of-date log output

Pointed out by riastradh@
 1.116 03-Aug-2017  ozaki-r Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
 1.115 02-Aug-2017  ozaki-r Comment out unused functions
 1.114 02-Aug-2017  ozaki-r Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
 1.113 02-Aug-2017  ozaki-r Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
 1.112 26-Jul-2017  ozaki-r Fix indentation

Pointed out by knakahara@
 1.111 26-Jul-2017  ozaki-r Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
 1.110 21-Jul-2017  ozaki-r Remove ipsecrequest#sav
 1.109 21-Jul-2017  ozaki-r Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
 1.108 21-Jul-2017  ozaki-r Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
 1.107 21-Jul-2017  ozaki-r Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
 1.106 19-Jul-2017  ozaki-r Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
 1.105 19-Jul-2017  ozaki-r Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
 1.104 18-Jul-2017  ozaki-r Restore a comment removed in previous

The comment is valid for the below code.
 1.103 18-Jul-2017  ozaki-r Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
 1.102 12-Jul-2017  ozaki-r Omit unnecessary NULL checks for sav->sah
 1.101 07-Jul-2017  ozaki-r Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
 1.100 14-Jun-2017  ozaki-r KNF
 1.99 02-Jun-2017  ozaki-r branches: 1.99.2;
Assert inph_locked on ipsec_pcb_skip_ipsec (was IPSEC_PCB_SKIP_IPSEC)

The assertion confirms SP caches are accessed under inph lock (solock).
 1.98 02-Jun-2017  ozaki-r Rename IPSEC_PCBHINT_MAYBE to IPSEC_PCBHINT_UNKNOWN

MAYBE is maybe unclear.
 1.97 02-Jun-2017  ozaki-r Get rid of redundant NULL check (NFC)
 1.96 01-Jun-2017  chs remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.95 30-May-2017  ozaki-r Make refcnt operations of SA and SP atomic

Using atomic opeartions isn't optimal and should be optimized somehow
in the future though, the change allows a kernel with NET_MPSAFE to
run out a benchmark, which is useful to know performance improvement
and degradation by code changes.
 1.94 23-May-2017  ozaki-r Use __arraycount (NFC)
 1.93 23-May-2017  ozaki-r Disable secspacq stuffs that are now unused

The stuffs are used only if sp->policy == IPSEC_POLICY_IPSEC
&& sp->req == NULL (see ipsec{4,6}_checkpolicy). However, in the
current implementation, sp->req never be NULL (except for the
moments of SP allocation and deallocation) if sp->policy is
IPSEC_POLICY_IPSEC.

It seems that the facility was partially implemented in the KAME
era and wasn't completed. Make it clear that the facility is
unused for now by #ifdef notyet. Eventually we should complete
the implementation or remove it entirely.
 1.92 19-May-2017  ozaki-r Introduce IPSECLOG and replace ipseclog and DPRINTF with it
 1.91 16-May-2017  ozaki-r Fix diagnostic assertion failure in ipsec_init_policy

panic: kernel diagnostic assertion "!cpu_softintr_p()" failed: file "../../../../netipsec/ipsec.c", line 1277
cpu7: Begin traceback...
vpanic() at netbsd:vpanic+0x140
ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
ipsec_init_policy() at netbsd:ipsec_init_policy+0x149
in_pcballoc() at netbsd:in_pcballoc+0x1c5
tcp_attach_wrapper() at netbsd:tcp_attach_wrapper+0x1e1
sonewconn() at netbsd:sonewconn+0x1ea
syn_cache_get() at netbsd:syn_cache_get+0x15f
tcp_input() at netbsd:tcp_input+0x1689
ipintr() at netbsd:ipintr+0xa88
softint_dispatch() at netbsd:softint_dispatch+0xd3
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe811d337ff0
Xsoftintr() at netbsd:Xsoftintr+0x4f

Reported by msaitoh@
 1.90 16-May-2017  ozaki-r Use kmem(9) instead of malloc/free

Some of non-sleepable allocations can be replaced with sleepable ones.
To make it clear that the replacements are possible, some assertions
are addded.
 1.89 15-May-2017  ozaki-r Show __func__ instead of __FILE__ in debug log messages

__func__ is shorter and more useful than __FILE__.
 1.88 11-May-2017  ryo Make ipsec_address() and ipsec_logsastr() mpsafe.
 1.87 10-May-2017  ozaki-r Stop ipsec4_output returning SP to the caller

SP isn't used by the caller (ip_output) and also holding its
reference looks unnecessary.
 1.86 08-May-2017  ozaki-r Omit two arguments of ipsec4_process_packet

flags is unused and tunalready is always 0. So NFC.
 1.85 28-Apr-2017  ozaki-r Fix function name in log message
 1.84 25-Apr-2017  ozaki-r branches: 1.84.2;
Check if solock of PCB is held when SP caches in the PCB are accessed

To this end, a back pointer from inpcbpolicy to inpcb_hdr is added.
 1.83 21-Apr-2017  ozaki-r Use inph for variable name of struct inpcb_hdr (NFC)
 1.82 20-Apr-2017  ozaki-r Remove unnecessary NULL checks for inp_socket and in6p_socket

They cannot be NULL except for programming errors.
 1.81 20-Apr-2017  ozaki-r Provide IPSEC_DIR_* validation macros
 1.80 19-Apr-2017  ozaki-r Use KASSERT for sanity checks of function arguments
 1.79 19-Apr-2017  ozaki-r Change ifdef DIAGNOSTIC + panic to KASSERT
 1.78 19-Apr-2017  ozaki-r Fix indentations (NFC)
 1.77 19-Apr-2017  ozaki-r Tweak KEYDEBUG macros

Let's avoid passing statements to a macro.
 1.76 19-Apr-2017  ozaki-r Change panic if DIAGNOSTIC to KASSERT

One can be changed to CTASSERT.
 1.75 19-Apr-2017  ozaki-r Retire ipsec_osdep.h

We don't need to care other OSes (FreeBSD) anymore.

Some macros are alive in ipsec_private.h.
 1.74 19-Apr-2017  ozaki-r Improve message on assertion failure
 1.73 18-Apr-2017  ozaki-r Convert IPSEC_ASSERT to KASSERT or KASSERTMSG

IPSEC_ASSERT just discarded specified message...
 1.72 18-Apr-2017  ozaki-r Remove __FreeBSD__ and __NetBSD__ switches

No functional changes (except for a debug printf).

Note that there remain some __FreeBSD__ for sysctl knobs which counerparts
to NetBSD don't exist. And ipsec_osdep.h isn't touched yet; tidying it up
requires actual code changes.
 1.71 06-Apr-2017  ozaki-r Prepare netipsec for rump-ification

- Include "opt_*.h" only if _KERNEL_OPT is defined
- Allow encapinit to be called twice (by ifinit and ipe4_attach)
- ifinit didn't call encapinit if IPSEC is enabled (ipe4_attach called
it instead), however, on a rump kernel ipe4_attach may not be called
even if IPSEC is enabled. So we need to allow ifinit to call it anyway
- Setup sysctls in ipsec_attach explicitly instead of using SYSCTL_SETUP
- Call ip6flow_invalidate_all in key_spdadd only if in6_present
- It's possible that a rump kernel loads the ipsec library but not
the inet6 library
 1.70 03-Mar-2017  ozaki-r Pass inpcb/in6pcb instead of socket to ip_output/ip6_output

- Passing a socket to Layer 3 is layer violation and even unnecessary
- The change makes codes of callers and IPsec a bit simple
 1.69 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.68 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.67 08-Dec-2016  ozaki-r branches: 1.67.2;
Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.
 1.66 01-Apr-2015  ozaki-r branches: 1.66.2;
Pull out ipsec routines from ip6_input

This change reduces symbol references from netinet6 to netipsec
and improves modularity of netipsec.

No functional change is intended.
 1.65 01-Apr-2015  ozaki-r Fix wrong comments
 1.64 13-Aug-2014  plunky branches: 1.64.2;
C99 6.5.15 Conditional operator note 3 states that the second and
third operators of a ?: operation shoud (amongst other conditions)
either both be integer type, or both void type. cast the second
to (void) then, as log() is already a void and no result is desired.
 1.63 30-May-2014  christos branches: 1.63.2; 1.63.4; 1.63.8;
Introduce 2 new variables: ipsec_enabled and ipsec_used.
Ipsec enabled is controlled by sysctl and determines if is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.62 24-Dec-2013  christos branches: 1.62.2;
fix debugging output printfs to use __func__ so they print the correct names.
 1.61 24-Dec-2013  degroote fix a typo in the log ouput of ipsec4_get_policy
 1.60 08-Jun-2013  rmind branches: 1.60.2;
Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.
 1.59 08-Jun-2013  rmind Split IPSec logic from ip_output() into a separate routine - ipsec4_output().
No change to the mechanism intended. Tested by christos@.
 1.58 04-Jun-2013  christos PR/47886: Dr. Wolfgang Stukenbrock: IPSEC_NAT_T enabled kernels may access
outdated pointers and pass ESP data to UPD-sockets.
While here, simplify the code and remove the IPSEC_NAT_T option; always
compile nat-traversal in so that it does not bitrot.
 1.57 07-Dec-2012  christos rename pcb_sp to policy to avoid:
$SRC/arch/arm/include/pcb.h:#define pcb_sp pcb_un.un_32.pcb32_sp
$SRC/arch/arm/include/pcb.h:#define pcb_sp pcb_sf.sf_r13
 1.56 13-Mar-2012  elad branches: 1.56.2;
Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.55 09-Jun-2011  drochner branches: 1.55.2; 1.55.6; 1.55.8; 1.55.12; 1.55.14;
more "const"
 1.54 08-Jun-2011  dyoung Fiddle a bit with const's to make FAST_IPSEC compile.
 1.53 05-Jun-2011  christos more malloc style.
 1.52 05-Jun-2011  christos - sprinkle const
- malloc style
 1.51 16-May-2011  drochner branches: 1.51.2;
cosmetical whitespace changes
 1.50 18-Feb-2011  drochner sprinkle some "const", documenting that the SA is not supposed to
change during an xform operation
 1.49 11-Feb-2011  drochner invalidate the secpolicy cache bin the PCB before destroying, so that
the refcount in the (global) policies gets decremented
(This apparently was missed when the policy cache code was copied
over from KAME IPSEC.)
From Wolfgang Stukenbrock per PR kern/44410, just fixed differently
to avoid unecessary differences to KAME.
 1.48 21-Jul-2010  jakllsch branches: 1.48.2; 1.48.4;
Further silence ipsec_attach().
"initializing IPsec..."" done" is of somewhat limited value.
(I normally wouldn't care; but on my box the (root) uhub(4)s attach
between the first and last portion of the line.)
 1.47 31-Jan-2010  hubertf branches: 1.47.2; 1.47.4;
Replace more printfs with aprint_normal / aprint_verbose
Makes "boot -z" go mostly silent for me.
 1.46 30-Jul-2009  jakllsch As explained in kern/41701 there's a missing splx() here.
 1.45 25-Jun-2009  christos Only print debugging messages about policy on error.
 1.44 10-May-2009  elad Adapt FAST_IPSEC to recent KPI changes.

Pointed out by dyoung@ on tech-kern@, thanks!
 1.43 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.42 18-Mar-2009  cegger bcopy -> memcpy
 1.41 18-Mar-2009  cegger bzero -> memset
 1.40 18-Mar-2009  cegger bcmp -> memcmp
 1.39 27-Jun-2008  degroote branches: 1.39.4; 1.39.6; 1.39.10; 1.39.12; 1.39.14;
Kill caddr_t introduced in the previous revision
Fix build with FAST_IPSEC
 1.38 27-Jun-2008  mlelstv Verify icmp type and code in IPSEC rules.
Fixes PR kern/39018
 1.37 23-Apr-2008  thorpej branches: 1.37.2; 1.37.4; 1.37.6;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.36 29-Dec-2007  degroote branches: 1.36.6; 1.36.8;
Simplify the FAST_IPSEC output path
Only record an IPSEC_OUT_DONE tag when we have finished the processing
In ip{,6}_output, check this tag to know if we have already processed this
packet.
Remove some dead code (IPSEC_PENDING_TDB is not used in NetBSD)

Fix pr/36870
 1.35 09-Dec-2007  degroote branches: 1.35.2;
Kill _IP_VHL ifdef (from netinet/ip.h history, it has never been used in NetBSD so ...)
 1.34 28-Oct-2007  adrianp branches: 1.34.2; 1.34.4; 1.34.6;
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.

Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.33 07-Jul-2007  degroote branches: 1.33.6; 1.33.8; 1.33.12;
Ansify
Remove useless extern
bzero -> memset, bcopy -> memcpy

No functionnal changes
 1.32 08-May-2007  degroote Always compute the sp index even if we don't have any sp in spd. It will
let us to choose the right default policy (based on the adress family
requested).

While here, fix an error message
 1.31 15-Apr-2007  degroote Choose the good default policy, depending of the adress family of the
desired policy
 1.30 25-Mar-2007  degroote Use ip4_ah_cleartos instead of ah_cleartos for consistency
 1.29 25-Mar-2007  degroote Make an exact match when we are looking for a cached sp for an unconnected
socket. If we don't make an exact match, we may use a cached rule which
has lower priority than a rule that would otherwise have matched the
packet.

Code submitted by Karl Knutsson in PR/36051
 1.28 04-Mar-2007  degroote branches: 1.28.2; 1.28.4; 1.28.6;
Remove useless cast
Use NULL instead of (void*) 0
 1.27 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.26 10-Feb-2007  degroote branches: 1.26.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic
 1.25 16-Nov-2006  christos branches: 1.25.2;
__unused removal on arguments; approved by core.
 1.24 13-Oct-2006  christos more __unused
 1.23 10-Jun-2006  kardel branches: 1.23.6; 1.23.8;
reference time.tv_sec in non timecounter case
missing conversion spotted by Geoff Wing
XXX This code need to be checked whether UTC time
is really the right abstraction. I suspect uptime
would be the correct time scale for measuring life times.
 1.22 10-Jun-2006  kardel fix a missing conversion for a mono_time reference.
detected by Geoff Wing.
 1.21 11-Apr-2006  rpaulo branches: 1.21.2;
Add two new sysctls protected under IPSEC_DEBUG:

net.inet.ipsec.test_replay - When set to 1, IPsec will send packets with
the same sequence number. This allows to verify if the other side
has proper replay attacks detection.

net.inet.ipsec.test_integrity - When set 1, IPsec will send packets with
corrupted HMAC. This allows to verify if the other side properly
detects modified packets.

(a message will be printed indicating when these sysctls changed)

By Pawel Jakub Dawidek <pjd@FreeBSD.org>.
Discussed with Christos Zoulas and Jonathan Stone.
 1.20 25-Feb-2006  wiz branches: 1.20.2; 1.20.4; 1.20.6;
Fix some typos.
 1.19 11-Dec-2005  christos branches: 1.19.2; 1.19.4; 1.19.6;
merge ktrace-lwp.
 1.18 05-Oct-2005  christos PR/31478: YOMURA Masanori: Inconsistent default value of net.inet.ipsec.dfbit
Changed to match netinet6 (0->2)
 1.17 10-Jun-2005  christos branches: 1.17.2;
constify and unshadow.
 1.16 08-May-2005  christos Panic strings should not end with \n.
 1.15 26-Feb-2005  perry branches: 1.15.2; 1.15.4; 1.15.6;
nuke trailing whitespace
 1.14 27-Oct-2004  jonathan branches: 1.14.4; 1.14.6;
Fix missing break; Emmanuel Dreyfus.

C.f. sys/netinet6/ipsec.c rev 1.97 -> 1.98, but does not include the
gratutious change for a case which (the comment says) should not occur.
 1.13 07-May-2004  jonathan branches: 1.13.2;
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.

New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)

Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)

sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)

sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)

sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)

Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":

New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)

Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.12 25-Apr-2004  jonathan Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do see any code
that verifies recieved TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c
,and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
 1.11 21-Apr-2004  itojun kill sprintf, use snprintf
 1.10 02-Mar-2004  thorpej branches: 1.10.2;
Remove some left-over debugging code.
 1.9 02-Mar-2004  thorpej Bring the PCB policy cache over from KAME IPsec, including the "hint"
used to short-circuit IPsec processing in other places.

This is enabled only for NetBSD at the moment; in order for it to function
correctly, ipsec_pcbconn() must be called as appropriate.
 1.8 02-Mar-2004  thorpej iipsec4_get_ulp(): Fix a reversed test that would have caused us to access
bogus IP header data if presented with a short mbuf.
 1.7 24-Feb-2004  wiz occured -> occurred. From Peter Postma.
 1.6 28-Jan-2004  jonathan Change #endif __FreeBSD__ to #endif /* __FreeBSD__ */
 1.5 20-Jan-2004  jonathan IPv6 mapped adddresses require us to cope with limited polymorphism
(struct in6pcb* versus struct inpcb*) in ipsec_getpolicybysock().

Add new macros (in lieu of an abstract data type) for a ``generic''
PCB_T (points to a struct inpcb* or struct in6pcb*) to ipsec_osdep.h.
Use those new macros in ipsec_getpolicybysock() and elsewhere.

As posted to tech-net for comment/feedback, late 2003.
 1.4 06-Oct-2003  tls Reversion of "netkey merge", part 2 (replacement of removed files in the
repository by christos was part 1). netipsec should now be back as it
was on 2003-09-11, with some very minor changes:

1) Some residual platform-dependent code was moved from ipsec.h to
ipsec_osdep.h; without this, IPSEC_ASSERT() was multiply defined. ipsec.h
now includes ipsec_osdep.h

2) itojun's renaming of netipsec/files.ipsec to netipsec/files.netipsec has
been left in place (it's arguable which name is less confusing but the
rename is pretty harmless).

3) Some #endif TOKEN has been replaced by #endif /* TOKEN */; #endif TOKEN
is invalid and GCC 3 won't compile it.

An i386 kernel with "options FAST_IPSEC" and "options OPENCRYPTO" now
gets through "make depend" but fails to build with errors in ip_input.c.
But it's better than it was (thank heaven for small favors).
 1.3 12-Sep-2003  itojun merge netipsec/key* into netkey/key*. no need for both.
change confusing filename
 1.2 20-Aug-2003  jonathan opt_inet6.h is FreeBSD-specific, so wrap it with #ifdef __FreeBSD__/#endif.
 1.1 13-Aug-2003  jonathan Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.10.2.2 01-Dec-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11395):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.10.2.1 10-May-2004  tron branches: 1.10.2.1.2; 1.10.2.1.4;
Pull up revision 1.13 (requested by jonathan in ticket #280):
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.
New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)
Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)
sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)
sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)
sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)
Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":
New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)
Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.10.2.1.4.1 01-Dec-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11395):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.10.2.1.2.1 01-Dec-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11395):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.13.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.13.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.13.2.5 02-Nov-2004  skrll Sync with HEAD.
 1.13.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.13.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.13.2.2 03-Aug-2004  skrll Sync with HEAD
 1.13.2.1 07-May-2004  skrll file ipsec.c was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.14.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.14.4.1 29-Apr-2005  kent sync with -current
 1.15.6.1 22-Nov-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #1878):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.15.4.1 22-Nov-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #1878):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.15.2.1 22-Nov-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #1878):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.17.2.6 21-Jan-2008  yamt sync with head
 1.17.2.5 15-Nov-2007  yamt sync with head.
 1.17.2.4 03-Sep-2007  yamt sync with head.
 1.17.2.3 26-Feb-2007  yamt sync with head.
 1.17.2.2 30-Dec-2006  yamt sync with head.
 1.17.2.1 21-Jun-2006  yamt sync with head.
 1.19.6.1 22-Apr-2006  simonb Sync with head.
 1.19.4.1 09-Sep-2006  rpaulo sync with head
 1.19.2.1 01-Mar-2006  yamt sync with head.
 1.20.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.20.4.1 19-Apr-2006  elad sync with head.
 1.20.2.2 26-Jun-2006  yamt sync with head.
 1.20.2.1 24-May-2006  yamt sync with head.
 1.21.2.1 19-Jun-2006  chap Sync with head.
 1.23.8.2 10-Dec-2006  yamt sync with head.
 1.23.8.1 22-Oct-2006  yamt sync with head
 1.23.6.1 18-Nov-2006  ad Sync with head.
 1.25.2.3 31-Oct-2007  liamjfoy Pull up following revision(s) (requested by adrianp in ticket #964):
sys/netipsec/xform_ah.c: revision 1.19
sys/netipsec/ipsec.c: revision 1.34
sys/netipsec/xform_ipip.c: revision 1.18
sys/netipsec/ipsec_output.c: revision 1.23
sys/netipsec/ipsec_osdep.h: revision 1.21
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote&#64;
 1.25.2.2 24-May-2007  pavel Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the good default policy, depending of the adress family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us to choose the right default policy (based on the adress family
requested).
While here, fix an error message

Use dynamic array instead of an static array to decompress. It lets us to
decompress any data, whatever is the radio decompressed data / compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.25.2.1 12-May-2007  pavel branches: 1.25.2.1.2;
Pull up following revision(s) (requested by degroote in ticket #630):
sys/netipsec/key.c: revision 1.43-1.46
sys/netinet6/ipsec.c: revision 1.116
sys/netipsec/ipsec.c: revision 1.29 via patch
sys/netkey/key.c: revision 1.154-1.155
Call key_checkspidup with spi in network bit order in order to make
comparaison with spi stored into the sadb.
Reported by Karl Knutsson in kern/36038 .

Make an exact match when we are looking for a cached sp for an unconnected
socket. If we don't make an exact match, we may use a cached rule which
has lower priority than a rule that would otherwise have matched the
packet.
Code submitted by Karl Knutsson in PR/36051

Fix a memleak in key_spdget.
Problem was reported by Karl Knutsson by pr/36119.

In spddelete2, if we can't find the sp by this id, return after sending an
error message, don't process the following code with the NULL sp.
Spotted by Matthew Grooms on freebsd-net ML

When we construct an answer for SADB_X_SPDGET, don't use an hardcoded 0 for seq but
the seq used by the request. It will improve consistency with the answer of SADB_GET
request and helps some applications which relies both on seq and pid.
Reported by Karl Knutsson by pr/36119.
 1.25.2.1.2.2 06-Jan-2008  wrstuden Catch up to netbsd-4.0 release.
 1.25.2.1.2.1 04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.26.2.4 17-May-2007  yamt sync with head.
 1.26.2.3 07-May-2007  yamt sync with head.
 1.26.2.2 15-Apr-2007  yamt sync with head.
 1.26.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.28.6.1 29-Mar-2007  reinoud Pullup to -current
 1.28.4.1 11-Jul-2007  mjf Sync with head.
 1.28.2.3 15-Jul-2007  ad Sync with head.
 1.28.2.2 08-Jun-2007  ad Sync with head.
 1.28.2.1 10-Apr-2007  ad Sync with head.
 1.33.12.1 13-Nov-2007  bouyer Sync with HEAD
 1.33.8.2 09-Jan-2008  matt sync with HEAD
 1.33.8.1 06-Nov-2007  matt sync with HEAD
 1.33.6.1 28-Oct-2007  joerg Sync with HEAD.
 1.34.6.1 11-Dec-2007  yamt sync with head.
 1.34.4.1 26-Dec-2007  ad Sync with head.
 1.34.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.35.2.1 02-Jan-2008  bouyer Sync with HEAD
 1.36.8.1 18-May-2008  yamt sync with head.
 1.36.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.36.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.37.6.2 03-Jul-2008  simonb Sync with head.
 1.37.6.1 27-Jun-2008  simonb Sync with head.
 1.37.4.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.37.2.6 11-Aug-2010  yamt sync with head.
 1.37.2.5 11-Mar-2010  yamt sync with head
 1.37.2.4 19-Aug-2009  yamt sync with head.
 1.37.2.3 18-Jul-2009  yamt sync with head.
 1.37.2.2 16-May-2009  yamt sync with head
 1.37.2.1 04-May-2009  yamt sync with head.
 1.39.14.1 21-Apr-2010  matt sync to netbsd-5
 1.39.12.1 07-Aug-2009  snj Pull up following revision(s) (requested by jakllsch in ticket #884):
sys/netipsec/ipsec.c: revision 1.46
As explained in kern/41701 there's a missing splx() here.
 1.39.10.2 23-Jul-2009  jym Sync with HEAD.
 1.39.10.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.39.6.2 14-Feb-2010  bouyer Pull up following revision(s) (requested by hubertf in ticket #1290):
sys/kern/kern_ksyms.c: revision 1.53
sys/dev/pci/agp_via.c: revision 1.18
sys/netipsec/key.c: revision 1.63
sys/arch/x86/x86/x86_autoconf.c: revision 1.49
sys/kern/init_main.c: revision 1.415
sys/kern/cnmagic.c: revision 1.11
sys/netipsec/ipsec.c: revision 1.47
sys/arch/x86/x86/pmap.c: revision 1.100
sys/netkey/key.c: revision 1.176
Replace more printfs with aprint_normal / aprint_verbose
Makes "boot -z" go mostly silent for me.
 1.39.6.1 07-Aug-2009  snj Pull up following revision(s) (requested by jakllsch in ticket #884):
sys/netipsec/ipsec.c: revision 1.46
As explained in kern/41701 there's a missing splx() here.
 1.39.4.1 28-Apr-2009  skrll Sync with HEAD.
 1.47.4.3 12-Jun-2011  rmind sync with head
 1.47.4.2 31-May-2011  rmind sync with head
 1.47.4.1 05-Mar-2011  rmind sync with head
 1.47.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.48.4.2 05-Mar-2011  bouyer Sync with HEAD
 1.48.4.1 17-Feb-2011  bouyer Sync with HEAD
 1.48.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.51.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.55.14.1 16-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1531):

sys/netipsec/ipsec.c: revision 1.130

Fix inverted logic, otherwise the kernel crashes when receiving a 1-byte
AH packet. Triggerable before authentication when IPsec and forwarding
are both enabled.
 1.55.12.1 16-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1531):

sys/netipsec/ipsec.c: revision 1.130

Fix inverted logic, otherwise the kernel crashes when receiving a 1-byte
AH packet. Triggerable before authentication when IPsec and forwarding
are both enabled.
 1.55.8.1 16-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1531):

sys/netipsec/ipsec.c: revision 1.130

Fix inverted logic, otherwise the kernel crashes when receiving a 1-byte
AH packet. Triggerable before authentication when IPsec and forwarding
are both enabled.
 1.55.6.1 05-Apr-2012  mrg sync to latest -current.
 1.55.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.55.2.2 16-Jan-2013  yamt sync with (a bit old) head
 1.55.2.1 17-Apr-2012  yamt sync with head
 1.56.2.4 03-Dec-2017  jdolecek update from HEAD
 1.56.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.56.2.2 23-Jun-2013  tls resync from head
 1.56.2.1 25-Feb-2013  tls resync with head
 1.60.2.2 18-May-2014  rmind sync with head
 1.60.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.62.2.1 10-Aug-2014  tls Rebase.
 1.63.8.1 16-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1570):

sys/netipsec/ipsec.c: revision 1.130

Fix inverted logic, otherwise the kernel crashes when receiving a 1-byte
AH packet. Triggerable before authentication when IPsec and forwarding
are both enabled.
 1.63.4.1 16-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1570):

sys/netipsec/ipsec.c: revision 1.130

Fix inverted logic, otherwise the kernel crashes when receiving a 1-byte
AH packet. Triggerable before authentication when IPsec and forwarding
are both enabled.
 1.63.2.1 16-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1570):

sys/netipsec/ipsec.c: revision 1.130

Fix inverted logic, otherwise the kernel crashes when receiving a 1-byte
AH packet. Triggerable before authentication when IPsec and forwarding
are both enabled.
 1.64.2.3 28-Aug-2017  skrll Sync with HEAD
 1.64.2.2 05-Feb-2017  skrll Sync with HEAD
 1.64.2.1 06-Apr-2015  skrll Sync with HEAD
 1.66.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.66.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.66.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.67.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.84.2.3 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.84.2.2 11-May-2017  pgoyette Sync with HEAD
 1.84.2.1 02-May-2017  pgoyette Sync with HEAD - tag prg-localcount2-base1
 1.99.2.5 30-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #669):

sys/netipsec/ipsec.c: revision 1.134

Fix ipsec4_get_ulp(). We should do "goto done" instead of "return",
otherwise the port fields of spidx are uninitialized.

ok mlelstv@
 1.99.2.4 16-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #559):

sys/netipsec/ipsec.c: revision 1.130

Fix inverted logic, otherwise the kernel crashes when receiving a 1-byte
AH packet. Triggerable before authentication when IPsec and forwarding
are both enabled.
 1.99.2.3 05-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #528):
sys/net/agr/if_agr.c: revision 1.42
sys/netinet6/nd6_rtr.c: revision 1.137
sys/netinet6/nd6_rtr.c: revision 1.138
sys/net/agr/if_agr.c: revision 1.46
sys/net/route.c: revision 1.206
sys/net/if.c: revision 1.419
sys/net/agr/if_agrether.c: revision 1.10
sys/netinet6/nd6.c: revision 1.241
sys/netinet6/nd6.c: revision 1.242
sys/netinet6/nd6.c: revision 1.243
sys/netinet6/nd6.c: revision 1.244
sys/netinet6/nd6.c: revision 1.245
sys/netipsec/ipsec_input.c: revision 1.52
sys/netipsec/ipsec_input.c: revision 1.53
sys/net/agr/if_agrsubr.h: revision 1.5
sys/kern/subr_workqueue.c: revision 1.35
sys/netipsec/ipsec.c: revision 1.124
sys/net/agr/if_agrsubr.c: revision 1.11
sys/net/agr/if_agrsubr.c: revision 1.12
Simplify; share agr_vlan_add and agr_vlan_del (NFCI)
Fix late NULL-checking (CID 1427782: Null pointer dereferences (REVERSE_INULL))
KNF: replace soft tabs with hard tabs
Add missing NULL-checking for m_pullup (CID 1427770: Null pointer dereferences (NULL_RETURNS))
Add locking.
Revert "Get rid of unnecessary splsoftnet" (v1.133)
It's not always true that softnet_lock is held these places.
See PR kern/52947.
Get rid of unnecessary splsoftnet (redo)
Unless NET_MPSAFE, splsoftnet is still needed for rt_* functions.
Use existing fill_[pd]rlist() functions to calculate size of buffer to
allocate, rather than relying on an arbitrary length passed in from
userland.
Allow copyout() of partial results if the user buffer is too small, to
be consistent with the way sysctl(3) is documented.
Garbage-collect now-unused third parrameter in the fill_[pd]rlist()
functions.
As discussed on IRC.
OK kamil@ and christos@
XXX Needs pull-up to netbsd-8 branch.
Simplify, from christos@
More simplification, this time from ozaki-r@
No need to break after return.
One more from christos@
No need to initialize fill_func
more cleanup (don't allow oldlenp == NULL)
Destroy ifq_lock at the end of if_detach
It still can be used in if_detach.
Prevent rt_free_global.wk from being enqueued to workqueue doubly
Check if a queued work is tried to be enqueued again, which is not allowed
 1.99.2.2 30-Nov-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #406):
sys/netipsec/key.c: revision 1.239
sys/netipsec/key.c: revision 1.240
sys/netipsec/key.c: revision 1.241
sys/netipsec/key.c: revision 1.242
sys/netipsec/key.h: revision 1.33
sys/netipsec/ipsec.c: revision 1.123
sys/netipsec/key.c: revision 1.236
sys/netipsec/key.c: revision 1.237
sys/netipsec/key.c: revision 1.238
Provide a function to call MGETHDR and MCLGET
The change fixes two usages of MGETHDR that don't check whether a mbuf is really
allocated before passing it to MCLGET.
Fix error handling of MCLGET in key_alloc_mbuf
Add missing splx to key_spdexpire
Use M_WAITOK to allocate mbufs wherever sleepable
Further changes will get rid of unnecessary NULL checks then.
Get rid of unnecessary NULL checks that are obsoleted by M_WAITOK
Simply the code by avoiding unnecessary error checks
- Remove unnecessary m_pullup for self-allocated mbufs
- Replace some if-fails-return sanity checks with KASSERT
Call key_sendup_mbuf immediately unless key_acquire is called in softint
We need to defer it only if it's called in softint to avoid deadlock.
 1.99.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.151.2.6 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.151.2.5 28-Jul-2018  pgoyette Sync with HEAD
 1.151.2.4 21-May-2018  pgoyette Sync with HEAD
 1.151.2.3 02-May-2018  pgoyette Synch with HEAD
 1.151.2.2 22-Apr-2018  pgoyette Sync with HEAD
 1.151.2.1 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.164.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.164.2.1 10-Jun-2019  christos Sync with HEAD
 1.177.2.1 20-Jul-2024  martin Pull up following revision(s) (requested by rin in ticket #740):

sys/netipsec/ipsec_input.c: revision 1.79
sys/netipsec/ipsec_output.c: revision 1.86
sys/netipsec/ipsec.c: revision 1.178
sys/netinet6/ip6_output.c: revision 1.232

ipsec: remove unnecessary splsoftnet

Because the code of IPsec itself is already MP-safe.
 1.93 28-Oct-2022  ozaki-r Remove in_pcb_hdr.h
 1.92 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.91 28-Aug-2020  ozaki-r ipsec: rename ipsec_ip_input to ipsec_ip_input_checkpolicy

Because it just checks if a packet passes security policies.
 1.90 28-Aug-2020  ozaki-r inet, inet6: count packets dropped by IPsec

The counters count packets dropped due to security policy checks.
 1.89 01-Nov-2019  knakahara Fix ipsecif(4) IPV6_MINMTU does not work correctly.
 1.88 12-Jun-2019  christos make DPRINTF use varyadic cpp macros, and merge with IPSECLOG.
 1.87 17-Jan-2019  knakahara Fix ipsecif(4) cannot apply input direction packet filter. Reviewed by ozaki-r@n.o and ryo@n.o.

Add ATF later.
 1.86 22-Nov-2018  knakahara Support IPv6 NAT-T. Implemented by hsuenaga@IIJ and ohishi@IIJ.

Add ATF later.
 1.85 15-Nov-2018  maxv Remove the 't' argument from m_tag_find().
 1.84 27-Oct-2018  maxv Localify one function, and switch to C99 types while here.
 1.83 14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.82 14-May-2018  maxv branches: 1.82.2;
Merge ipsec4_input and ipsec6_input into ipsec_ip_input. Make the argument
a bool for clarity. Optimize the function: if M_CANFASTFWD is not there
(because already removed by the firewall) leave now.

Makes it easier to see that M_CANFASTFWD is not removed on IPv6.
 1.81 10-May-2018  maxv Rename ipsec4_forward -> ipsec_mtu, and switch to void.
 1.80 01-May-2018  maxv Remove some more dead code.
 1.79 29-Apr-2018  maxv Remove unused and misleading argument from ipsec_set_policy.
 1.78 29-Apr-2018  maxv Remove duplicate prototype.
 1.77 28-Apr-2018  maxv Move the ipsec6_input prototype into ipsec6.h, and style.
 1.76 28-Apr-2018  maxv Stop using a macro, rename the function to ipsec_init_pcbpolicy directly.
 1.75 28-Apr-2018  maxv Style and remove unused stuff.
 1.74 19-Apr-2018  christos s/static inline/static __inline/g for consistency.
 1.73 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.72 03-Apr-2018  maxv Remove ipsec_copy_policy and ipsec_copy_pcbpolicy. No functional change,
since we used only ipsec_copy_pcbpolicy, and it was a no-op.

Originally we were using ipsec_copy_policy to optimize the IPsec-PCB
cache: when an ACK was received in response to a SYN, we used to copy the
SP cached in the SYN's PCB into the ACK's PCB, so that
ipsec_getpolicybysock could use the cached SP instead of requerying it.

Then we switched to ipsec_copy_pcbpolicy which has always been a no-op. As
a result the SP cached in the SYN was/is not copied in the ACK, and the
first call to ipsec_getpolicybysock had to query the SP and cache it
itself. It's not totally clear to me why this change was made.

But it has been this way for years, and after a conversation with Ryota
Ozaki it turns out the optimization is not valid anymore due to
MP-ification, so it won't be re-enabled.

ok ozaki-r@
 1.71 27-Feb-2018  maxv branches: 1.71.2;
Dedup: merge ipsec4_set_policy and ipsec6_set_policy. The content of the
original ipsec_set_policy function is inlined into the new one.
 1.70 27-Feb-2018  maxv Dedup: merge

ipsec4_get_policy and ipsec6_get_policy
ipsec4_delete_pcbpolicy and ipsec6_delete_pcbpolicy

The already-existing ipsec_get_policy() function is inlined in the new
one.
 1.69 26-Feb-2018  maxv Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@
 1.68 26-Feb-2018  maxv Dedup: merge ipsec4_hdrsiz and ipsec6_hdrsiz into ipsec_hdrsiz.

ok ozaki-r@
 1.67 21-Feb-2018  maxv Rename:

ipsec_in_reject -> ipsec_sp_reject
ipsec_hdrsiz -> ipsec_sp_hdrsiz

localify the former, and do some cleanup while here.
 1.66 16-Feb-2018  maxv Style, remove unused and misleading macros and comments, localify, and
reduce the diff between similar functions. No functional change.
 1.65 16-Feb-2018  maxv Remove ip4_esp_randpad and ip6_esp_randpad, unused. Discussed with
ozaki-r@.
 1.64 14-Feb-2018  maxv Style, and remove unused prototypes and functions.
 1.63 14-Feb-2018  maxv Remove m_checkalignment(), unused. This eliminates a reference to
m_getptr().
 1.62 10-Jan-2018  knakahara add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.61 03-Oct-2017  ozaki-r Constify isr at many places (NFC)
 1.60 03-Oct-2017  ozaki-r Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
 1.59 10-Aug-2017  ozaki-r Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
 1.58 02-Aug-2017  ozaki-r Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
 1.57 26-Jul-2017  ozaki-r Use pslist(9) for sptree
 1.56 21-Jul-2017  ozaki-r Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
 1.55 21-Jul-2017  ozaki-r Remove ipsecrequest#sav
 1.54 21-Jul-2017  ozaki-r Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
 1.53 21-Jul-2017  ozaki-r Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
 1.52 14-Jul-2017  ozaki-r Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
 1.51 05-Jul-2017  ozaki-r Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
 1.50 02-Jun-2017  ozaki-r branches: 1.50.2;
Assert inph_locked on ipsec_pcb_skip_ipsec (was IPSEC_PCB_SKIP_IPSEC)

The assertion confirms SP caches are accessed under inph lock (solock).
 1.49 02-Jun-2017  ozaki-r Rename IPSEC_PCBHINT_MAYBE to IPSEC_PCBHINT_UNKNOWN

MAYBE is maybe unclear.
 1.48 19-May-2017  ozaki-r Introduce IPSECLOG and replace ipseclog and DPRINTF with it
 1.47 11-May-2017  ryo Make ipsec_address() and ipsec_logsastr() mpsafe.
 1.46 10-May-2017  ozaki-r Stop ipsec4_output returning SP to the caller

SP isn't used by the caller (ip_output) and also holding its
reference looks unnecessary.
 1.45 08-May-2017  ozaki-r Omit two arguments of ipsec4_process_packet

flags is unused and tunalready is always 0. So NFC.
 1.44 25-Apr-2017  ozaki-r branches: 1.44.2;
Check if solock of PCB is held when SP caches in the PCB are accessed

To this end, a back pointer from inpcbpolicy to inpcb_hdr is added.
 1.43 20-Apr-2017  ozaki-r Remove unnecessary NULL checks for inp_socket and in6p_socket

They cannot be NULL except for programming errors.
 1.42 20-Apr-2017  ozaki-r Provide IPSEC_DIR_* validation macros
 1.41 19-Apr-2017  ozaki-r Retire ipsec_osdep.h

We don't need to care other OSes (FreeBSD) anymore.

Some macros are alive in ipsec_private.h.
 1.40 18-Apr-2017  ozaki-r Remove __FreeBSD__ and __NetBSD__ switches

No functional changes (except for a debug printf).

Note that there remain some __FreeBSD__ for sysctl knobs which counerparts
to NetBSD don't exist. And ipsec_osdep.h isn't touched yet; tidying it up
requires actual code changes.
 1.39 06-Apr-2017  ozaki-r Prepare netipsec for rump-ification

- Include "opt_*.h" only if _KERNEL_OPT is defined
- Allow encapinit to be called twice (by ifinit and ipe4_attach)
- ifinit didn't call encapinit if IPSEC is enabled (ipe4_attach called
it instead), however, on a rump kernel ipe4_attach may not be called
even if IPSEC is enabled. So we need to allow ifinit to call it anyway
- Setup sysctls in ipsec_attach explicitly instead of using SYSCTL_SETUP
- Call ip6flow_invalidate_all in key_spdadd only if in6_present
- It's possible that a rump kernel loads the ipsec library but not
the inet6 library
 1.38 03-Mar-2017  ozaki-r Pass inpcb/in6pcb instead of socket to ip_output/ip6_output

- Passing a socket to Layer 3 is layer violation and even unnecessary
- The change makes codes of callers and IPsec a bit simple
 1.37 01-Apr-2015  ozaki-r branches: 1.37.2; 1.37.4;
Pull out ipsec routines from ip6_input

This change reduces symbol references from netinet6 to netipsec
and improves modularity of netipsec.

No functional change is intended.
 1.36 05-Sep-2014  matt branches: 1.36.2;
Don't use C++ keyword new
 1.35 30-May-2014  christos Introduce 2 new variables: ipsec_enabled and ipsec_used.
Ipsec enabled is controlled by sysctl and determines if is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.34 08-Jun-2013  rmind branches: 1.34.6;
Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.
 1.33 08-Jun-2013  rmind Split IPSec logic from ip_output() into a separate routine - ipsec4_output().
No change to the mechanism intended. Tested by christos@.
 1.32 04-Jun-2013  christos PR/47886: Dr. Wolfgang Stukenbrock: IPSEC_NAT_T enabled kernels may access
outdated pointers and pass ESP data to UPD-sockets.
While here, simplify the code and remove the IPSEC_NAT_T option; always
compile nat-traversal in so that it does not bitrot.
 1.31 06-Jan-2012  drochner branches: 1.31.6;
more IPSEC header cleanup: don't install unneeded headers to userland,
and remove some differences berween KAME and FAST_IPSEC
 1.30 04-Jan-2012  drochner -consistently use "char *" for the compiled policy buffer in the
ipsec_*_policy() functions, as it was documented and used by clients
-remove "ipsec_policy_t" which was undocumented and only present
in the KAME version of the ipsec.h header
-misc cleanup of historical artefacts, and to remove unnecessary
differences between KAME ans FAST_IPSEC
 1.29 09-Jun-2011  drochner branches: 1.29.2; 1.29.6;
more "const"
 1.28 08-Jun-2011  dyoung Fiddle a bit with const's to make FAST_IPSEC compile.
 1.27 05-Jun-2011  christos - sprinkle const
- malloc style
 1.26 16-May-2011  drochner branches: 1.26.2;
use time_t rather than long for timestamps
 1.25 18-Feb-2011  drochner sprinkle some "const", documenting that the SA is not supposed to
change during an xform operation
 1.24 10-May-2009  elad branches: 1.24.4; 1.24.6; 1.24.8;
Adapt FAST_IPSEC to recent KPI changes.

Pointed out by dyoung@ on tech-kern@, thanks!
 1.23 12-Nov-2008  ad branches: 1.23.4;
Remove LKMs and switch to the module framework, pass 1.

Proposed on tech-kern@.
 1.22 23-Apr-2008  thorpej branches: 1.22.2; 1.22.8; 1.22.10;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.21 29-Dec-2007  degroote branches: 1.21.6; 1.21.8;
Simplify the FAST_IPSEC output path
Only record an IPSEC_OUT_DONE tag when we have finished the processing
In ip{,6}_output, check this tag to know if we have already processed this
packet.
Remove some dead code (IPSEC_PENDING_TDB is not used in NetBSD)

Fix pr/36870
 1.20 04-Mar-2007  christos branches: 1.20.16; 1.20.22; 1.20.28;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.19 18-Feb-2007  degroote Remove __P
Remove useless extern
Use ansi declaration
 1.18 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.17 16-Nov-2006  christos branches: 1.17.4;
__unused removal on arguments; approved by core.
 1.16 13-Oct-2006  christos more __unused
 1.15 11-Apr-2006  rpaulo branches: 1.15.8; 1.15.10;
Add two new sysctls protected under IPSEC_DEBUG:

net.inet.ipsec.test_replay - When set to 1, IPsec will send packets with
the same sequence number. This allows to verify if the other side
has proper replay attacks detection.

net.inet.ipsec.test_integrity - When set 1, IPsec will send packets with
corrupted HMAC. This allows to verify if the other side properly
detects modified packets.

(a message will be printed indicating when these sysctls changed)

By Pawel Jakub Dawidek <pjd@FreeBSD.org>.
Discussed with Christos Zoulas and Jonathan Stone.
 1.14 16-Feb-2006  perry branches: 1.14.2; 1.14.4; 1.14.6;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.13 24-Dec-2005  perry branches: 1.13.2; 1.13.4; 1.13.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.12 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.11 10-Jun-2005  christos branches: 1.11.2;
constify and unshadow.
 1.10 07-May-2004  jonathan branches: 1.10.2;
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.

New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)

Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)

sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)

sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)

sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)

Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":

New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)

Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.9 30-Apr-2004  jonathan Minimal cleanup of sys/netipsec/ipsec{,_osdep}.h, to allow compiling
FAST_IPSEC headers (with declarations of stats structures) in
userspace code. I haven't checked for strict POSIX conformance, but
Sam Leffler's FreeBS `ipsecstats' tool will now compile, if you
manually make and populate usr/include/sys/netipsec.

Committed as-is for Andrew Brown to check more of the sys/netipsec sysctls.
 1.8 25-Apr-2004  jonathan Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do see any code
that verifies recieved TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c
,and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
 1.7 02-Mar-2004  thorpej branches: 1.7.2;
Bring the PCB policy cache over from KAME IPsec, including the "hint"
used to short-circuit IPsec processing in other places.

This is enabled only for NetBSD at the moment; in order for it to function
correctly, ipsec_pcbconn() must be called as appropriate.
 1.6 20-Jan-2004  jonathan IPv6 mapped adddresses require us to cope with limited polymorphism
(struct in6pcb* versus struct inpcb*) in ipsec_getpolicybysock().

Add new macros (in lieu of an abstract data type) for a ``generic''
PCB_T (points to a struct inpcb* or struct in6pcb*) to ipsec_osdep.h.
Use those new macros in ipsec_getpolicybysock() and elsewhere.

As posted to tech-net for comment/feedback, late 2003.
 1.5 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.4 24-Nov-2003  scw For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.
 1.3 06-Oct-2003  tls Reversion of "netkey merge", part 2 (replacement of removed files in the
repository by christos was part 1). netipsec should now be back as it
was on 2003-09-11, with some very minor changes:

1) Some residual platform-dependent code was moved from ipsec.h to
ipsec_osdep.h; without this, IPSEC_ASSERT() was multiply defined. ipsec.h
now includes ipsec_osdep.h

2) itojun's renaming of netipsec/files.ipsec to netipsec/files.netipsec has
been left in place (it's arguable which name is less confusing but the
rename is pretty harmless).

3) Some #endif TOKEN has been replaced by #endif /* TOKEN */; #endif TOKEN
is invalid and GCC 3 won't compile it.

An i386 kernel with "options FAST_IPSEC" and "options OPENCRYPTO" now
gets through "make depend" but fails to build with errors in ip_input.c.
But it's better than it was (thank heaven for small favors).
 1.2 15-Aug-2003  jonathan Change ipsec4_common_input() to return void (not int with errno,
as in FreeBSD), to match NetBSD protosw prototype.
 1.1 13-Aug-2003  jonathan Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.7.2.2 10-May-2004  tron Pull up revision 1.9 (requested by jonathan in ticket #280):
Minimal cleanup of sys/netipsec/ipsec{,_osdep}.h, to allow compiling
FAST_IPSEC headers (with declarations of stats structures) in
userspace code. I haven't checked for strict POSIX conformance, but
Sam Leffler's FreeBS `ipsecstats' tool will now compile, if you
manually make and populate usr/include/sys/netipsec.
Committed as-is for Andrew Brown to check more of the sys/netipsec sysctls.
 1.7.2.1 10-May-2004  tron Pull up revision 1.10 (requested by jonathan in ticket #280):
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.
New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)
Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)
sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)
sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)
sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)
Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":
New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)
Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.10.2.6 11-Dec-2005  christos Sync with head.
 1.10.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.10.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.10.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.10.2.2 03-Aug-2004  skrll Sync with HEAD
 1.10.2.1 07-May-2004  skrll file ipsec.h was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.11.2.5 21-Jan-2008  yamt sync with head
 1.11.2.4 03-Sep-2007  yamt sync with head.
 1.11.2.3 26-Feb-2007  yamt sync with head.
 1.11.2.2 30-Dec-2006  yamt sync with head.
 1.11.2.1 21-Jun-2006  yamt sync with head.
 1.13.6.1 22-Apr-2006  simonb Sync with head.
 1.13.4.1 09-Sep-2006  rpaulo sync with head
 1.13.2.1 18-Feb-2006  yamt sync with head.
 1.14.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.14.4.1 19-Apr-2006  elad sync with head.
 1.14.2.1 24-May-2006  yamt sync with head.
 1.15.10.2 10-Dec-2006  yamt sync with head.
 1.15.10.1 22-Oct-2006  yamt sync with head
 1.15.8.1 18-Nov-2006  ad Sync with head.
 1.17.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.17.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.20.28.1 02-Jan-2008  bouyer Sync with HEAD
 1.20.22.1 18-Feb-2008  mjf Sync with HEAD.
 1.20.16.1 09-Jan-2008  matt sync with HEAD
 1.21.8.1 18-May-2008  yamt sync with head.
 1.21.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.21.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.22.10.1 19-Jan-2009  skrll Sync with HEAD.
 1.22.8.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.22.2.2 16-May-2009  yamt sync with head
 1.22.2.1 04-May-2009  yamt sync with head.
 1.23.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.24.8.1 05-Mar-2011  bouyer Sync with HEAD
 1.24.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.24.4.3 12-Jun-2011  rmind sync with head
 1.24.4.2 31-May-2011  rmind sync with head
 1.24.4.1 05-Mar-2011  rmind sync with head
 1.26.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.29.6.1 18-Feb-2012  mrg merge to -current.
 1.29.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.29.2.1 17-Apr-2012  yamt sync with head
 1.31.6.3 03-Dec-2017  jdolecek update from HEAD
 1.31.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.31.6.1 23-Jun-2013  tls resync from head
 1.34.6.1 10-Aug-2014  tls Rebase.
 1.36.2.2 28-Aug-2017  skrll Sync with HEAD
 1.36.2.1 06-Apr-2015  skrll Sync with HEAD
 1.37.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.37.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.37.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.44.2.2 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.44.2.1 11-May-2017  pgoyette Sync with HEAD
 1.50.2.2 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.50.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.71.2.7 18-Jan-2019  pgoyette Synch with HEAD
 1.71.2.6 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.71.2.5 30-Sep-2018  pgoyette Ssync with HEAD
 1.71.2.4 21-May-2018  pgoyette Sync with HEAD
 1.71.2.3 02-May-2018  pgoyette Synch with HEAD
 1.71.2.2 22-Apr-2018  pgoyette Sync with HEAD
 1.71.2.1 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.82.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.82.2.1 10-Jun-2019  christos Sync with HEAD
 1.31 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.30 01-Nov-2019  knakahara Fix ipsecif(4) IPV6_MINMTU does not work correctly.
 1.29 14-May-2018  maxv branches: 1.29.2;
Merge ipsec4_input and ipsec6_input into ipsec_ip_input. Make the argument
a bool for clarity. Optimize the function: if M_CANFASTFWD is not there
(because already removed by the firewall) leave now.

Makes it easier to see that M_CANFASTFWD is not removed on IPv6.
 1.28 28-Apr-2018  maxv Move the ipsec6_input prototype into ipsec6.h, and style.
 1.27 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.26 27-Feb-2018  maxv branches: 1.26.2;
Dedup: merge ipsec4_set_policy and ipsec6_set_policy. The content of the
original ipsec_set_policy function is inlined into the new one.
 1.25 27-Feb-2018  maxv Dedup: merge

ipsec4_get_policy and ipsec6_get_policy
ipsec4_delete_pcbpolicy and ipsec6_delete_pcbpolicy

The already-existing ipsec_get_policy() function is inlined in the new
one.
 1.24 26-Feb-2018  maxv Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@
 1.23 26-Feb-2018  maxv Dedup: merge ipsec4_checkpolicy and ipsec6_checkpolicy into
ipsec_checkpolicy.

ok ozaki-r@
 1.22 16-Feb-2018  maxv Style, remove unused and misleading macros and comments, localify, and
reduce the diff between similar functions. No functional change.
 1.21 16-Feb-2018  maxv Remove ip4_esp_randpad and ip6_esp_randpad, unused. Discussed with
ozaki-r@.
 1.20 03-Oct-2017  ozaki-r Constify isr at many places (NFC)
 1.19 25-Jul-2017  ozaki-r Remove unused macro
 1.18 05-Jul-2017  ozaki-r Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
 1.17 20-Apr-2017  ozaki-r branches: 1.17.4;
Remove unnecessary NULL checks for inp_socket and in6p_socket

They cannot be NULL except for programming errors.
 1.16 18-Apr-2017  ozaki-r Remove __FreeBSD__ and __NetBSD__ switches

No functional changes (except for a debug printf).

Note that there remain some __FreeBSD__ for sysctl knobs which counerparts
to NetBSD don't exist. And ipsec_osdep.h isn't touched yet; tidying it up
requires actual code changes.
 1.15 03-Mar-2017  ozaki-r Pass inpcb/in6pcb instead of socket to ip_output/ip6_output

- Passing a socket to Layer 3 is layer violation and even unnecessary
- The change makes codes of callers and IPsec a bit simple
 1.14 07-Jul-2016  msaitoh branches: 1.14.2; 1.14.4;
KNF. Remove extra spaces. No functional change.
 1.13 09-Jun-2011  drochner branches: 1.13.12; 1.13.30;
more "const"
 1.12 10-May-2009  elad branches: 1.12.4; 1.12.10;
Adapt FAST_IPSEC to recent KPI changes.

Pointed out by dyoung@ on tech-kern@, thanks!
 1.11 27-Apr-2008  degroote branches: 1.11.14;
Fix some fallout from socket locking patch :

- {ah6,esp6}_ctlinput must return void*
- use correct wrapper for rip_usrreq
 1.10 23-Apr-2008  thorpej branches: 1.10.2;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.9 04-Mar-2007  christos branches: 1.9.36; 1.9.38;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.8 18-Feb-2007  degroote Forgot to remove two useless extern
 1.7 18-Feb-2007  degroote Remove __P
Remove useless extern
Use ansi declaration
 1.6 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.5 10-Feb-2007  degroote branches: 1.5.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic
 1.4 10-Dec-2005  elad branches: 1.4.24; 1.4.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.3 04-Dec-2003  atatat branches: 1.3.4; 1.3.18;
Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.2 20-Nov-2003  jonathan This file was derived from FreeBSD, where "in6pcb" is a macro for
"inpcb", and this struct inpcb* and struct inp6cb* are the same type.

On NetBSD they are different types, so we must change the types of
formal argument in IPv6-specific functions from "struct inpcb *" to
"struct in6pcb*".

The code didn't compile on NetBSD beforehand, if both FAST_IPSEC + INET6
were configured. This fix will cause even more short-term breakage for
that case, but its a step in the right direction: it shows up what
still needs to be fixed.
 1.1 13-Aug-2003  jonathan Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.3.18.3 03-Sep-2007  yamt sync with head.
 1.3.18.2 26-Feb-2007  yamt sync with head.
 1.3.18.1 21-Jun-2006  yamt sync with head.
 1.3.4.5 11-Dec-2005  christos Sync with head.
 1.3.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.3.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.3.4.2 03-Aug-2004  skrll Sync with HEAD
 1.3.4.1 04-Dec-2003  skrll file ipsec6.h was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.4.26.1 04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.4.24.1 24-May-2007  pavel Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the good default policy, depending of the adress family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us to choose the right default policy (based on the adress family
requested).
While here, fix an error message

Use dynamic array instead of an static array to decompress. It lets us to
decompress any data, whatever is the radio decompressed data / compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.5.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.5.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.9.38.1 18-May-2008  yamt sync with head.
 1.9.36.1 02-Jun-2008  mjf Sync with HEAD.
 1.10.2.2 16-May-2009  yamt sync with head
 1.10.2.1 16-May-2008  yamt sync with head.
 1.11.14.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.12.10.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.12.4.1 12-Jun-2011  rmind sync with head
 1.13.30.2 28-Aug-2017  skrll Sync with HEAD
 1.13.30.1 09-Jul-2016  skrll Sync with HEAD
 1.13.12.1 03-Dec-2017  jdolecek update from HEAD
 1.14.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.14.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.14.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.17.4.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.26.2.3 21-May-2018  pgoyette Sync with HEAD
 1.26.2.2 02-May-2018  pgoyette Synch with HEAD
 1.26.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.29.2.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.82 12-Aug-2025  knakahara Fix dst address log which shows src address wrongly, pointed out by ohishi@IIJ.
 1.81 05-Jul-2024  rin branches: 1.81.2;
sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.80 10-Feb-2024  andvar branches: 1.80.2;
Fix various typos in comments, log messages and documentation.
 1.79 27-Jan-2023  ozaki-r ipsec: remove unnecessary splsoftnet

Because the code of IPsec itself is already MP-safe.
 1.78 23-Aug-2022  knakahara branches: 1.78.4;
Improve IPsec log when no key association found for SA. Implemented by ohishi@IIJ.
 1.77 24-May-2022  andvar fix various typos in comment, documentation and log messages.
 1.76 19-May-2022  christos PR/56840: Andrew Cagney: use the proper polarity hton/ntoh macros (no
functional change).
Factor out spi retrieving code into a function.
 1.75 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.74 17-Jan-2019  knakahara Fix ipsecif(4) cannot apply input direction packet filter. Reviewed by ozaki-r@n.o and ryo@n.o.

Add ATF later.
 1.73 15-Nov-2018  maxv Remove the 't' argument from m_tag_find().
 1.72 27-Oct-2018  maxv Localify one function, and switch to C99 types while here.
 1.71 14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.70 18-May-2018  maxv branches: 1.70.2;
IP6_EXTHDR_GET -> M_REGION_GET, no functional change.
 1.69 29-Apr-2018  maxv Remove useless icmp6.h include, remove manual externs and include in6.h
to get proper definitions, and remove duplicate logic in
ipsec6_common_input_cb.
 1.68 29-Apr-2018  maxv Remove obsolete/dead code, the IP-in-IP encapsulation doesn't work this
way anymore (XF_IP4 partly dropped by FAST_IPSEC).
 1.67 28-Apr-2018  maxv Remove IPSEC_SPLASSERT_SOFTNET, it has always been a no-op.
 1.66 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.65 18-Apr-2018  maxv Remove unused malloc.h include.
 1.64 17-Apr-2018  maxv fix comments
 1.63 15-Apr-2018  maxv Introduce a m_verify_packet function, that verifies the mbuf chain of a
packet to ensure it is not malformed. Call this function in "points of
interest", that are the IPv4/IPv6/IPsec entry points. There could be more.

We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only.

This function should not be called everywhere, especially not in places
that temporarily manipulate (and clobber) the mbuf structure; once they're
done they put the mbuf back in a correct format.
 1.62 26-Feb-2018  maxv branches: 1.62.2;
Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@
 1.61 26-Feb-2018  maxv If 'skip' is lower than sizeof(struct ip), we are in trouble. So remove a
nonsensical branch, and add a panic at the beginning of the function.
 1.60 26-Feb-2018  maxv m is never allowed to be NULL, so turn the KASSERT (and the null check)
to a panic.
 1.59 26-Feb-2018  maxv Merge some minor (mostly stylistic) changes from last week.
 1.58 21-Feb-2018  maxv Argh, in my previous commit in this file I forgot to fix the IPv6
entry point; apply the same fix there.
 1.57 21-Feb-2018  maxv Extend these #ifdef notyet. The m_copydata's in these branches are wrong,
we are not guaranteed to have enough room for another struct ip, and we
may crash here. Triggerable remotely, but after authentication, by sending
an AH packet that has a one-byte-sized IPIP payload.
 1.56 08-Feb-2018  maxv Remove unused net_osdep.h include.
 1.55 24-Jan-2018  maxv Fix the iteration: IPPROTO_FRAGMENT options are special, in the sense
that they don't have a 'length' field. It is therefore incorrect to
read ip6e.ip6e_len, it contains garbage.

I'm not sure whether this an exploitable vulnerability. Because of this
bug you could theoretically craft 'protoff', which means that you can
have the kernel patch the nxt value at the wrong place once the packet
is decrypted. Perhaps it can be used in some unusual MITM - a router that
happens to be between two IPsec hosts adds a frag6 option in the outer
IPv6 header to trigger the bug in the receiver -, but I couldn't come up
with anything worrying.
 1.54 24-Jan-2018  maxv ipsec4_fixup_checksum calls m_pullup, so don't forget to do mtod() again,
to prevent use-after-free.

In fact, the m_pullup call is never reached: it is impossible for 'skip'
to be zero in this function, so add an XXX for now.
 1.53 23-Jan-2018  ozaki-r Add missing NULL-checking for m_pullup (CID 1427770: Null pointer dereferences (NULL_RETURNS))
 1.52 23-Jan-2018  ozaki-r KNF: replace soft tabs with hard tabs
 1.51 03-Aug-2017  ozaki-r Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
 1.50 03-Aug-2017  ozaki-r Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
 1.49 21-Jul-2017  ozaki-r Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
 1.48 12-Jul-2017  ozaki-r Omit unnecessary NULL checks for sav->sah
 1.47 07-Jul-2017  ozaki-r Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
 1.46 06-Jul-2017  ozaki-r Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
 1.45 05-Jul-2017  ozaki-r Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
 1.44 28-Jun-2017  christos PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
 1.43 19-May-2017  ozaki-r branches: 1.43.2;
Introduce IPSECLOG and replace ipseclog and DPRINTF with it
 1.42 11-May-2017  ryo Make ipsec_address() and ipsec_logsastr() mpsafe.
 1.41 19-Apr-2017  ozaki-r branches: 1.41.2;
Retire ipsec_osdep.h

We don't need to care other OSes (FreeBSD) anymore.

Some macros are alive in ipsec_private.h.
 1.40 18-Apr-2017  ozaki-r Convert IPSEC_ASSERT to KASSERT or KASSERTMSG

IPSEC_ASSERT just discarded specified message...
 1.39 18-Apr-2017  ozaki-r Remove __FreeBSD__ and __NetBSD__ switches

No functional changes (except for a debug printf).

Note that there remain some __FreeBSD__ for sysctl knobs which counerparts
to NetBSD don't exist. And ipsec_osdep.h isn't touched yet; tidying it up
requires actual code changes.
 1.38 06-Apr-2017  ozaki-r Prepare netipsec for rump-ification

- Include "opt_*.h" only if _KERNEL_OPT is defined
- Allow encapinit to be called twice (by ifinit and ipe4_attach)
- ifinit didn't call encapinit if IPSEC is enabled (ipe4_attach called
it instead), however, on a rump kernel ipe4_attach may not be called
even if IPSEC is enabled. So we need to allow ifinit to call it anyway
- Setup sysctls in ipsec_attach explicitly instead of using SYSCTL_SETUP
- Call ip6flow_invalidate_all in key_spdadd only if in6_present
- It's possible that a rump kernel loads the ipsec library but not
the inet6 library
 1.37 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.36 10-Jun-2016  ozaki-r branches: 1.36.2; 1.36.4;
Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of an
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places where are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.35 21-Jan-2016  riastradh Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.
 1.34 21-Jan-2016  riastradh Give proper prototype to ip_output.
 1.33 30-Mar-2015  ozaki-r Tidy up opt_ipsec.h inclusions

Some inclusions of opt_ipsec.h were for IPSEC_NAT_T and are now unnecessary.
Add inclusions to some C files for IPSEC_DEBUG.
 1.32 08-Mar-2014  ozaki-r branches: 1.32.4; 1.32.6; 1.32.8; 1.32.12;
Mark a variable __diagused
 1.31 03-Nov-2013  mrg - apply some __diagused
- remove unused variables
- move some variables inside their relevant use #ifdef
 1.30 04-Jun-2013  christos branches: 1.30.2;
PR/47886: Dr. Wolfgang Stukenbrock: IPSEC_NAT_T enabled kernels may access
outdated pointers and pass ESP data to UPD-sockets.
While here, simplify the code and remove the IPSEC_NAT_T option; always
compile nat-traversal in so that it does not bitrot.
 1.29 25-Jan-2012  drochner branches: 1.29.2; 1.29.6; 1.29.8; 1.29.16;
After IPSEC input processing, pass a decoded/authenticated IPv4 packet
to upper layers through the IP protosw, as done for IPv6.
Before it was reinjected into the IP netisr queue which caused more
overhead and caused artefacts like double IP option processing.
Works well for me, should get more testing and review.
 1.28 17-Jul-2011  joerg branches: 1.28.2; 1.28.6;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.27 21-Feb-2011  drochner adopt a fix from OpenBSD: when scanning the IPv6 header chain, take
into account that the extension header type is not in the extension
header itself but in the previous one -- this makes a difference
because (a) the length field is different for AH than for all others
and (b) the offset of the "next type" field isn't the same in primary
and extension headers.
(I didn't manage to trigger the bug in my tests, no extension headers
besides AH made it to that point. Didn't try hard enough -- the fix
is still valid.)
 1.26 18-Feb-2011  drochner deal with IPv6 address scope, so that SA lookup for
link-local addresses works
(PR kern/43071 is related, but refers to KAME IPSEC)
 1.25 17-Feb-2011  drochner handle some unlikely IPv6 error case like everywhere else:
free mbuf, inc statcounter. from OpenBSD
being here, fix a diagnostic output
 1.24 16-Feb-2011  drochner remove some unnecessary pointer typecasts
(one was wrong on BE systems, but was harmless here because the
result is effectively unused)
 1.23 18-Apr-2009  tsutsui branches: 1.23.4; 1.23.6; 1.23.8;
Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.22 18-Mar-2009  cegger bcopy -> memcpy
 1.21 18-Mar-2009  cegger bzero -> memset
 1.20 23-Apr-2008  thorpej branches: 1.20.2; 1.20.10; 1.20.16;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.19 15-Apr-2008  thorpej branches: 1.19.2;
Make ip6 and icmp6 stats per-cpu.
 1.18 08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
 1.17 27-Jun-2007  degroote branches: 1.17.28;
Add support for options IPSEC_NAT_T (RFC 3947 and 3948) for fast_ipsec(4).

No objection on tech-net@
 1.16 04-Mar-2007  degroote branches: 1.16.2; 1.16.4;
Remove useless cast
Use NULL instead of (void*) 0
 1.15 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.14 10-Feb-2007  degroote branches: 1.14.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic
 1.13 16-Nov-2006  christos branches: 1.13.2; 1.13.4;
__unused removal on arguments; approved by core.
 1.12 13-Oct-2006  christos more __unused
 1.11 11-Dec-2005  christos branches: 1.11.20; 1.11.22;
merge ktrace-lwp.
 1.10 26-Feb-2005  perry branches: 1.10.4;
nuke trailing whitespace
 1.9 24-Apr-2004  jonathan branches: 1.9.2; 1.9.6; 1.9.8;
Add `const' to the safety-catch local definition of ip6_protosw,
to maatch sys/netinet6/ip6protosw.
 1.8 20-Mar-2004  jonathan Temporarily ifdef out sys/netipsec/ipsec_input.c:esp6_ctlinput(),
as there is a duplicate version in (my) ipsec_netbsd.c, with somewhat
newer IP-multicast tests.
 1.7 01-Mar-2004  thorpej Add missing copyright notices (FreeBSD rev 1.2.4.2).
 1.6 06-Oct-2003  tls Reversion of "netkey merge", part 2 (replacement of removed files in the
repository by christos was part 1). netipsec should now be back as it
was on 2003-09-11, with some very minor changes:

1) Some residual platform-dependent code was moved from ipsec.h to
ipsec_osdep.h; without this, IPSEC_ASSERT() was multiply defined. ipsec.h
now includes ipsec_osdep.h

2) itojun's renaming of netipsec/files.ipsec to netipsec/files.netipsec has
been left in place (it's arguable which name is less confusing but the
rename is pretty harmless).

3) Some #endif TOKEN has been replaced by #endif /* TOKEN */; #endif TOKEN
is invalid and GCC 3 won't compile it.

An i386 kernel with "options FAST_IPSEC" and "options OPENCRYPTO" now
gets through "make depend" but fails to build with errors in ip_input.c.
But it's better than it was (thank heaven for small favors).
 1.5 12-Sep-2003  itojun merge netipsec/key* into netkey/key*. no need for both.
change confusing filename
 1.4 20-Aug-2003  jonathan opt_inet6.h is FreeBSD-specific, so wrap it with #ifdef __FreeBSD__/#endif.
 1.3 15-Aug-2003  jonathan Fix bug with IP_DF handling which was breaking TCP: on FreeBSD, ip_off
is assumed to be in host byteorder during the input(?) path. NetBSD
keeps ip_off and ip_len in network order. Add (or remove) byteswaps
accordingly. TCP over fast_ipsec now works with PMTU, as well as without.
 1.2 15-Aug-2003  jonathan Change ipsec4_common_input() to return void (not int with errno,
as in FreeBSD), to match NetBSD protosw prototype.
 1.1 13-Aug-2003  jonathan Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.9.8.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.9.6.1 29-Apr-2005  kent sync with -current
 1.9.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.9.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.9.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.9.2.2 03-Aug-2004  skrll Sync with HEAD
 1.9.2.1 24-Apr-2004  skrll file ipsec_input.c was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.10.4.3 03-Sep-2007  yamt sync with head.
 1.10.4.2 26-Feb-2007  yamt sync with head.
 1.10.4.1 30-Dec-2006  yamt sync with head.
 1.11.22.2 10-Dec-2006  yamt sync with head.
 1.11.22.1 22-Oct-2006  yamt sync with head
 1.11.20.1 18-Nov-2006  ad Sync with head.
 1.13.4.1 04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.13.2.1 24-May-2007  pavel Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the good default policy, depending of the adress family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us to choose the right default policy (based on the adress family
requested).
While here, fix an error message

Use dynamic array instead of an static array to decompress. It lets us to
decompress any data, whatever is the radio decompressed data / compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.14.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.16.4.1 11-Jul-2007  mjf Sync with head.
 1.16.2.1 15-Jul-2007  ad Sync with head.
 1.17.28.1 02-Jun-2008  mjf Sync with HEAD.
 1.19.2.1 18-May-2008  yamt sync with head.
 1.20.16.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.20.10.1 28-Apr-2009  skrll Sync with HEAD.
 1.20.2.1 04-May-2009  yamt sync with head.
 1.23.8.2 05-Mar-2011  bouyer Sync with HEAD
 1.23.8.1 17-Feb-2011  bouyer Sync with HEAD
 1.23.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.23.4.1 05-Mar-2011  rmind sync with head
 1.28.6.1 18-Feb-2012  mrg merge to -current.
 1.28.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.28.2.1 17-Apr-2012  yamt sync with head
 1.29.16.1 13-Mar-2018  snj Pull up following revision(s) (requested by maxv in ticket #1536):
sys/netipsec/ipsec_input.c: 1.57-1.58
Extend these #ifdef notyet. The m_copydata's in these branches are wrong,
we are not guaranteed to have enough room for another struct ip, and we
may crash here. Triggerable remotely, but after authentication, by sending
an AH packet that has a one-byte-sized IPIP payload.
--
Argh, in my previous commit in this file I forgot to fix the IPv6
entry point; apply the same fix there.
 1.29.8.1 13-Mar-2018  snj Pull up following revision(s) (requested by maxv in ticket #1536):
sys/netipsec/ipsec_input.c: 1.57-1.58
Extend these #ifdef notyet. The m_copydata's in these branches are wrong,
we are not guaranteed to have enough room for another struct ip, and we
may crash here. Triggerable remotely, but after authentication, by sending
an AH packet that has a one-byte-sized IPIP payload.
--
Argh, in my previous commit in this file I forgot to fix the IPv6
entry point; apply the same fix there.
 1.29.6.3 03-Dec-2017  jdolecek update from HEAD
 1.29.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.29.6.1 23-Jun-2013  tls resync from head
 1.29.2.1 13-Mar-2018  snj Pull up following revision(s) (requested by maxv in ticket #1536):
sys/netipsec/ipsec_input.c: 1.57-1.58
Extend these #ifdef notyet. The m_copydata's in these branches are wrong,
we are not guaranteed to have enough room for another struct ip, and we
may crash here. Triggerable remotely, but after authentication, by sending
an AH packet that has a one-byte-sized IPIP payload.
--
Argh, in my previous commit in this file I forgot to fix the IPv6
entry point; apply the same fix there.
 1.30.2.1 18-May-2014  rmind sync with head
 1.32.12.1 03-Mar-2018  snj Pull up following revision(s) (requested by maxv in ticket #1577):
sys/netipsec/ipsec_input.c: 1.57 1.58
sys/netipsec/ipsec_input.c: 1.57
Extend these #ifdef notyet. The m_copydata's in these branches are wrong,
we are not guaranteed to have enough room for another struct ip, and we
may crash here. Triggerable remotely, but after authentication, by sending
an AH packet that has a one-byte-sized IPIP payload.
Argh, in my previous commit in this file I forgot to fix the IPv6
entry point; apply the same fix there.
 1.32.8.1 03-Mar-2018  snj Pull up following revision(s) (requested by maxv in ticket #1577):
sys/netipsec/ipsec_input.c: 1.57 1.58
sys/netipsec/ipsec_input.c: 1.57
Extend these #ifdef notyet. The m_copydata's in these branches are wrong,
we are not guaranteed to have enough room for another struct ip, and we
may crash here. Triggerable remotely, but after authentication, by sending
an AH packet that has a one-byte-sized IPIP payload.
Argh, in my previous commit in this file I forgot to fix the IPv6
entry point; apply the same fix there.
 1.32.6.4 28-Aug-2017  skrll Sync with HEAD
 1.32.6.3 05-Feb-2017  skrll Sync with HEAD
 1.32.6.2 09-Jul-2016  skrll Sync with HEAD
 1.32.6.1 06-Apr-2015  skrll Sync with HEAD
 1.32.4.1 03-Mar-2018  snj Pull up following revision(s) (requested by maxv in ticket #1577):
sys/netipsec/ipsec_input.c: 1.57 1.58
sys/netipsec/ipsec_input.c: 1.57
Extend these #ifdef notyet. The m_copydata's in these branches are wrong,
we are not guaranteed to have enough room for another struct ip, and we
may crash here. Triggerable remotely, but after authentication, by sending
an AH packet that has a one-byte-sized IPIP payload.
Argh, in my previous commit in this file I forgot to fix the IPv6
entry point; apply the same fix there.
 1.36.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.36.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.36.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.41.2.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.43.2.5 31-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #677):

sys/netipsec/ipsec_input.c: revision 1.55

Fix the iteration: IPPROTO_FRAGMENT options are special, in the sense
that they don't have a 'length' field. It is therefore incorrect to
read ip6e.ip6e_len, it contains garbage.

I'm not sure whether this an exploitable vulnerability. Because of this
bug you could theoretically craft 'protoff', which means that you can
have the kernel patch the nxt value at the wrong place once the packet
is decrypted. Perhaps it can be used in some unusual MITM - a router that
happens to be between two IPsec hosts adds a frag6 option in the outer
IPv6 header to trigger the bug in the receiver -, but I couldn't come up
with anything worrying.
 1.43.2.4 30-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #667):

sys/netipsec/ipsec_input.c: revision 1.54

ipsec4_fixup_checksum calls m_pullup, so don't forget to do mtod() again,
to prevent use-after-free.

In fact, the m_pullup call is never reached: it is impossible for 'skip'
to be zero in this function, so add an XXX for now.
 1.43.2.3 06-Mar-2018  martin Pull up following revision(s) (requested by maxv):
sys/netipsec/ipsec_input.c: revision 1.57
sys/netipsec/ipsec_input.c: revision 1.58

Extend these #ifdef notyet. The m_copydata's in these branches are wrong,
we are not guaranteed to have enough room for another struct ip, and we
may crash here. Triggerable remotely, but after authentication, by sending
an AH packet that has a one-byte-sized IPIP payload.

Argh, in my previous commit in this file I forgot to fix the IPv6
entry point; apply the same fix there.
 1.43.2.2 05-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #528):
sys/net/agr/if_agr.c: revision 1.42
sys/netinet6/nd6_rtr.c: revision 1.137
sys/netinet6/nd6_rtr.c: revision 1.138
sys/net/agr/if_agr.c: revision 1.46
sys/net/route.c: revision 1.206
sys/net/if.c: revision 1.419
sys/net/agr/if_agrether.c: revision 1.10
sys/netinet6/nd6.c: revision 1.241
sys/netinet6/nd6.c: revision 1.242
sys/netinet6/nd6.c: revision 1.243
sys/netinet6/nd6.c: revision 1.244
sys/netinet6/nd6.c: revision 1.245
sys/netipsec/ipsec_input.c: revision 1.52
sys/netipsec/ipsec_input.c: revision 1.53
sys/net/agr/if_agrsubr.h: revision 1.5
sys/kern/subr_workqueue.c: revision 1.35
sys/netipsec/ipsec.c: revision 1.124
sys/net/agr/if_agrsubr.c: revision 1.11
sys/net/agr/if_agrsubr.c: revision 1.12
Simplify; share agr_vlan_add and agr_vlan_del (NFCI)
Fix late NULL-checking (CID 1427782: Null pointer dereferences (REVERSE_INULL))
KNF: replace soft tabs with hard tabs
Add missing NULL-checking for m_pullup (CID 1427770: Null pointer dereferences (NULL_RETURNS))
Add locking.
Revert "Get rid of unnecessary splsoftnet" (v1.133)
It's not always true that softnet_lock is held these places.
See PR kern/52947.
Get rid of unnecessary splsoftnet (redo)
Unless NET_MPSAFE, splsoftnet is still needed for rt_* functions.
Use existing fill_[pd]rlist() functions to calculate size of buffer to
allocate, rather than relying on an arbitrary length passed in from
userland.
Allow copyout() of partial results if the user buffer is too small, to
be consistent with the way sysctl(3) is documented.
Garbage-collect now-unused third parrameter in the fill_[pd]rlist()
functions.
As discussed on IRC.
OK kamil@ and christos@
XXX Needs pull-up to netbsd-8 branch.
Simplify, from christos@
More simplification, this time from ozaki-r@
No need to break after return.
One more from christos@
No need to initialize fill_func
more cleanup (don't allow oldlenp == NULL)
Destroy ifq_lock at the end of if_detach
It still can be used in if_detach.
Prevent rt_free_global.wk from being enqueued to workqueue doubly
Check if a queued work is tried to be enqueued again, which is not allowed
 1.43.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.62.2.7 18-Jan-2019  pgoyette Synch with HEAD
 1.62.2.6 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.62.2.5 30-Sep-2018  pgoyette Ssync with HEAD
 1.62.2.4 21-May-2018  pgoyette Sync with HEAD
 1.62.2.3 02-May-2018  pgoyette Synch with HEAD
 1.62.2.2 22-Apr-2018  pgoyette Sync with HEAD
 1.62.2.1 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.70.2.1 10-Jun-2019  christos Sync with HEAD
 1.78.4.2 16-Aug-2025  martin Pull up following revision(s) (requested by knakahara in ticket #1150):

sys/netipsec/ipsec_input.c: revision 1.82

Fix dst address log which shows src address wrongly, pointed out by ohishi@IIJ.
 1.78.4.1 20-Jul-2024  martin Pull up following revision(s) (requested by rin in ticket #740):

sys/netipsec/ipsec_input.c: revision 1.79
sys/netipsec/ipsec_output.c: revision 1.86
sys/netipsec/ipsec.c: revision 1.178
sys/netinet6/ip6_output.c: revision 1.232

ipsec: remove unnecessary splsoftnet

Because the code of IPsec itself is already MP-safe.
 1.80.2.1 02-Aug-2025  perseant Sync with HEAD
 1.81.2.1 16-Aug-2025  martin Pull up following revision(s) (requested by knakahara in ticket #10):

sys/netipsec/ipsec_input.c: revision 1.82

Fix dst address log which shows src address wrongly, pointed out by ohishi@IIJ.
 1.30 22-Dec-2018  maxv Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.
 1.29 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
 1.28 31-May-2018  maxv branches: 1.28.2;
Clarify, remove superfluous things.
 1.27 28-Apr-2018  maxv Inline M_EXT_WRITABLE directly, and remove the XXX, there's nothing wrong
in the use of !M_READONLY.
 1.26 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.25 18-Apr-2018  maxv Remove unused includes, remove misleading comments, and style.
 1.24 17-Apr-2018  maxv Fix a pretty bad mistake, that has always been there.

m_adj(m1, -(m1->m_len - roff));
if (m1 != m)
m->m_pkthdr.len -= (m1->m_len - roff);

This is wrong: m_adj will modify m1->m_len, so we're using a wrong value
when manually adjusting m->m_pkthdr.len.

Because of that, it is possible to exploit the attack I described in
uipc_mbuf.c::rev1.182. The exploit is more complicated, but works 100%
reliably.
 1.23 17-Apr-2018  maxv Don't assume M_PKTHDR is set only on the first mbuf of the chain. It
should, but it looks like there are several places that can put M_PKTHDR
on secondary mbufs (PR/53189), so drop this assumption right now to
prevent further bugs.

The check is replaced by (m1 != m), which is equivalent to the previous
code: we want to modify m->m_pkthdr.len only when 'm' was not passed in
m_adj().
 1.22 10-Mar-2018  maxv Add KASSERTs.
 1.21 05-Mar-2018  maxv branches: 1.21.2;
Improve stupid check, style, and fix leak (m, not m0).
 1.20 26-Feb-2018  maxv Merge some minor (mostly stylistic) changes from last week.
 1.19 14-Feb-2018  maxv Remove m_checkalignment(), unused. This eliminates a reference to
m_getptr().
 1.18 08-Feb-2018  maxv Remove unused net_osdep.h include.
 1.17 01-Feb-2018  maxv Replace ovbcopy -> memmove, same.
 1.16 19-May-2017  ozaki-r branches: 1.16.2;
Introduce IPSECLOG and replace ipseclog and DPRINTF with it
 1.15 19-Apr-2017  ozaki-r Retire ipsec_osdep.h

We don't need to care other OSes (FreeBSD) anymore.

Some macros are alive in ipsec_private.h.
 1.14 18-Apr-2017  ozaki-r Convert IPSEC_ASSERT to KASSERT or KASSERTMSG

IPSEC_ASSERT just discarded specified message...
 1.13 18-Apr-2017  ozaki-r Remove __FreeBSD__ and __NetBSD__ switches

No functional changes (except for a debug printf).

Note that there remain some __FreeBSD__ for sysctl knobs which counerparts
to NetBSD don't exist. And ipsec_osdep.h isn't touched yet; tidying it up
requires actual code changes.
 1.12 16-May-2011  drochner branches: 1.12.10; 1.12.14; 1.12.16; 1.12.24; 1.12.30; 1.12.32; 1.12.34; 1.12.36; 1.12.40; 1.12.42;
remove redundant declaration
 1.11 23-Apr-2008  thorpej branches: 1.11.24; 1.11.30;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.10 14-Dec-2007  seanb branches: 1.10.6; 1.10.8;
- Remove remain <= MHLEN restriction in m_makespace()
PR:30124
 1.9 04-Mar-2007  degroote branches: 1.9.16; 1.9.24; 1.9.28;
Fix fallout from caddr_t changes
 1.8 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.7 11-Dec-2005  christos branches: 1.7.24; 1.7.26; 1.7.30; 1.7.34;
merge ktrace-lwp.
 1.6 26-Feb-2005  perry branches: 1.6.4;
nuke trailing whitespace
 1.5 07-May-2004  jonathan branches: 1.5.2; 1.5.6; 1.5.8;
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.

New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)

Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)

sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)

sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)

sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)

Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":

New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)

Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.4 01-Mar-2004  thorpej branches: 1.4.2;
Add missing copyright notice (FreeBSD rev. 1.5.2.2).
 1.3 20-Aug-2003  jonathan opt_inet6.h is FreeBSD-specific, so wrap it with #ifdef __FreeBSD__/#endif.
 1.2 13-Aug-2003  jonathan Make sure one (potentially) overlapping copy is safe.
 1.1 13-Aug-2003  jonathan Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.4.2.1 10-May-2004  tron Pull up revision 1.5 (requested by jonathan in ticket #280):
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.
New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)
Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)
sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)
sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)
sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)
Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":
New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)
Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.5.8.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.5.6.1 29-Apr-2005  kent sync with -current
 1.5.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.5.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.5.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.5.2.2 03-Aug-2004  skrll Sync with HEAD
 1.5.2.1 07-May-2004  skrll file ipsec_mbuf.c was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.6.4.2 21-Jan-2008  yamt sync with head
 1.6.4.1 03-Sep-2007  yamt sync with head.
 1.7.34.1 04-Feb-2008  riz Pull up following revision(s) (requested by seanb in ticket #1015):
sys/netipsec/ipsec_mbuf.c: revision 1.10 via patch
- Remove remain <= MHLEN restriction in m_makespace()
PR:30124
 1.7.30.1 03-Jun-2008  skrll Sync with netbsd-4.
 1.7.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.7.24.1 04-Feb-2008  riz Pull up following revision(s) (requested by seanb in ticket #1015):
sys/netipsec/ipsec_mbuf.c: revision 1.10 via patch
- Remove remain <= MHLEN restriction in m_makespace()
PR:30124
 1.9.28.1 02-Jan-2008  bouyer Sync with HEAD
 1.9.24.1 26-Dec-2007  ad Sync with head.
 1.9.16.1 09-Jan-2008  matt sync with HEAD
 1.10.8.1 18-May-2008  yamt sync with head.
 1.10.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.11.30.1 06-Jun-2011  jruoho Sync with HEAD.
 1.11.24.1 31-May-2011  rmind sync with head
 1.12.42.1 17-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1599):

sys/netipsec/ipsec_mbuf.c: revision 1.23,1.24 (via patch)

Don't assume M_PKTHDR is set only on the first mbuf of the chain. It
should, but it looks like there are several places that can put M_PKTHDR
on secondary mbufs (PR/53189), so drop this assumption right now to
prevent further bugs.

The check is replaced by (m1 != m), which is equivalent to the previous
code: we want to modify m->m_pkthdr.len only when 'm' was not passed in
m_adj().

Fix a pretty bad mistake, that has always been there.

m_adj(m1, -(m1->m_len - roff));
if (m1 != m)
m->m_pkthdr.len -= (m1->m_len - roff);

This is wrong: m_adj will modify m1->m_len, so we're using a wrong value
when manually adjusting m->m_pkthdr.len.

Because of that, it is possible to exploit the attack I described in
uipc_mbuf.c::rev1.182. The exploit is more complicated, but works 100%
reliably.
 1.12.40.1 21-Apr-2017  bouyer Sync with HEAD
 1.12.36.1 26-Apr-2017  pgoyette Sync with HEAD
 1.12.34.1 17-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1599):

sys/netipsec/ipsec_mbuf.c: revision 1.23,1.24 (via patch)

Don't assume M_PKTHDR is set only on the first mbuf of the chain. It
should, but it looks like there are several places that can put M_PKTHDR
on secondary mbufs (PR/53189), so drop this assumption right now to
prevent further bugs.

The check is replaced by (m1 != m), which is equivalent to the previous
code: we want to modify m->m_pkthdr.len only when 'm' was not passed in
m_adj().

Fix a pretty bad mistake, that has always been there.

m_adj(m1, -(m1->m_len - roff));
if (m1 != m)
m->m_pkthdr.len -= (m1->m_len - roff);

This is wrong: m_adj will modify m1->m_len, so we're using a wrong value
when manually adjusting m->m_pkthdr.len.

Because of that, it is possible to exploit the attack I described in
uipc_mbuf.c::rev1.182. The exploit is more complicated, but works 100%
reliably.
 1.12.32.1 28-Aug-2017  skrll Sync with HEAD
 1.12.30.1 17-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1599):

sys/netipsec/ipsec_mbuf.c: revision 1.23,1.24 (via patch)

Don't assume M_PKTHDR is set only on the first mbuf of the chain. It
should, but it looks like there are several places that can put M_PKTHDR
on secondary mbufs (PR/53189), so drop this assumption right now to
prevent further bugs.

The check is replaced by (m1 != m), which is equivalent to the previous
code: we want to modify m->m_pkthdr.len only when 'm' was not passed in
m_adj().

Fix a pretty bad mistake, that has always been there.

m_adj(m1, -(m1->m_len - roff));
if (m1 != m)
m->m_pkthdr.len -= (m1->m_len - roff);

This is wrong: m_adj will modify m1->m_len, so we're using a wrong value
when manually adjusting m->m_pkthdr.len.

Because of that, it is possible to exploit the attack I described in
uipc_mbuf.c::rev1.182. The exploit is more complicated, but works 100%
reliably.
 1.12.24.1 18-Apr-2018  msaitoh Pull up following revision(s) (requested by maxv in ticket #1545):
sys/netipsec/ipsec_mbuf.c: revision 1.23
sys/netipsec/ipsec_mbuf.c: revision 1.24
Don't assume M_PKTHDR is set only on the first mbuf of the chain. It
should, but it looks like there are several places that can put M_PKTHDR
on secondary mbufs (PR/53189), so drop this assumption right now to
prevent further bugs.
The check is replaced by (m1 != m), which is equivalent to the previous
code: we want to modify m->m_pkthdr.len only when 'm' was not passed in
m_adj().
Fix a pretty bad mistake, that has always been there.
m_adj(m1, -(m1->m_len - roff));
if (m1 != m)
m->m_pkthdr.len -= (m1->m_len - roff);
This is wrong: m_adj will modify m1->m_len, so we're using a wrong value
when manually adjusting m->m_pkthdr.len.
Because of that, it is possible to exploit the attack I described in
uipc_mbuf.c::rev1.182. The exploit is more complicated, but works 100%
reliably.
 1.12.16.1 18-Apr-2018  msaitoh Pull up following revision(s) (requested by maxv in ticket #1545):
sys/netipsec/ipsec_mbuf.c: revision 1.23
sys/netipsec/ipsec_mbuf.c: revision 1.24
Don't assume M_PKTHDR is set only on the first mbuf of the chain. It
should, but it looks like there are several places that can put M_PKTHDR
on secondary mbufs (PR/53189), so drop this assumption right now to
prevent further bugs.
The check is replaced by (m1 != m), which is equivalent to the previous
code: we want to modify m->m_pkthdr.len only when 'm' was not passed in
m_adj().
Fix a pretty bad mistake, that has always been there.
m_adj(m1, -(m1->m_len - roff));
if (m1 != m)
m->m_pkthdr.len -= (m1->m_len - roff);
This is wrong: m_adj will modify m1->m_len, so we're using a wrong value
when manually adjusting m->m_pkthdr.len.
Because of that, it is possible to exploit the attack I described in
uipc_mbuf.c::rev1.182. The exploit is more complicated, but works 100%
reliably.
 1.12.14.1 03-Dec-2017  jdolecek update from HEAD
 1.12.10.1 18-Apr-2018  msaitoh Pull up following revision(s) (requested by maxv in ticket #1545):
sys/netipsec/ipsec_mbuf.c: revision 1.23
sys/netipsec/ipsec_mbuf.c: revision 1.24
Don't assume M_PKTHDR is set only on the first mbuf of the chain. It
should, but it looks like there are several places that can put M_PKTHDR
on secondary mbufs (PR/53189), so drop this assumption right now to
prevent further bugs.
The check is replaced by (m1 != m), which is equivalent to the previous
code: we want to modify m->m_pkthdr.len only when 'm' was not passed in
m_adj().
Fix a pretty bad mistake, that has always been there.
m_adj(m1, -(m1->m_len - roff));
if (m1 != m)
m->m_pkthdr.len -= (m1->m_len - roff);
This is wrong: m_adj will modify m1->m_len, so we're using a wrong value
when manually adjusting m->m_pkthdr.len.
Because of that, it is possible to exploit the attack I described in
uipc_mbuf.c::rev1.182. The exploit is more complicated, but works 100%
reliably.
 1.16.2.1 17-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #773):

sys/netipsec/ipsec_mbuf.c: revision 1.23,1.24

Don't assume M_PKTHDR is set only on the first mbuf of the chain. It
should, but it looks like there are several places that can put M_PKTHDR
on secondary mbufs (PR/53189), so drop this assumption right now to
prevent further bugs.

The check is replaced by (m1 != m), which is equivalent to the previous
code: we want to modify m->m_pkthdr.len only when 'm' was not passed in
m_adj().

Fix a pretty bad mistake, that has always been there.

m_adj(m1, -(m1->m_len - roff));
if (m1 != m)
m->m_pkthdr.len -= (m1->m_len - roff);

This is wrong: m_adj will modify m1->m_len, so we're using a wrong value
when manually adjusting m->m_pkthdr.len.

Because of that, it is possible to exploit the attack I described in
uipc_mbuf.c::rev1.182. The exploit is more complicated, but works 100%
reliably.
 1.21.2.6 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.21.2.5 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.21.2.4 25-Jun-2018  pgoyette Sync with HEAD
 1.21.2.3 02-May-2018  pgoyette Synch with HEAD
 1.21.2.2 22-Apr-2018  pgoyette Sync with HEAD
 1.21.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.28.2.1 10-Jun-2019  christos Sync with HEAD
 1.56 26-Feb-2025  andvar Fix typos in comments, mainly s/calcurate/calculate/.
 1.55 02-Sep-2022  thorpej branches: 1.55.10;
Remove unnecessary inclusion of <net/netisr.h>.
 1.54 28-Apr-2018  maxv Fix the net.inet6.ipsec6.def_policy node, the variable should be
&ip6_def_policy.policy, otherwise we're overwriting other fields of the
structure.
 1.53 22-Apr-2018  maxv Rename ipip_allow->ipip_spoofcheck, and add net.inet.ipsec.ipip_spoofcheck.
Makes it simpler, and also fixes PR/39919.
 1.52 18-Apr-2018  maxv Remove unused malloc.h include.
 1.51 18-Apr-2018  maxv Style, and remove another misleading comment.
 1.50 18-Apr-2018  maxv Remove misleading comments.
 1.49 18-Apr-2018  maxv Remove the

net.inet6.esp6
net.inet6.ipcomp6
net.inet6.ah6

subtrees. They are aliases to net.inet6.ipsec6, but they are not
consistent with the original intended naming. (eg there was
net.inet6.esp6.esp_trans_deflev instead of net.inet6.esp6.trans_deflev).
 1.48 18-Apr-2018  maxv Remove duplicate sysctls:

net.inet.esp.trans_deflev = net.inet.ipsec.esp_trans_deflev
net.inet.esp.net_deflev = net.inet.ipsec.esp_net_deflev
net.inet.ah.cleartos = net.inet.ipsec.ah_cleartos
net.inet.ah.offsetmask = net.inet.ipsec.ah_offsetmask
net.inet.ah.trans_deflev = net.inet.ipsec.ah_trans_deflev
net.inet.ah.net_deflev = net.inet.ipsec.ah_net_deflev

Use the convention on the right. Discussed a month ago on tech-net@.
 1.47 26-Feb-2018  maxv branches: 1.47.2;
Merge some minor (mostly stylistic) changes from last week.
 1.46 16-Feb-2018  maxv Add [ah/esp/ipcomp]_enable sysctls, and remove the FreeBSD #ifdefs.
Discussed with ozaki-r@.
 1.45 03-Aug-2017  ozaki-r Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
 1.44 07-Jul-2017  ozaki-r Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
 1.43 04-Jul-2017  ozaki-r KNF
 1.42 04-Jul-2017  ozaki-r Introduce and use SADB_SASTATE_USABLE_P
 1.41 04-Jul-2017  ozaki-r KNF; replace leading whitespaces with hard tabs
 1.40 06-Apr-2017  ozaki-r branches: 1.40.6;
Prepare netipsec for rump-ification

- Include "opt_*.h" only if _KERNEL_OPT is defined
- Allow encapinit to be called twice (by ifinit and ipe4_attach)
- ifinit didn't call encapinit if IPSEC is enabled (ipe4_attach called
it instead), however, on a rump kernel ipe4_attach may not be called
even if IPSEC is enabled. So we need to allow ifinit to call it anyway
- Setup sysctls in ipsec_attach explicitly instead of using SYSCTL_SETUP
- Call ip6flow_invalidate_all in key_spdadd only if in6_present
- It's possible that a rump kernel loads the ipsec library but not
the inet6 library
 1.39 06-Mar-2017  knakahara add sysctl to select software/hardware encryption driver. can enable CRYPTO_DEBUG.
 1.38 07-Jul-2016  msaitoh branches: 1.38.2; 1.38.4;
KNF. Remove extra spaces. No functional change.
 1.37 30-May-2014  christos branches: 1.37.4;
Introduce 2 new variables: ipsec_enabled and ipsec_used.
Ipsec enabled is controlled by sysctl and determines if is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.36 25-Feb-2014  pooka branches: 1.36.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.35 11-Jun-2013  christos branches: 1.35.2;
remove the last vestiges of fast_ipsec
 1.34 02-Jun-2012  dsl branches: 1.34.2;
Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsiged decimals - not set yet.
 1.33 17-Jul-2011  joerg branches: 1.33.2;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.32 04-May-2008  thorpej Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577
 1.31 27-Apr-2008  degroote Fix some fallout from socket locking patch :

- {ah6,esp6}_ctlinput must return void*
- use correct wrapper for rip_usrreq
 1.30 23-Apr-2008  thorpej branches: 1.30.2;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.29 19-Oct-2007  ad branches: 1.29.16; 1.29.18;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.28 07-Jul-2007  degroote branches: 1.28.6; 1.28.8; 1.28.12;
Ansify
Remove useless extern
bzero -> memset, bcopy -> memcpy

No functionnal changes
 1.27 27-Jun-2007  degroote Add support for options IPSEC_NAT_T (RFC 3947 and 3948) for fast_ipsec(4).

No objection on tech-net@
 1.26 11-Apr-2007  degroote Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.
 1.25 25-Mar-2007  degroote Use ip4_ah_cleartos instead of ah_cleartos for consistency
 1.24 04-Mar-2007  degroote branches: 1.24.2; 1.24.4; 1.24.6;
Remove useless cast
Use NULL instead of (void*) 0
 1.23 04-Mar-2007  degroote Fix fallout from caddr_t changes
 1.22 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.21 18-Feb-2007  degroote Always free the sav, not only in the mature case
 1.20 18-Feb-2007  degroote Fix the {ah,esp}4_ctlinput code
 1.19 18-Feb-2007  degroote Constify the code following the dyoung change ( the "bug" was hidden by the
extern declaration ).
While here, remove a Kame ifdef which is useless in netipsec code
 1.18 10-Feb-2007  degroote branches: 1.18.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic
 1.17 14-May-2006  elad branches: 1.17.12; 1.17.14;
integrate kauth.
 1.16 11-Apr-2006  rpaulo Add two new sysctls protected under IPSEC_DEBUG:

net.inet.ipsec.test_replay - When set to 1, IPsec will send packets with
the same sequence number. This allows to verify if the other side
has proper replay attacks detection.

net.inet.ipsec.test_integrity - When set 1, IPsec will send packets with
corrupted HMAC. This allows to verify if the other side properly
detects modified packets.

(a message will be printed indicating when these sysctls changed)

By Pawel Jakub Dawidek <pjd@FreeBSD.org>.
Discussed with Christos Zoulas and Jonathan Stone.
 1.15 11-Dec-2005  christos branches: 1.15.4; 1.15.6; 1.15.8; 1.15.10; 1.15.12;
merge ktrace-lwp.
 1.14 20-Jun-2005  atatat branches: 1.14.2;
Change the rest of the sysctl subsystem to use const consistently.
The __UNCONST macro is now used only where necessary and the RW macros
are gone. Most of the changes here are consumers of the
sysctl_createv(9) interface that now takes a pair of const pointers
which used not to be.
 1.13 26-Feb-2005  perry nuke trailing whitespace
 1.12 15-Aug-2004  atatat branches: 1.12.4; 1.12.6;
Remove redundant instantiation of esp_net_deflev sysctl node. Not
sure how this happened, but it didn't harm anything either way.

Addresses PR kern/26672.
 1.11 17-Jul-2004  atatat branches: 1.11.2;
Rework sys/netipsec/ipsec_netbsd.c to present a more consistent tree.

Rework usr.bin/netstat/fast_ipsec.c to find the stats nodes under the
new names (Kame uses the name stats so we use different ones), as well
as setting slen appropriately between calls to sysctlbyname(), and
providing forward compatibility when actually retrieving stats via
sysctlbyname().

And correct a spelling error.
 1.10 07-May-2004  jonathan Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.

New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)

Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)

sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)

sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)

sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)

Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":

New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)

Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.9 06-Apr-2004  keihan s/netbsd.org/NetBSD.org/g
 1.8 24-Mar-2004  atatat branches: 1.8.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.7 20-Mar-2004  jonathan Delint for compiling with INET6:

Add 'XXX FIXME' comments to ah4_ctlinput(), esp4_ctlinput()
ipcode-paths merely cast away local variables ip, ah/esp, sav; the
fast-ipsec IPv4 code appears to work even so.

In espv6_ctlinput(), call the fast-ipsec KEY_ALLOCSA()/KEY_FREESA()
macros, not the KAME-native key_allocsa()/key_freesa() functions.
Cast sa6_src/sa6_dst to void; the fast-ipsec API does not (yet) pass
both src and dst addrs to KEY_d-ALLOCSA/KEY_FREESA.

Make sure 'off' is set to 0 on the branch where it was formerly
used-before-set.

Will now compile with ``options INET6'' (as in
sys/arch/i386/conf/GENERIC.FAST_IPSEC), but is not yet
expected to acutally work with IPv6.
 1.6 02-Mar-2004  thorpej Bring the PCB policy cache over from KAME IPsec, including the "hint"
used to short-circuit IPsec processing in other places.

This is enabled only for NetBSD at the moment; in order for it to function
correctly, ipsec_pcbconn() must be called as appropriate.
 1.5 23-Jan-2004  jonathan Remove ``#ifdef IPSEC'' include block; they are not appropriate here.

Remove #ifdef FAST_IPSEC/#endif around the inclusion of local
(sys/netipsec) header files; they are always appropriate for
this file (sys/netipsec/ipsec_netbsd.c). At least on NetBSD.

If INET6 is defined, include appropriate header files
(local netipsec/ipsec6.h, netinet6/ip6protosw.h, and icmp6.h
from its standards-compliant location in netinet/).

Will now at least compile and link when ``options INET6' is configured.
 1.4 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.3 06-Oct-2003  tls Reversion of "netkey merge", part 2 (replacement of removed files in the
repository by christos was part 1). netipsec should now be back as it
was on 2003-09-11, with some very minor changes:

1) Some residual platform-dependent code was moved from ipsec.h to
ipsec_osdep.h; without this, IPSEC_ASSERT() was multiply defined. ipsec.h
now includes ipsec_osdep.h

2) itojun's renaming of netipsec/files.ipsec to netipsec/files.netipsec has
been left in place (it's arguable which name is less confusing but the
rename is pretty harmless).

3) Some #endif TOKEN has been replaced by #endif /* TOKEN */; #endif TOKEN
is invalid and GCC 3 won't compile it.

An i386 kernel with "options FAST_IPSEC" and "options OPENCRYPTO" now
gets through "make depend" but fails to build with errors in ip_input.c.
But it's better than it was (thank heaven for small favors).
 1.2 12-Sep-2003  itojun merge netipsec/key* into netkey/key*. no need for both.
change confusing filename
 1.1 13-Aug-2003  jonathan Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.8.2.3 16-Aug-2004  jmc Pullup rev 1.12 (requested by atatat in ticket #766)

Remove redundant instantiation of esp_net_deflev sysctl node. PR#26672
 1.8.2.2 17-Jul-2004  he Pull up revision 1.11 (requested by atatat in ticket #674):
Rework ipsec_netbsd.c to present a more ocnsistent tree.
Rework netstat to find the stats nodes under the new names.
 1.8.2.1 10-May-2004  tron Pull up revision 1.10 (requested by jonathan in ticket #280):
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.
New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)
Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)
sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)
sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)
sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)
Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":
New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)
Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.11.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.11.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.11.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.11.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.11.2.3 25-Aug-2004  skrll Sync with HEAD.
 1.11.2.2 03-Aug-2004  skrll Sync with HEAD
 1.11.2.1 17-Jul-2004  skrll file ipsec_netbsd.c was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.12.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.12.4.1 29-Apr-2005  kent sync with -current
 1.14.2.4 27-Oct-2007  yamt sync with head.
 1.14.2.3 03-Sep-2007  yamt sync with head.
 1.14.2.2 26-Feb-2007  yamt sync with head.
 1.14.2.1 21-Jun-2006  yamt sync with head.
 1.15.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.15.10.1 19-Apr-2006  elad sync with head.
 1.15.8.1 24-May-2006  yamt sync with head.
 1.15.6.1 22-Apr-2006  simonb Sync with head.
 1.15.4.1 09-Sep-2006  rpaulo sync with head
 1.17.14.1 04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.17.12.1 24-May-2007  pavel Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the good default policy, depending of the adress family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us to choose the right default policy (based on the adress family
requested).
While here, fix an error message

Use dynamic array instead of an static array to decompress. It lets us to
decompress any data, whatever is the radio decompressed data / compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.18.2.3 15-Apr-2007  yamt sync with head.
 1.18.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.18.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.24.6.1 29-Mar-2007  reinoud Pullup to -current
 1.24.4.1 11-Jul-2007  mjf Sync with head.
 1.24.2.4 23-Oct-2007  ad Sync with head.
 1.24.2.3 15-Jul-2007  ad Sync with head.
 1.24.2.2 08-Jun-2007  ad Sync with head.
 1.24.2.1 10-Apr-2007  ad Sync with head.
 1.28.12.1 25-Oct-2007  bouyer Sync with HEAD.
 1.28.8.1 06-Nov-2007  matt sync with HEAD
 1.28.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.29.18.1 18-May-2008  yamt sync with head.
 1.29.16.1 02-Jun-2008  mjf Sync with HEAD.
 1.30.2.1 16-May-2008  yamt sync with head.
 1.33.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.33.2.1 30-Oct-2012  yamt sync with head
 1.34.2.3 03-Dec-2017  jdolecek update from HEAD
 1.34.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.34.2.1 23-Jun-2013  tls resync from head
 1.35.2.1 18-May-2014  rmind sync with head
 1.36.2.1 10-Aug-2014  tls Rebase.
 1.37.4.2 28-Aug-2017  skrll Sync with HEAD
 1.37.4.1 09-Jul-2016  skrll Sync with HEAD
 1.38.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.38.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.38.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.40.6.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.47.2.2 02-May-2018  pgoyette Synch with HEAD
 1.47.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.55.10.1 02-Aug-2025  perseant Sync with HEAD
 1.27 19-Apr-2017  ozaki-r Retire ipsec_osdep.h

We don't need to care other OSes (FreeBSD) anymore.

Some macros are alive in ipsec_private.h.
 1.26 22-Jun-2016  knakahara branches: 1.26.2; 1.26.4;
fix: locking about IFQ_ENQUEUE and ALTQ

- If NET_MPSAFE is not defined, IFQ_LOCK is nop. Currently, that means
IFQ_ENQUEUE() of some paths such as bridge_enqueue() is called parallel
wrongly.
- If ALTQ is enabled, Tx processing should call if_transmit() (= IFQ_ENQUEUE
+ ifp->if_start()) instead of ifp->if_transmit() to call ALTQ_ENQUEUE()
and ALTQ_DEQUEUE().
Furthermore, ALTQ processing is always required KERNEL_LOCK currently.
 1.25 28-Apr-2016  knakahara introduce new ifnet MP-scalable sending interface "if_transmit".
 1.24 09-May-2013  gdt branches: 1.24.10;
Fix FAST_IPSEC locking violation.

Without this change, using ESP tunnels with FAST_IPSEC on a 2-cpu i386
machine results in an mbuf leak. This change was tested in netbsd-6.

When FAST_IPSEC is enabled and a tunnel is set up, after the
outer packet is stripped off, FAST_IPSEC queues the inner
packet on the appropriate queue (ipinstrq or ip6instrq).
These queues require the KERNEL_LOCK to be held before
using the queue, and the FAST_IPSEC code did not take the
KERNEL_LOCK as required.
KERNEL_LOCK and KERNEL_UNLOCK_ONE calls have been added.

If a struct ifnet instance is passed to the if_handoff
function which does this queuing, the interface's
if_start function may be called. Some hardware devices
require KERNEL_LOCK to be held; others do not. Looking
at the body of NetBSD code, other places where an if_start
function is called, KERNEL_LOCK is held. Thus, the lock is
not released in if_handoff until after the if_start function
is called. In practice, having the kernel lock when
if_start is called makes no difference - there is not
a single instance in all of the NetBSD code where
if_handoff is passed an instance of struct ifnet.

This commit is the work of Bev Schwartz of BBN.

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced Research
Projects Agency and Space and Naval Warfare Systems Center, Pacific, under
Contract No. N66001-09-C-2073.
 1.23 29-Nov-2011  drochner branches: 1.23.8;
add missing rnd_extract->cprng_fast conversion, fixes build of
FAST_IPSEC kernels
 1.22 20-Jan-2008  joerg branches: 1.22.44;
Now that __HAVE_TIMECOUNTER and __HAVE_GENERIC_TODR are invariants,
remove the conditionals and the code associated with the undef case.
 1.21 28-Oct-2007  adrianp branches: 1.21.2; 1.21.8;
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.

Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.20 10-Jun-2006  kardel branches: 1.20.10; 1.20.24; 1.20.30; 1.20.32; 1.20.36;
clarify time scale semantic issue
 1.19 10-Jun-2006  kardel reference time.tv_sec in non timecounter case
missing conversion spotted by Geoff Wing
XXX This code need to be checked whether UTC time
is really the right abstraction. I suspect uptime
would be the correct time scale for measuring life times.
 1.18 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.17 16-Feb-2006  perry branches: 1.17.2; 1.17.8;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.16 24-Dec-2005  perry branches: 1.16.2; 1.16.4; 1.16.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.15 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.14 04-Dec-2005  christos Merge the 3 copies of m_getcl() so that fast ipsec compiles again together
with net80211. XXX: We don't really have an m_getcl(), we just emulate it.
 1.13 18-Aug-2005  yamt - introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.
 1.12 07-May-2005  christos branches: 1.12.2;
PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.
 1.11 26-Feb-2005  perry branches: 1.11.2; 1.11.4; 1.11.6;
nuke trailing whitespace
 1.10 30-Apr-2004  jonathan branches: 1.10.2; 1.10.6; 1.10.8;
Minimal cleanup of sys/netipsec/ipsec{,_osdep}.h, to allow compiling
FAST_IPSEC headers (with declarations of stats structures) in
userspace code. I haven't checked for strict POSIX conformance, but
Sam Leffler's FreeBS `ipsecstats' tool will now compile, if you
manually make and populate usr/include/sys/netipsec.

Committed as-is for Andrew Brown to check more of the sys/netipsec sysctls.
 1.9 16-Mar-2004  jonathan branches: 1.9.2;
Remove the old, inet4-specific versions of PCB_T, PCB_FAMILY, and PCB_SOCKET,
and the surrounding #ifndef notyet/#else/#endif which had the removed lines
in the #else branch. The inpcb_hdr versions have been in use for
some time now.
 1.8 02-Mar-2004  thorpej Bring the PCB policy cache over from KAME IPsec, including the "hint"
used to short-circuit IPsec processing in other places.

This is enabled only for NetBSD at the moment; in order for it to function
correctly, ipsec_pcbconn() must be called as appropriate.
 1.7 01-Mar-2004  thorpej Add missing copyright notice (FreeBSD rev. 1.1).
 1.6 20-Jan-2004  jonathan IPv6 mapped adddresses require us to cope with limited polymorphism
(struct in6pcb* versus struct inpcb*) in ipsec_getpolicybysock().

Add new macros (in lieu of an abstract data type) for a ``generic''
PCB_T (points to a struct inpcb* or struct in6pcb*) to ipsec_osdep.h.
Use those new macros in ipsec_getpolicybysock() and elsewhere.

As posted to tech-net for comment/feedback, late 2003.
 1.5 16-Jan-2004  scw Since callers of m_getcl() assume it always allocates a cluster, check
that MGETCL() actually succeeded before returning the mbuf.
 1.4 11-Nov-2003  jonathan Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.
 1.3 06-Oct-2003  tls Reversion of "netkey merge", part 2 (replacement of removed files in the
repository by christos was part 1). netipsec should now be back as it
was on 2003-09-11, with some very minor changes:

1) Some residual platform-dependent code was moved from ipsec.h to
ipsec_osdep.h; without this, IPSEC_ASSERT() was multiply defined. ipsec.h
now includes ipsec_osdep.h

2) itojun's renaming of netipsec/files.ipsec to netipsec/files.netipsec has
been left in place (it's arguable which name is less confusing but the
rename is pretty harmless).

3) Some #endif TOKEN has been replaced by #endif /* TOKEN */; #endif TOKEN
is invalid and GCC 3 won't compile it.

An i386 kernel with "options FAST_IPSEC" and "options OPENCRYPTO" now
gets through "make depend" but fails to build with errors in ip_input.c.
But it's better than it was (thank heaven for small favors).
 1.2 29-Sep-2003  jonathan No copyrignt notice here (caught by Sam Leffler). Add the same two-clause
copyright I sent to Sam Leffler for the FreeBSD version.
 1.1 13-Aug-2003  jonathan Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.9.2.2 01-Dec-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11395):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.9.2.1 10-May-2004  tron branches: 1.9.2.1.2; 1.9.2.1.4;
Pull up revision 1.10 (requested by jonathan in ticket #280):
Minimal cleanup of sys/netipsec/ipsec{,_osdep}.h, to allow compiling
FAST_IPSEC headers (with declarations of stats structures) in
userspace code. I haven't checked for strict POSIX conformance, but
Sam Leffler's FreeBS `ipsecstats' tool will now compile, if you
manually make and populate usr/include/sys/netipsec.
Committed as-is for Andrew Brown to check more of the sys/netipsec sysctls.
 1.9.2.1.4.1 01-Dec-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11395):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.9.2.1.2.1 01-Dec-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11395):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.10.8.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.10.6.1 29-Apr-2005  kent sync with -current
 1.10.2.7 11-Dec-2005  christos Sync with head.
 1.10.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.10.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.10.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.10.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.10.2.2 03-Aug-2004  skrll Sync with HEAD
 1.10.2.1 30-Apr-2004  skrll file ipsec_osdep.h was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.11.6.1 22-Nov-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #1878):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.11.4.1 22-Nov-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #1878):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.11.2.1 22-Nov-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #1878):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.12.2.3 21-Jan-2008  yamt sync with head
 1.12.2.2 15-Nov-2007  yamt sync with head.
 1.12.2.1 21-Jun-2006  yamt sync with head.
 1.16.6.2 22-Apr-2006  simonb Sync with head.
 1.16.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.16.4.1 09-Sep-2006  rpaulo sync with head
 1.16.2.1 18-Feb-2006  yamt sync with head.
 1.17.8.1 19-Jun-2006  chap Sync with head.
 1.17.2.1 26-Jun-2006  yamt sync with head.
 1.20.36.1 13-Nov-2007  bouyer Sync with HEAD
 1.20.32.2 23-Mar-2008  matt sync with HEAD
 1.20.32.1 06-Nov-2007  matt sync with HEAD
 1.20.30.1 28-Oct-2007  joerg Sync with HEAD.
 1.20.24.1 06-Jan-2008  wrstuden Catch up to netbsd-4.0 release.
 1.20.10.1 31-Oct-2007  liamjfoy Pull up following revision(s) (requested by adrianp in ticket #964):
sys/netipsec/xform_ah.c: revision 1.19
sys/netipsec/ipsec.c: revision 1.34
sys/netipsec/xform_ipip.c: revision 1.18
sys/netipsec/ipsec_output.c: revision 1.23
sys/netipsec/ipsec_osdep.h: revision 1.21
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote&#64;
 1.21.8.1 23-Jan-2008  bouyer Sync with HEAD.
 1.21.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.22.44.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.22.44.1 17-Apr-2012  yamt sync with head
 1.23.8.2 03-Dec-2017  jdolecek update from HEAD
 1.23.8.1 23-Jun-2013  tls resync from head
 1.24.10.3 28-Aug-2017  skrll Sync with HEAD
 1.24.10.2 09-Jul-2016  skrll Sync with HEAD
 1.24.10.1 29-May-2016  skrll Sync with HEAD
 1.26.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.26.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.87 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.86 27-Jan-2023  ozaki-r branches: 1.86.6;
ipsec: remove unnecessary splsoftnet

Because the code of IPsec itself is already MP-safe.
 1.85 10-Apr-2022  andvar branches: 1.85.4;
fix various typos in comments and output/log messages.
 1.84 01-Nov-2019  knakahara Fix ipsecif(4) IPV6_MINMTU does not work correctly.
 1.83 19-Sep-2019  ozaki-r Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@
 1.82 26-Dec-2018  knakahara branches: 1.82.4;
ipsecif(4) supports multiple peers in the same NAPT.

E.g. ipsec0 connects between NetBSD_A and NetBSD_B, ipsec1 connects
NetBSD_A and NetBSD_C at the following figure.

+----------+
+----| NetBSD_B |
+----------+ +------+ | +----------+
| NetBSD_A |--- ... ---| NAPT |---+
+----------+ +------+ | +----------+
+----| NetBSD_C |
+----------+

Add ATF later.
 1.81 22-Nov-2018  knakahara Support IPv6 NAT-T. Implemented by hsuenaga@IIJ and ohishi@IIJ.

Add ATF later.
 1.80 31-May-2018  maxv branches: 1.80.2;
Adapt rev1.75, suggested by Alexander Bluhm. Relax the checks to allow
protocols smaller than two bytes (only IPPROTO_NONE). While here style.
 1.79 31-May-2018  maxv Remove support for non-IKE markers in the kernel. Discussed on tech-net@,
and now in PR/53334. Basically non-IKE markers come from a deprecated
draft, and our kernel code for them has never worked.

Setsockopt will now reject UDP_ENCAP_ESPINUDP_NON_IKE.

Perhaps we should also add a check in key_handle_natt_info(), to make
sure we also reject UDP_ENCAP_ESPINUDP_NON_IKE in the SADB.
 1.78 07-May-2018  maxv Remove a dummy reference to XF_IP4, explain briefly why we don't use
ipe4_xformsw, and remove unused includes.
 1.77 07-May-2018  maxv Remove now unused 'isr', 'skip' and 'protoff' arguments from ipip_output.
 1.76 07-May-2018  maxv Remove unused 'mp' argument from all the xf_output functions. Also clean
up xform.h a bit.
 1.75 01-May-2018  maxv Fix the checks in compute_ipsec_pos, otherwise m_copydata could crash. I
already fixed half of the problem two months ago in rev1.67, back then I
thought it was not triggerable because each packet we emit is guaranteed
to have correctly formed IPv6 options; but it is actually triggerable via
IPv6 forwarding, we emit a packet we just received, and we don't sanitize
its options before invoking IPsec.

Since it would be wrong to just stop the iteration and continue the IPsec
processing, allow compute_ipsec_pos to fail, and when it does, drop the
packet entirely.
 1.74 28-Apr-2018  maxv Remove IPSEC_SPLASSERT_SOFTNET, it has always been a no-op.
 1.73 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.72 18-Apr-2018  maxv style
 1.71 05-Mar-2018  maxv branches: 1.71.2;
Call m_pullup earlier, fixes one branch.
 1.70 03-Mar-2018  maxv Add KASSERTs, we don't want m_nextpkt in ipsec{4/6}_process_packet.
 1.69 26-Feb-2018  maxv Fix mbuf mistake: we are using ip6 before it is pulled up properly.
 1.68 21-Feb-2018  maxv Style, no functional change.
 1.67 21-Feb-2018  maxv Strengthen this check, to make sure there is room for an ip6_ext structure.
Seems possible to crash m_copydata here (but I didn't test more than that).
 1.66 08-Feb-2018  maxv Remove unused net_osdep.h include.
 1.65 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.64 03-Oct-2017  ozaki-r Constify isr at many places (NFC)
 1.63 03-Oct-2017  ozaki-r Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
 1.62 03-Oct-2017  ozaki-r Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
 1.61 03-Oct-2017  ozaki-r Pull out ipsec_fill_saidx_bymbuf (NFC)
 1.60 10-Aug-2017  ozaki-r Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
 1.59 10-Aug-2017  ozaki-r Simplify ipsec_reinject_ipstack (NFC)
 1.58 03-Aug-2017  ozaki-r Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
 1.57 27-Jul-2017  ozaki-r Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
 1.56 21-Jul-2017  ozaki-r Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
 1.55 19-Jul-2017  ozaki-r Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
 1.54 14-Jul-2017  ozaki-r Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
 1.53 13-Jul-2017  ozaki-r Fix splx isn't called on some error paths
 1.52 13-Jul-2017  ozaki-r Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
 1.51 12-Jul-2017  ozaki-r Omit unnecessary NULL checks for sav->sah
 1.50 06-Jul-2017  ozaki-r Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
 1.49 04-Jul-2017  ozaki-r Simplify IPSEC_OSTAT macro (NFC)
 1.48 19-May-2017  ozaki-r branches: 1.48.2;
Introduce IPSECLOG and replace ipseclog and DPRINTF with it
 1.47 11-May-2017  ryo Make ipsec_address() and ipsec_logsastr() mpsafe.
 1.46 08-May-2017  ozaki-r Omit two arguments of ipsec4_process_packet

flags is unused and tunalready is always 0. So NFC.
 1.45 19-Apr-2017  ozaki-r branches: 1.45.2;
Retire ipsec_osdep.h

We don't need to care other OSes (FreeBSD) anymore.

Some macros are alive in ipsec_private.h.
 1.44 18-Apr-2017  ozaki-r Convert IPSEC_ASSERT to KASSERT or KASSERTMSG

IPSEC_ASSERT just discarded specified message...
 1.43 18-Apr-2017  ozaki-r Remove __FreeBSD__ and __NetBSD__ switches

No functional changes (except for a debug printf).

Note that there remain some __FreeBSD__ for sysctl knobs which counerparts
to NetBSD don't exist. And ipsec_osdep.h isn't touched yet; tidying it up
requires actual code changes.
 1.42 06-Apr-2017  ozaki-r Prepare netipsec for rump-ification

- Include "opt_*.h" only if _KERNEL_OPT is defined
- Allow encapinit to be called twice (by ifinit and ipe4_attach)
- ifinit didn't call encapinit if IPSEC is enabled (ipe4_attach called
it instead), however, on a rump kernel ipe4_attach may not be called
even if IPSEC is enabled. So we need to allow ifinit to call it anyway
- Setup sysctls in ipsec_attach explicitly instead of using SYSCTL_SETUP
- Call ip6flow_invalidate_all in key_spdadd only if in6_present
- It's possible that a rump kernel loads the ipsec library but not
the inet6 library
 1.41 30-Mar-2015  ozaki-r branches: 1.41.2; 1.41.4;
Tidy up opt_ipsec.h inclusions

Some inclusions of opt_ipsec.h were for IPSEC_NAT_T and are now unnecessary.
Add inclusions to some C files for IPSEC_DEBUG.
 1.40 03-Nov-2013  mrg branches: 1.40.4; 1.40.6; 1.40.8; 1.40.12;
- apply some __diagused
- remove unused variables
- move some variables inside their relevant use #ifdef
 1.39 04-Jun-2013  christos branches: 1.39.2;
PR/47886: Dr. Wolfgang Stukenbrock: IPSEC_NAT_T enabled kernels may access
outdated pointers and pass ESP data to UPD-sockets.
While here, simplify the code and remove the IPSEC_NAT_T option; always
compile nat-traversal in so that it does not bitrot.
 1.38 10-Jan-2012  drochner branches: 1.38.2; 1.38.6; 1.38.8; 1.38.16;
add patch from Arnaud Degroote to handle IPv6 extended options with
(FAST_)IPSEC, tested lightly with a DSTOPTS header consisting
of PAD1
 1.37 31-Aug-2011  plunky branches: 1.37.2; 1.37.6;
NULL does not need a cast
 1.36 09-Jun-2011  drochner catch a case where an ip6 address with scope embedded was compared with
one without -- interestingly this didn't break the connection but just
caused a useless encapsulation
(this code needs to be rearranged to get it clean)
 1.35 07-Jun-2011  drochner fix tunnel encapsulation in ipsec6_process_packet() -- it is not
completely clean yet, but at least a v6-in-v6 tunnel works now
 1.34 07-Jun-2011  drochner reindent ipsec6_process_packet() - whitespace changes only
 1.33 06-Jun-2011  drochner remove a limitation that inner and outer IP version must be equal
for an ESP tunnel, and add some fixes which make v4-in-v6 work
(v6 as inner protocol isn't ready, even v6-in-v6 can never have worked)

being here, fix a statistics counter and kill an unused variable
 1.32 18-Feb-2011  drochner branches: 1.32.2;
do proper statistics counting for outbound packets, fixes PR kern/30182
by Gilles Roy
 1.31 10-Feb-2011  drochner in rev.1.192 of ip_output.c the semantics of ip_output() was changed:
Before, setting the IP_RAWOUTPUT flag did imply that the ip_id
(the fragmentation thing) was used as-is.
Now, a new ID is diced unless the new IP_NOIPNEWID flag is set.
The ip_id is part of the data which are used to calculate the hash
for AH, so set the IP_NOIPNEWID flag to make sure the IP header
is not modified behind AH's back. Otherwise, the recipient will detect
a checksum mismatch and discard the packet.
 1.30 10-Feb-2011  drochner -in opencrypto callbacks (which run in a kernel thread), pull softnet_lock
everywhere splsoftnet() was used before, to fix MP concurrency problems
-pull KERNEL_LOCK where ip(6)_output() is called, as this is what
the network stack (unfortunately) expects, in particular to avoid
races for packets in the interface send queues
From Wolfgang Stukenbrock per PR kern/44418, with the application
of KERNEL_LOCK to what I think are the essential points, tested
on a dual-core i386.
 1.29 01-Dec-2009  dyoung branches: 1.29.4; 1.29.6; 1.29.8;
Cosmetic: fix indentation, change some spaces to tabs.
 1.28 28-Apr-2008  degroote Fix a stupid typo. In ipsec6_process_packet, reinject the packet in AF_INET6,
nor in AF_INET.
 1.27 23-Apr-2008  thorpej branches: 1.27.2;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.26 29-Dec-2007  degroote branches: 1.26.6; 1.26.8;
Fix the ipsec processing in case of USE rules with no SA installed.

In case where there is no more isr to process, just tag the packet and reinject
in the ip{,6} stack.

Fix pr/34843
 1.25 29-Dec-2007  degroote Simplify the FAST_IPSEC output path
Only record an IPSEC_OUT_DONE tag when we have finished the processing
In ip{,6}_output, check this tag to know if we have already processed this
packet.
Remove some dead code (IPSEC_PENDING_TDB is not used in NetBSD)

Fix pr/36870
 1.24 09-Dec-2007  degroote branches: 1.24.2;
Kill _IP_VHL ifdef (from netinet/ip.h history, it has never been used in NetBSD so ...)
 1.23 28-Oct-2007  adrianp branches: 1.23.2; 1.23.4; 1.23.6;
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.

Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.22 27-Jun-2007  degroote branches: 1.22.6; 1.22.8; 1.22.12;
Add support for options IPSEC_NAT_T (RFC 3947 and 3948) for fast_ipsec(4).

No objection on tech-net@
 1.21 10-Feb-2007  degroote branches: 1.21.6; 1.21.8;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic
 1.20 26-Jan-2007  dyoung KNF: bzero -> memset.
 1.19 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.18 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.17 24-Nov-2006  christos branches: 1.17.2; 1.17.4;
fix spelling of accommodate; from Zapher.
 1.16 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.15 13-Oct-2006  christos more __unused
 1.14 11-Dec-2005  christos branches: 1.14.20; 1.14.22;
merge ktrace-lwp.
 1.13 07-May-2004  jonathan branches: 1.13.2; 1.13.12; 1.13.14; 1.13.22; 1.13.24;
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.

New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)

Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)

sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)

sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)

sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)

Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":

New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)

Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.12 17-Mar-2004  jonathan branches: 1.12.2;
sys/netinet6/ip6_ecn.h is reportedly a FreeBSD-ism; NetBSD has
prototypes for the IPv6 ECN ingress/egress functions in sys/netinet/ip_ecn.h,
inside an #ifdef INET6 wrapper. So, wrap sys/netipsec ocurrences of
#include <netinet6/ip6_ecn.h>
in #ifdef __FreeBSD__/#endif, until both camps can agree on this
teensy little piece of namespace. Affects:
ipsec_output.c xform_ah.c xform_esp.c xform_ipip.c
 1.11 16-Mar-2004  jonathan Delint ntohl() as argument to a "%lx" format in a log message.
 1.10 16-Mar-2004  jonathan #include <net/net_osdep.h>: if INET6 is configured,
ipsec_encapsulate() calls ovbcopy(), which is otherwise deprecated.
 1.9 01-Mar-2004  thorpej Add missing copyright notice (FreeBSD rev. 1.3.2.2).
 1.8 16-Jan-2004  scw Fix ipip_output() to always set *mp to NULL on failure, even if 'm'
is NULL, otherwise ipsec4_process_packet() may try to m_freem() a
bad pointer.

In ipsec4_process_packet(), don't try to m_freem() 'm' twice; ipip_output()
already did it.
 1.7 06-Oct-2003  tls Reversion of "netkey merge", part 2 (replacement of removed files in the
repository by christos was part 1). netipsec should now be back as it
was on 2003-09-11, with some very minor changes:

1) Some residual platform-dependent code was moved from ipsec.h to
ipsec_osdep.h; without this, IPSEC_ASSERT() was multiply defined. ipsec.h
now includes ipsec_osdep.h

2) itojun's renaming of netipsec/files.ipsec to netipsec/files.netipsec has
been left in place (it's arguable which name is less confusing but the
rename is pretty harmless).

3) Some #endif TOKEN has been replaced by #endif /* TOKEN */; #endif TOKEN
is invalid and GCC 3 won't compile it.

An i386 kernel with "options FAST_IPSEC" and "options OPENCRYPTO" now
gets through "make depend" but fails to build with errors in ip_input.c.
But it's better than it was (thank heaven for small favors).
 1.6 12-Sep-2003  itojun merge netipsec/key* into netkey/key*. no need for both.
change confusing filename
 1.5 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.4 20-Aug-2003  jonathan opt_inet6.h is FreeBSD-specific, so wrap it with #ifdef __FreeBSD__/#endif.
 1.3 15-Aug-2003  jonathan Fix bug with IP_DF handling which was breaking TCP: on FreeBSD, ip_off
is assumed to be in host byteorder during the input(?) path. NetBSD
keeps ip_off and ip_len in network order. Add (or remove) byteswaps
accordingly. TCP over fast_ipsec now works with PMTU, as well as without.
 1.2 15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.1 13-Aug-2003  jonathan Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.12.2.2 01-Dec-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11395):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.12.2.1 10-May-2004  tron branches: 1.12.2.1.2; 1.12.2.1.4;
Pull up revision 1.13 (requested by jonathan in ticket #280):
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.
New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)
Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)
sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)
sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)
sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)
Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":
New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)
Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.12.2.1.4.1 01-Dec-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11395):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.12.2.1.2.1 01-Dec-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11395):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.13.24.1 22-Nov-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #1878):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.13.22.1 22-Nov-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #1878):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.13.14.5 21-Jan-2008  yamt sync with head
 1.13.14.4 15-Nov-2007  yamt sync with head.
 1.13.14.3 03-Sep-2007  yamt sync with head.
 1.13.14.2 26-Feb-2007  yamt sync with head.
 1.13.14.1 30-Dec-2006  yamt sync with head.
 1.13.12.1 22-Nov-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #1878):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.13.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.13.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.13.2.2 03-Aug-2004  skrll Sync with HEAD
 1.13.2.1 07-May-2004  skrll file ipsec_output.c was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.14.22.3 18-Dec-2006  yamt sync with head.
 1.14.22.2 10-Dec-2006  yamt sync with head.
 1.14.22.1 22-Oct-2006  yamt sync with head
 1.14.20.3 01-Feb-2007  ad Sync with head.
 1.14.20.2 12-Jan-2007  ad Sync with head.
 1.14.20.1 18-Nov-2006  ad Sync with head.
 1.17.4.2 06-Jan-2008  wrstuden Catch up to netbsd-4.0 release.
 1.17.4.1 04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.17.2.2 31-Oct-2007  liamjfoy Pull up following revision(s) (requested by adrianp in ticket #964):
sys/netipsec/xform_ah.c: revision 1.19
sys/netipsec/ipsec.c: revision 1.34
sys/netipsec/xform_ipip.c: revision 1.18
sys/netipsec/ipsec_output.c: revision 1.23
sys/netipsec/ipsec_osdep.h: revision 1.21
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote&#64;
 1.17.2.1 24-May-2007  pavel Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the good default policy, depending of the adress family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us to choose the right default policy (based on the adress family
requested).
While here, fix an error message

Use dynamic array instead of an static array to decompress. It lets us to
decompress any data, whatever is the radio decompressed data / compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.21.8.1 11-Jul-2007  mjf Sync with head.
 1.21.6.1 15-Jul-2007  ad Sync with head.
 1.22.12.1 13-Nov-2007  bouyer Sync with HEAD
 1.22.8.2 09-Jan-2008  matt sync with HEAD
 1.22.8.1 06-Nov-2007  matt sync with HEAD
 1.22.6.1 28-Oct-2007  joerg Sync with HEAD.
 1.23.6.1 11-Dec-2007  yamt sync with head.
 1.23.4.1 26-Dec-2007  ad Sync with head.
 1.23.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.24.2.1 02-Jan-2008  bouyer Sync with HEAD
 1.26.8.1 18-May-2008  yamt sync with head.
 1.26.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.27.2.2 11-Mar-2010  yamt sync with head
 1.27.2.1 16-May-2008  yamt sync with head.
 1.29.8.2 05-Mar-2011  bouyer Sync with HEAD
 1.29.8.1 17-Feb-2011  bouyer Sync with HEAD
 1.29.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.29.4.2 12-Jun-2011  rmind sync with head
 1.29.4.1 05-Mar-2011  rmind sync with head
 1.32.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.37.6.1 18-Feb-2012  mrg merge to -current.
 1.37.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.37.2.1 17-Apr-2012  yamt sync with head
 1.38.16.1 03-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1546):

sys/netipsec/ipsec_output.c: revision 1.67,1.75 (via patch)

Strengthen this check, to make sure there is room for an ip6_ext structure.
Seems possible to crash m_copydata here (but I didn't test more than that).

Fix the checks in compute_ipsec_pos, otherwise m_copydata could crash. I
already fixed half of the problem two months ago in rev1.67, back then I
thought it was not triggerable because each packet we emit is guaranteed
to have correctly formed IPv6 options; but it is actually triggerable via
IPv6 forwarding, we emit a packet we just received, and we don't sanitize
its options before invoking IPsec.

Since it would be wrong to just stop the iteration and continue the IPsec
processing, allow compute_ipsec_pos to fail, and when it does, drop the
packet entirely.
 1.38.8.1 03-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1546):

sys/netipsec/ipsec_output.c: revision 1.67,1.75 (via patch)

Strengthen this check, to make sure there is room for an ip6_ext structure.
Seems possible to crash m_copydata here (but I didn't test more than that).

Fix the checks in compute_ipsec_pos, otherwise m_copydata could crash. I
already fixed half of the problem two months ago in rev1.67, back then I
thought it was not triggerable because each packet we emit is guaranteed
to have correctly formed IPv6 options; but it is actually triggerable via
IPv6 forwarding, we emit a packet we just received, and we don't sanitize
its options before invoking IPsec.

Since it would be wrong to just stop the iteration and continue the IPsec
processing, allow compute_ipsec_pos to fail, and when it does, drop the
packet entirely.
 1.38.6.3 03-Dec-2017  jdolecek update from HEAD
 1.38.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.38.6.1 23-Jun-2013  tls resync from head
 1.38.2.1 03-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1546):

sys/netipsec/ipsec_output.c: revision 1.67,1.75 (via patch)

Strengthen this check, to make sure there is room for an ip6_ext structure.
Seems possible to crash m_copydata here (but I didn't test more than that).

Fix the checks in compute_ipsec_pos, otherwise m_copydata could crash. I
already fixed half of the problem two months ago in rev1.67, back then I
thought it was not triggerable because each packet we emit is guaranteed
to have correctly formed IPv6 options; but it is actually triggerable via
IPv6 forwarding, we emit a packet we just received, and we don't sanitize
its options before invoking IPsec.

Since it would be wrong to just stop the iteration and continue the IPsec
processing, allow compute_ipsec_pos to fail, and when it does, drop the
packet entirely.
 1.39.2.1 18-May-2014  rmind sync with head
 1.40.12.1 03-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1600):

sys/netipsec/ipsec_output.c: revision 1.67,1.75 (via patch)

Strengthen this check, to make sure there is room for an ip6_ext structure.
Seems possible to crash m_copydata here (but I didn't test more than that).

Fix the checks in compute_ipsec_pos, otherwise m_copydata could crash. I
already fixed half of the problem two months ago in rev1.67, back then I
thought it was not triggerable because each packet we emit is guaranteed
to have correctly formed IPv6 options; but it is actually triggerable via
IPv6 forwarding, we emit a packet we just received, and we don't sanitize
its options before invoking IPsec.

Since it would be wrong to just stop the iteration and continue the IPsec
processing, allow compute_ipsec_pos to fail, and when it does, drop the
packet entirely.
 1.40.8.1 03-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1600):

sys/netipsec/ipsec_output.c: revision 1.67,1.75 (via patch)

Strengthen this check, to make sure there is room for an ip6_ext structure.
Seems possible to crash m_copydata here (but I didn't test more than that).

Fix the checks in compute_ipsec_pos, otherwise m_copydata could crash. I
already fixed half of the problem two months ago in rev1.67, back then I
thought it was not triggerable because each packet we emit is guaranteed
to have correctly formed IPv6 options; but it is actually triggerable via
IPv6 forwarding, we emit a packet we just received, and we don't sanitize
its options before invoking IPsec.

Since it would be wrong to just stop the iteration and continue the IPsec
processing, allow compute_ipsec_pos to fail, and when it does, drop the
packet entirely.
 1.40.6.2 28-Aug-2017  skrll Sync with HEAD
 1.40.6.1 06-Apr-2015  skrll Sync with HEAD
 1.40.4.1 03-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1600):

sys/netipsec/ipsec_output.c: revision 1.67,1.75 (via patch)

Strengthen this check, to make sure there is room for an ip6_ext structure.
Seems possible to crash m_copydata here (but I didn't test more than that).

Fix the checks in compute_ipsec_pos, otherwise m_copydata could crash. I
already fixed half of the problem two months ago in rev1.67, back then I
thought it was not triggerable because each packet we emit is guaranteed
to have correctly formed IPv6 options; but it is actually triggerable via
IPv6 forwarding, we emit a packet we just received, and we don't sanitize
its options before invoking IPsec.

Since it would be wrong to just stop the iteration and continue the IPsec
processing, allow compute_ipsec_pos to fail, and when it does, drop the
packet entirely.
 1.41.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.41.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.45.2.2 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.45.2.1 11-May-2017  pgoyette Sync with HEAD
 1.48.2.4 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.48.2.3 05-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #799):

sys/netipsec/ipsec_output.c: revision 1.75
sys/netipsec/ipsec_output.c: revision 1.67

Strengthen this check, to make sure there is room for an ip6_ext structure.
Seems possible to crash m_copydata here (but I didn't test more than that).

Fix the checks in compute_ipsec_pos, otherwise m_copydata could crash. I
already fixed half of the problem two months ago in rev1.67, back then I
thought it was not triggerable because each packet we emit is guaranteed
to have correctly formed IPv6 options; but it is actually triggerable via
IPv6 forwarding, we emit a packet we just received, and we don't sanitize
its options before invoking IPsec.

Since it would be wrong to just stop the iteration and continue the IPsec
processing, allow compute_ipsec_pos to fail, and when it does, drop the
packet entirely.
 1.48.2.2 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface have both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
opeartions take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequence functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general though,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoid using it makes life easy.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created a unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE solely gains a few benefit on performance because
the network stack is still serialized by the big kernel locks by default.
 1.48.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.71.2.6 18-Jan-2019  pgoyette Synch with HEAD
 1.71.2.5 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.71.2.4 25-Jun-2018  pgoyette Sync with HEAD
 1.71.2.3 21-May-2018  pgoyette Sync with HEAD
 1.71.2.2 02-May-2018  pgoyette Synch with HEAD
 1.71.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.80.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.80.2.1 10-Jun-2019  christos Sync with HEAD
 1.82.4.1 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involves sleepable operations so we must
avoid dereferencing a percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine(Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) uses normally involves sleepable operations so we
must avoid dereferencing a percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.85.4.1 20-Jul-2024  martin Pull up following revision(s) (requested by rin in ticket #740):

sys/netipsec/ipsec_input.c: revision 1.79
sys/netipsec/ipsec_output.c: revision 1.86
sys/netipsec/ipsec.c: revision 1.178
sys/netinet6/ip6_output.c: revision 1.232

ipsec: remove unnecessary splsoftnet

Because the code of IPsec itself is already MP-safe.
 1.86.6.1 02-Aug-2025  perseant Sync with HEAD
 1.9 28-Apr-2018  maxv Remove IPSEC_SPLASSERT_SOFTNET, it has always been a no-op.
 1.8 28-Apr-2018  maxv Inline M_EXT_WRITABLE directly, and remove the XXX, there's nothing wrong
in the use of !M_READONLY.
 1.7 28-Feb-2018  maxv branches: 1.7.2;
Remove unused macros, and while here style.
 1.6 28-Feb-2018  maxv Remove duplicate IPSEC_STATINC(IPSEC_STAT_IN_POLVIO), ipsec_in_reject
already increases it. IPSEC6_STATINC is now unused, so remove it too.
 1.5 27-Jul-2017  ozaki-r Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
 1.4 19-Apr-2017  ozaki-r branches: 1.4.4;
Retire ipsec_osdep.h

We don't need to care other OSes (FreeBSD) anymore.

Some macros are alive in ipsec_private.h.
 1.3 28-Apr-2008  martin branches: 1.3.4; 1.3.6; 1.3.48; 1.3.68; 1.3.72; 1.3.76;
Remove clause 3 and 4 from TNF licenses
 1.2 23-Apr-2008  thorpej branches: 1.2.2;
PF_KEY stats for IPSEC and FAST_IPSEC are now per-CPU.
 1.1 23-Apr-2008  thorpej Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.2.2.1 16-May-2008  yamt sync with head.
 1.3.76.1 21-Apr-2017  bouyer Sync with HEAD
 1.3.72.1 26-Apr-2017  pgoyette Sync with HEAD
 1.3.68.1 28-Aug-2017  skrll Sync with HEAD
 1.3.48.1 03-Dec-2017  jdolecek update from HEAD
 1.3.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.3.6.1 28-Apr-2008  mjf file ipsec_private.h was added on branch mjf-devfs2 on 2008-06-02 13:24:28 +0000
 1.3.4.2 18-May-2008  yamt sync with head.
 1.3.4.1 28-Apr-2008  yamt file ipsec_private.h was added on branch yamt-pf42 on 2008-05-18 12:35:40 +0000
 1.4.4.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.7.2.1 02-May-2018  pgoyette Synch with HEAD
 1.8 22-Aug-2018  msaitoh - Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.7 28-Apr-2018  maxv branches: 1.7.2;
Remove unused macros.
 1.6 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.5 23-Apr-2008  thorpej branches: 1.5.88;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.4 10-Dec-2005  elad branches: 1.4.70; 1.4.72;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.3 26-Feb-2005  perry branches: 1.3.4;
nuke trailing whitespace
 1.2 17-Jul-2004  atatat branches: 1.2.2; 1.2.6; 1.2.8;
Rework sys/netipsec/ipsec_netbsd.c to present a more consistent tree.

Rework usr.bin/netstat/fast_ipsec.c to find the stats nodes under the
new names (Kame uses the name stats so we use different ones), as well
as setting slen appropriately between calls to sysctlbyname(), and
providing forward compatibility when actually retrieving stats via
sysctlbyname().

And correct a spelling error.
 1.1 07-May-2004  jonathan branches: 1.1.2;
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.

New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)

Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)

sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)

sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)

sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)

Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":

New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)

Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.1.2.3 17-Jul-2004  he Pull up revision 1.2 (requested by atatat in ticket #674):
Rework ipsec_netbsd.c to present a more ocnsistent tree.
Rework netstat to find the stats nodes under the new names.
 1.1.2.2 10-May-2004  tron Pull up revision 1.1 (requested by jonathan in ticket #280):
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.
New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)
Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)
sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)
sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)
sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)
Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":
New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)
Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.1.2.1 07-May-2004  tron file ipsec_var.h was added on branch netbsd-2-0 on 2004-05-10 15:00:38 +0000
 1.2.8.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.2.6.1 29-Apr-2005  kent sync with -current
 1.2.2.6 11-Dec-2005  christos Sync with head.
 1.2.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.2.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.2.2.2 03-Aug-2004  skrll Sync with HEAD
 1.2.2.1 17-Jul-2004  skrll file ipsec_var.h was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.3.4.1 21-Jun-2006  yamt sync with head.
 1.4.72.1 18-May-2008  yamt sync with head.
 1.4.70.1 02-Jun-2008  mjf Sync with HEAD.
 1.5.88.3 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.5.88.2 02-May-2018  pgoyette Synch with HEAD
 1.5.88.1 22-Apr-2018  pgoyette Sync with HEAD
 1.7.2.1 10-Jun-2019  christos Sync with HEAD
 1.24 11-Jun-2025  ozaki-r in: get rid of unused argument from ip_newid() and ip_newid_range()
 1.23 19-May-2025  andvar spelling and grammar fixes in comments.
 1.22 01-Sep-2023  andvar branches: 1.22.6;
fix typos in comments, mainly s/innner/inner/.
 1.21 08-Dec-2022  knakahara branches: 1.21.2;
Fix: update lastused of ipsecif(4) IPv6 out SP.
 1.20 07-Dec-2022  knakahara gif(4), ipsec(4) and l2tp(4) use encap_attach_addr().
 1.19 31-Jan-2020  knakahara Fix IPv6 over IPv4 ipsecif(4) uses IPv4 SP wrongly. Pointed out by ohishi@IIJ.

XXX pullup-8, pullup-9
 1.18 01-Nov-2019  knakahara branches: 1.18.2;
Make global and per-interface ipsecif(4) pmtu tunable like gif(4).

And make hop limit tunable same as gif(4).

See http://mail-index.netbsd.org/source-changes/2019/10/30/msg110426.html
 1.17 19-Sep-2019  knakahara Avoid having a rtcache directly in a percpu storage for tunnel protocols.

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.16 17-May-2019  knakahara branches: 1.16.2;
Don't clear calculated Tx tos value for IPv[46] over IPv6.
 1.15 12-Apr-2019  knakahara remove a variable which is no longer used.
 1.14 18-Mar-2019  msaitoh s/pakcet/packet/ in comment.
 1.13 26-Dec-2018  knakahara ipsecif(4) supports multiple peers in the same NAPT.

E.g. ipsec0 connects between NetBSD_A and NetBSD_B, ipsec1 connects
NetBSD_A and NetBSD_C at the following figure.

+----------+
+----| NetBSD_B |
+----------+ +------+ | +----------+
| NetBSD_A |--- ... ---| NAPT |---+
+----------+ +------+ | +----------+
+----| NetBSD_C |
+----------+

Add ATF later.
 1.12 07-Dec-2018  knakahara ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
 1.11 15-Nov-2018  maxv Remove the 't' argument from m_tag_find().
 1.10 31-May-2018  maxv branches: 1.10.2;
Remove support for non-IKE markers in the kernel. Discussed on tech-net@,
and now in PR/53334. Basically non-IKE markers come from a deprecated
draft, and our kernel code for them has never worked.

Setsockopt will now reject UDP_ENCAP_ESPINUDP_NON_IKE.

Perhaps we should also add a check in key_handle_natt_info(), to make
sure we also reject UDP_ENCAP_ESPINUDP_NON_IKE in the SADB.
 1.9 09-May-2018  maxv static const on ipsecif4_encapsw
 1.8 27-Apr-2018  knakahara Fix LOCKDEBUG kernel panic when many(about 200) tunnel interfaces is created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They use mutex
itself in percpu area. When percpu_cpu_enlarge() run, the address of the
mutex in percpu area becomes different from the address which lockdebug
saved. That can cause "already initialized" false detection.
 1.7 06-Apr-2018  knakahara Fix unexpected failure when ipsecif(4) over IPv6 is changed port number only.

Here is an example of the operation which causes this problem.
# ifconfig ipsec0 create link0
# ifconfig ipsec0 tunnel fc00:1001::2,4500 fc00:1001::1,4501
# ifconfig ipsec0 tunnel fc00:1001::2,4500 fc00:1001::1,4502
 1.6 06-Apr-2018  knakahara Add IPv4 ID when the ipsecif(4) packet can be fragmented. Implemented by hsuenaga@IIJ and ohishi@IIJ, thanks.

This modification reduces packet loss of fragmented packets on a
network where reordering occurs.

Alghough this modification has been applied, IPv4 ID is not set for
the packet smaller then IP_MINFRAGSIZE. According to RFC 6864, that
must not cause problems.

XXX pullup-8
 1.5 13-Mar-2018  knakahara comment out confusing (and incorrect) code and add comment. Pointed out by maxv@n.o, thanks.
 1.4 09-Mar-2018  knakahara Fix ipsec(4) I/F esp_frag support.
 1.3 06-Mar-2018  knakahara Fix fragment processing in ipsec4_fragout(). Pointed out by maxv@n.o, thanks.

XXX need pullup-8
 1.2 26-Feb-2018  maxv branches: 1.2.2;
Merge some minor (mostly stylistic) changes from last week.
 1.1 10-Jan-2018  knakahara branches: 1.1.2;
add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.1.2.10 31-Jan-2020  martin Pull up following revision(s) (requested by knakahara in ticket #1497):

sys/netipsec/ipsecif.c: revision 1.19

Fix IPv6 over IPv4 ipsecif(4) uses IPv4 SP wrongly. Pointed out by ohishi@IIJ.
XXX pullup-8, pullup-9
 1.1.2.9 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.1.2.8 29-May-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1273):

sys/netipsec/ipsecif.c: revision 1.16

Don't clear calculated Tx tos value for IPv[46] over IPv6.
 1.1.2.7 17-May-2018  martin Pull up following revision(s) (requested by knakahara in ticket #829):

sys/net/if_l2tp.c: revision 1.24
sys/net/if_ipsec.c: revision 1.13
sys/net/if_gif.h: revision 1.31
sys/netipsec/ipsecif.c: revision 1.8
sys/net/if_gif.c: revision 1.140
sys/netinet6/in6_l2tp.c: revision 1.15
sys/net/if_ipsec.h: revision 1.3
sys/netinet6/in6_gif.c: revision 1.92
sys/net/if_l2tp.h: revision 1.5
sys/netinet/in_l2tp.c: revision 1.13
sys/netinet/in_gif.c: revision 1.93

Fix LOCKDEBUG kernel panic when many(about 200) tunnel interfaces is created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They use mutex
itself in percpu area. When percpu_cpu_enlarge() run, the address of the
mutex in percpu area becomes different from the address which lockdebug
saved. That can cause "already initialized" false detection.
 1.1.2.6 09-Apr-2018  martin Pull up following revision(s) (requested by knakahara in ticket #714):

sys/net/if_ipsec.c: revision 1.8 - 1.11
sys/netipsec/ipsecif.h: revision 1.2
sys/netipsec/ipsecif.c: revision 1.6,1.7

fix ipsec(4) encap_lock leak.

fix ipsecif(4) unmatch curlwp_bind.

fix ipsecif(4) stack overflow.

Add IPv4 ID when the ipsecif(4) packet can be fragmented. Implemented by hsuenaga@IIJ and ohishi@IIJ, thanks.
This modification reduces packet loss of fragmented packets on a
network where reordering occurs.

Alghough this modification has been applied, IPv4 ID is not set for
the packet smaller then IP_MINFRAGSIZE. According to RFC 6864, that
must not cause problems.

Fix unexpected failure when ipsecif(4) over IPv6 is changed port number only.
Here is an example of the operation which causes this problem.
# ifconfig ipsec0 create link0
# ifconfig ipsec0 tunnel fc00:1001::2,4500 fc00:1001::1,4501
# ifconfig ipsec0 tunnel fc00:1001::2,4500 fc00:1001::1,4502
 1.1.2.5 13-Mar-2018  martin Pull up following revision(s) (requested by knakahara in ticket #627):
sys/netipsec/ipsecif.c: revision 1.5
tests/net/if_ipsec/t_ipsec.sh: revision 1.4
sys/net/if_ipsec.c: revision 1.7
Fix IPv6 ipsecif(4) ATF regression, sorry.
There must *not* be padding between the src sockaddr and the dst sockaddr
after struct sadb_x_policy.

Comment out confusing (and incorrect) code and add comment. Pointed out by maxv@n.o, thanks.

Enhance assertion ipsecif(4) ATF to avoid confusing setkey(8) error message.

When setkey(8) says "syntax error at [-E]", it must mean get_if_ipsec_unique()
failed.
 1.1.2.4 13-Mar-2018  martin Pull up following revision(s) (requested by knakahara in ticket #620):
sys/netipsec/ipsecif.c: revision 1.4
sys/net/if_ipsec.c: revision 1.4
sys/net/if_ipsec.c: revision 1.5
sys/net/if_ipsec.c: revision 1.6
NAT-T src and dst port in ipsec_variant should be network byte order.
Fix missing sadb_x_ipsecrequest informations for PF_KEY message.
Functionalize duplicated code. No functional changes.
Fix ipsec(4) I/F esp_frag support.
 1.1.2.3 06-Mar-2018  martin Pull up following revision(s) (requested by knakahara in ticket #607):
sys/netipsec/ipsecif.c: revision 1.3
Fix fragment processing in ipsec4_fragout(). Pointed out by maxv@n.o, thanks.
XXX need pullup-8
 1.1.2.2 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.1.2.1 10-Jan-2018  snj file ipsecif.c was added on branch netbsd-8 on 2018-02-11 21:17:34 +0000
 1.2.2.8 18-Jan-2019  pgoyette Synch with HEAD
 1.2.2.7 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.2.2.6 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.2.2.5 25-Jun-2018  pgoyette Sync with HEAD
 1.2.2.4 21-May-2018  pgoyette Sync with HEAD
 1.2.2.3 02-May-2018  pgoyette Synch with HEAD
 1.2.2.2 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.2.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.10.2.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.10.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.10.2.1 10-Jun-2019  christos Sync with HEAD
 1.16.2.2 31-Jan-2020  martin Pull up following revision(s) (requested by knakahara in ticket #679):

sys/netipsec/ipsecif.c: revision 1.19

Fix IPv6 over IPv4 ipsecif(4) uses IPv4 SP wrongly. Pointed out by ohishi@IIJ.
XXX pullup-8, pullup-9
 1.16.2.1 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involves sleepable operations so we must
avoid dereferencing a percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine(Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) uses normally involves sleepable operations so we
must avoid dereferencing a percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.18.2.1 29-Feb-2020  ad Sync with head.
 1.21.2.1 29-Jul-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1140):

sys/netinet/ip_output.c: revision 1.330
sys/netinet/sctp_output.c: revision 1.39
sys/netinet/ip_mroute.c: revision 1.166
sys/netipsec/ipsecif.c: revision 1.24
sys/netipsec/xform_ipip.c: revision 1.80
sys/netinet/ip_output.c: revision 1.327
sys/netinet/ip_output.c: revision 1.328
sys/netinet/ip_input.c: revision 1.406
sys/netinet/ip_output.c: revision 1.329
sys/netinet/in_var.h: revision 1.105

in: get rid of unused argument from ip_newid() and ip_newid_range()

in: take a reference of ifp on IP_ROUTETOIF
The ifp could be released after ia4_release(ia).

in: narrow the scope of ifa in ip_output (NFC)

sctp: follow the recent change of ip_newid()

in: avoid racy ifa_acquire(rt->rt_ifa) in ip_output()
If a rtentry is being destroyed asynchronously, ifa referenced by rt_ifa
can be destructed and taking ifa_acquire(rt->rt_ifa) aborts with a
KASSERT failure. Fortunately, the ifa is not actually freed because of
a reference by rt_ifa, it can be available (except some functions like
psref) so as long the rtentry is held.
PR kern/59527

in: avoid racy ia4_acquire(ifatoia(rt->rt_ifa) in ip_rtaddr()
Same as the case of ip_output(), it's racy and should be avoided.
PR kern/59527
 1.22.6.1 02-Aug-2025  perseant Sync with HEAD
 1.3 01-Nov-2019  knakahara Make global and per-interface ipsecif(4) pmtu tunable like gif(4).

And make hop limit tunable same as gif(4).

See http://mail-index.netbsd.org/source-changes/2019/10/30/msg110426.html
 1.2 06-Apr-2018  knakahara branches: 1.2.2;
Fix unexpected failure when ipsecif(4) over IPv6 is changed port number only.

Here is an example of the operation which causes this problem.
# ifconfig ipsec0 create link0
# ifconfig ipsec0 tunnel fc00:1001::2,4500 fc00:1001::1,4501
# ifconfig ipsec0 tunnel fc00:1001::2,4500 fc00:1001::1,4502
 1.1 10-Jan-2018  knakahara branches: 1.1.2; 1.1.4;
add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.1.4.1 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.1.2.3 09-Apr-2018  martin Pull up following revision(s) (requested by knakahara in ticket #714):

sys/net/if_ipsec.c: revision 1.8 - 1.11
sys/netipsec/ipsecif.h: revision 1.2
sys/netipsec/ipsecif.c: revision 1.6,1.7

fix ipsec(4) encap_lock leak.

fix ipsecif(4) unmatch curlwp_bind.

fix ipsecif(4) stack overflow.

Add IPv4 ID when the ipsecif(4) packet can be fragmented. Implemented by hsuenaga@IIJ and ohishi@IIJ, thanks.
This modification reduces packet loss of fragmented packets on a
network where reordering occurs.

Alghough this modification has been applied, IPv4 ID is not set for
the packet smaller then IP_MINFRAGSIZE. According to RFC 6864, that
must not cause problems.

Fix unexpected failure when ipsecif(4) over IPv6 is changed port number only.
Here is an example of the operation which causes this problem.
# ifconfig ipsec0 create link0
# ifconfig ipsec0 tunnel fc00:1001::2,4500 fc00:1001::1,4501
# ifconfig ipsec0 tunnel fc00:1001::2,4500 fc00:1001::1,4502
 1.1.2.2 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.1.2.1 10-Jan-2018  snj file ipsecif.h was added on branch netbsd-8 on 2018-02-11 21:17:34 +0000
 1.2.2.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.285 02-Sep-2024  andvar s/timehander/timehandler/ in the comment.
 1.284 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.283 29-Jun-2024  riastradh branches: 1.283.2;
netipsec: Use _NET_STAT* API instead of direct array access.

PR kern/58380
 1.282 10-Aug-2023  andvar fix typos in comments s/iton/tion/ or s/ton/tion/.
 1.281 21-Jul-2023  knakahara Use kmem_free instead of kmem_intr_free, as key_freesaval() is not called in softint after key.c:r1.223.

E.g. key_freesaval() was called the following call path before SAD MP-ify.
esp_input_cb()
KEY_FREESAV()
key_freesav()
key_delsav()
key_freesaval()

ok'ed by ozaki-r@n.o.
 1.280 08-Dec-2022  knakahara branches: 1.280.2;
Fix: sp->lastused should be updated by time_uptime, and refactor a little.
 1.279 08-Dec-2022  knakahara Fix: update lastused of ipsecif(4) IPv6 out SP.
 1.278 19-Oct-2022  christos PR/56836: Andrew Cagney: IPv6 ESN tunneling IPcomp has corrupt header

Always always send / expect CPI in IPcomp header

Fixes kern/56836 where an IPsec interop combining compression and
ESP|AH would fail.

Since fast ipsec, the outgoing IPcomp header has contained the
compression algorithm instead of the CPI. Adding the
SADB_X_EXT_RAWCPI flag worked around this but ...

The IPcomp's SADB was unconditionally hashed using the compression
algorithm instead of the CPI. This meant that an incoming packet with
a valid CPI could never match its SADB.
 1.277 11-Oct-2022  knakahara Add sadb_x_policy_flags to inform SP origination.

This extension(struct sadb_x_policy) is *not* defined by RFC2367.

OpenBSD does not have reserved fields in struct sadb_x_policy.
Linux does not use this field yet.
FreeBSD uses this field as "sadb_x_policy_scope"; the value range is
from 0x00 to 0x04.

We use from most significant bit to avoid the above usage.
 1.276 09-Aug-2022  knakahara Add sysctl entry to improve interconnectivity to some VPN appliances, pointed out by seil-team@IIJ.

If we want to allow different identifier types on IDii and IDir, set
net.key.allow_different_idtype=1. Default(=0) is the same as before.
 1.275 24-May-2022  andvar fix various typos in comment, documentation and log messages.
 1.274 18-May-2022  christos PR/56841: Andrew Cagney: debug-log IPcomp CPI lookups:
- debug-logs why an SPI is rejected
- adds missing __VA_OPT__(,) to some printf macros
- debug-log SPI+proto when adding/updating entry
 1.273 02-Jan-2022  andvar fix few more typos in comments.
 1.272 03-Dec-2021  andvar fix various typos in comments, log messages and documentation.
 1.271 13-Mar-2020  knakahara Fix kern/55066. Pointed out and fixed by Chuck Zmudzinski, thanks.

ok'ed by ozaki-r@n.o
 1.270 07-Feb-2020  thorpej Use percpu_foreach_xcall() to gather volatile per-cpu counters. These
must be serialized against the interrupts / soft-interrupts in which
they're manipulated, as well as protected from non-atomic 64-bit memory
loads on 32-bit platforms.
 1.269 14-Nov-2019  knakahara branches: 1.269.2;
Reduce load for IKE negotiations when the system has many IPv6 addresses.

e.g. the system has many vlan(4), gif(4) or ipsecif(4) with link local address.
 1.268 12-Nov-2019  knakahara Fix SA can be expaired wrongly when there are many SPs.

When key_timehandler_spd() spent over one second, the "now" argument of
key_timehandler_sad() could be older than sav->created. That caused SA
was expired immediately.
 1.267 25-Sep-2019  ozaki-r Make panic messages more informative
 1.266 04-Aug-2019  maxv Fix info leaks.
 1.265 23-Jul-2019  ozaki-r branches: 1.265.2;
ipsec: fix a regression of the update API

The update API updates an SA by creating a new SA and removing an existing SA.
The previous change removed a newly added SA wrongly if an existing SA had been
created by the getspi API.
 1.264 17-Jul-2019  ozaki-r Avoid a race condition between SA (sav) manipulations

An sav can be removed from belonging list(s) twice resulting in an assertion
failure of pslist. It can occur if the following two operations interleave:
(i) a deletion or a update of an SA via the API, and
(ii) a state change (key_sa_chgstate) of the same SA by the timer.
Note that even (ii) removes an sav once from its list(s) on a update.

The cause of the race condition is that the two operations are not serialized
and (i) doesn't get and remove an sav from belonging list(s) atomically. So
(ii) can be inserted between an acquisition and a removal of (i).

Avoid the race condition by making (i) atomic.
 1.263 12-Jun-2019  christos fix typo in comment, improve error message, add default case handling to
set error.
 1.262 12-Jun-2019  christos Fix double free: key_setsaval() free's newsav by calling key_freesaval()
and key_api_update() calls key_delsav() when key_setsaval() fails which
calls key_freesaval() again...
 1.261 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.260 26-Dec-2018  knakahara ipsecif(4) supports multiple peers in the same NAPT.

E.g. ipsec0 connects between NetBSD_A and NetBSD_B, ipsec1 connects
NetBSD_A and NetBSD_C at the following figure.

+----------+
+----| NetBSD_B |
+----------+ +------+ | +----------+
| NetBSD_A |--- ... ---| NAPT |---+
+----------+ +------+ | +----------+
+----| NetBSD_C |
+----------+

Add ATF later.
 1.259 26-Dec-2018  knakahara Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.258 22-Dec-2018  maxv Replace M_ALIGN and MH_ALIGN by m_align.
 1.257 23-Aug-2018  ozaki-r Don't call key_ismyaddr, which may sleep, in a pserialize read section

Use mutex here instead of pserialize because using mutex is simpler than
using psz+ref, which is another solution, and key_checkspidup isn't called in
any performance-sensitive paths.
 1.256 04-Jul-2018  christos merge duplicated code, more informative debugging.
 1.255 28-Apr-2018  maxv branches: 1.255.2;
Remove IPSEC_SPLASSERT_SOFTNET, it has always been a no-op.
 1.254 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.253 17-Apr-2018  yamaguchi Fix panic of SADB when the state of sav is changed in timeout

pointed out by ozaki-r@n.o, thanks
 1.252 16-Apr-2018  yamaguchi Added a lookup table to find an sav quickly

key_sad.sahlists doesn't work well for inbound packets because
its key includes source address. For the reason, the
look-up-table for the inbound packets is newly added.
The table has all sav whose state is MATURE or DYING and uses a
key calculated by destination address, protocol, and spi instead
of saidx.

reviewd ozaki-r@n.o, thanks.
 1.251 16-Apr-2018  yamaguchi Introduced a hash table to sahlist

An saidx of sah included in the list is unique so that
the search can use a hash list whose hash is calculated by
the saidx to find an sah quickly.
The hash list of the sahlits is used in FreeBSD, too.

reviewed by ozaki-r@n.o, thanks.
 1.250 09-Apr-2018  yamaguchi Removed the unnecessary order check of key_lookup_sa

key_prefered_oldsa flag can change the sa to use if an sah
has multiple sav. However the multiple saves whose protocol
is ah, esp, or tcp cannot exist because their duplications
are checked by the spi value. Although the multiple saves
can exist in the case of ipcomp, the values using in the
post processing are same between the saves.

For those reasons, it is no need to select an sav by its
lifetime.
In addition, FreeBSD has already remove this.

reviewed by ozaki-r@n.o, thanks.
 1.249 02-Mar-2018  ozaki-r branches: 1.249.2;
Avoid data races on lifetime counters by using percpu(9)

We don't make them percpu(9) directly because the structure is exposed to
userland and we don't want to break ABI. So we add another member variable
for percpu(9) and use it internally. When we export them to userland, they
are converted to the original format.
 1.248 08-Feb-2018  maxv Remove unused net_osdep.h include.
 1.247 10-Jan-2018  knakahara add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.246 01-Dec-2017  ozaki-r Don't touch an SP without a reference to it
 1.245 30-Nov-2017  ozaki-r Fix a deadlock happening if !NET_MPSAFE

If NET_MPSAFE isn't set, key_timehandler_work is executed with holding
softnet_lock. This means that localcount_drain can be called with holding
softnet_lock resulting in a deadlock that localcount_drain waits for packet
processing to release a reference to SP/SA while network processing is prevented
by softnet_lock.

Fix the deadlock by not taking softnet_lock in key_timehandler_work. It's okay
because IPsec is MP-safe even if !NET_MPSAFE. Note that the change also needs
to enable pserialize_perform because the IPsec code can be run in parallel now.

Reported by christos@
 1.244 30-Nov-2017  ozaki-r Use KDASSERT for mutex_ownable

Because mutex_ownable is not cheap.
 1.243 22-Nov-2017  ozaki-r Fix usage of FOREACH macro

key_sad.lock is held there so SAVLIST_WRITER_FOREACH is enough.
 1.242 21-Nov-2017  ozaki-r Call key_sendup_mbuf immediately unless key_acquire is called in softint

We need to defer it only if it's called in softint to avoid deadlock.
 1.241 21-Nov-2017  ozaki-r Simply the code by avoiding unnecessary error checks

- Remove unnecessary m_pullup for self-allocated mbufs
- Replace some if-fails-return sanity checks with KASSERT
 1.240 21-Nov-2017  ozaki-r Get rid of unnecessary NULL checks that are obsoleted by M_WAITOK
 1.239 21-Nov-2017  ozaki-r Use M_WAITOK to allocate mbufs wherever sleepable

Further changes will get rid of unnecessary NULL checks then.
 1.238 21-Nov-2017  ozaki-r Add missing splx to key_spdexpire
 1.237 21-Nov-2017  ozaki-r Fix error handling of MCLGET in key_alloc_mbuf
 1.236 21-Nov-2017  ozaki-r Provide a function to call MGETHDR and MCLGET

The change fixes two usages of MGETHDR that don't check whether a mbuf is really
allocated before passing it to MCLGET.
 1.235 08-Nov-2017  ozaki-r Mark key_timehandler_ch callout as MP-safe (just forgot to do so)
 1.234 03-Oct-2017  ozaki-r Constify isr at many places (NFC)
 1.233 03-Oct-2017  ozaki-r Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
 1.232 03-Oct-2017  ozaki-r Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
 1.231 01-Oct-2017  ryoon Fix typo in comment
 1.230 30-Sep-2017  christos cast reduction, NFC.
 1.229 29-Sep-2017  christos humanize printing of ip addresses
 1.228 28-Sep-2017  christos - sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
 1.227 27-Sep-2017  ozaki-r Add missing ifdef NET_MPSAFE
 1.226 27-Sep-2017  ozaki-r Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
 1.225 21-Aug-2017  knakahara remove unnecessary comment.
 1.224 21-Aug-2017  knakahara fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
 1.223 09-Aug-2017  ozaki-r MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
 1.222 09-Aug-2017  ozaki-r Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
 1.221 09-Aug-2017  ozaki-r Fix that prev isn't cleared on retry
 1.220 09-Aug-2017  ozaki-r Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
 1.219 09-Aug-2017  ozaki-r Fix locking notes of SAD
 1.218 08-Aug-2017  ozaki-r Destroy sav only in the loop for DEAD sav
 1.217 08-Aug-2017  ozaki-r Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
 1.216 08-Aug-2017  ozaki-r MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
 1.215 08-Aug-2017  ozaki-r Add missing mutex_exit
 1.214 08-Aug-2017  ozaki-r Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
 1.213 08-Aug-2017  ozaki-r Add __read_mostly to key_psz

Suggested by riastradh@
 1.212 07-Aug-2017  ozaki-r Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
 1.211 07-Aug-2017  ozaki-r Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
 1.210 07-Aug-2017  ozaki-r Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
 1.209 07-Aug-2017  ozaki-r Move locking notes
 1.208 07-Aug-2017  ozaki-r Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
 1.207 07-Aug-2017  ozaki-r Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
 1.206 03-Aug-2017  ozaki-r Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
 1.205 03-Aug-2017  ozaki-r MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
 1.204 03-Aug-2017  ozaki-r Rename local variable newsah to sah

It may not be new.
 1.203 03-Aug-2017  ozaki-r Use pslist(9) for sah->savtree
 1.202 03-Aug-2017  ozaki-r Use pslist(9) for sahtree
 1.201 03-Aug-2017  ozaki-r Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
 1.200 02-Aug-2017  ozaki-r Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
 1.199 02-Aug-2017  ozaki-r Fix updating ipsec_used; turn on when SPs on sockets are added
 1.198 02-Aug-2017  ozaki-r Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
 1.197 02-Aug-2017  ozaki-r Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
 1.196 27-Jul-2017  ozaki-r Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
 1.195 27-Jul-2017  ozaki-r Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
 1.194 26-Jul-2017  ozaki-r Use pslist(9) for sptree
 1.193 26-Jul-2017  ozaki-r Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
 1.192 26-Jul-2017  ozaki-r Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
 1.191 21-Jul-2017  ozaki-r Remove ipsecrequest#sav
 1.190 21-Jul-2017  ozaki-r Stop setting isr->sav on looking up sav in key_checkrequest
 1.189 21-Jul-2017  ozaki-r Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
 1.188 18-Jul-2017  ozaki-r branches: 1.188.2;
Rename key_allocsa_policy to key_lookup_sa_bysaidx
 1.187 16-Jul-2017  ozaki-r Make sure to sort the list when changing the state by key_sa_chgstate
 1.186 16-Jul-2017  ozaki-r Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
 1.185 15-Jul-2017  christos fix printf format.
 1.184 14-Jul-2017  ozaki-r Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
 1.183 14-Jul-2017  ozaki-r Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
 1.182 14-Jul-2017  ozaki-r Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
 1.181 13-Jul-2017  ozaki-r Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
 1.180 12-Jul-2017  ozaki-r Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
 1.179 12-Jul-2017  ozaki-r Omit unnecessary NULL checks for sav->sah
 1.178 12-Jul-2017  ozaki-r Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
 1.177 12-Jul-2017  ozaki-r Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
 1.176 11-Jul-2017  ozaki-r Separate sending message routine (NFC)
 1.175 11-Jul-2017  ozaki-r Use time_mono_to_wall (NFC)
 1.174 11-Jul-2017  ozaki-r Let key_getsavbyspi take a reference of a returning sav
 1.173 11-Jul-2017  ozaki-r Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
 1.172 10-Jul-2017  ozaki-r Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
 1.171 10-Jul-2017  ozaki-r Make sure a sav is inserted to a sah list after its initialization completes
 1.170 10-Jul-2017  ozaki-r Add missing KEY_FREESAV
 1.169 10-Jul-2017  ozaki-r Make sure to clear keys on error paths of key_setsaval
 1.168 07-Jul-2017  ozaki-r Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
 1.167 06-Jul-2017  ozaki-r Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
 1.166 06-Jul-2017  ozaki-r Fix usages of sadb_msg_errno
 1.165 04-Jul-2017  ozaki-r Introduce and use SADB_SASTATE_USABLE_P
 1.164 10-Jun-2017  ozaki-r Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
 1.163 02-Jun-2017  ozaki-r branches: 1.163.2;
Tweak header file inclusions
 1.162 02-Jun-2017  ozaki-r Change the prefix of function names of SADB API handlers to key_api_

By doing so we can easily distinguish them from other utility functions.
And so we can easily know that they are all called from key_parse and
applied assumptions that the arguments are always non-NULL and they
are always called from userland, i.e., never called from interrupt
context (softint). As a result, we can omit some tedious assertions
in the functions.
 1.161 01-Jun-2017  ozaki-r Simplify; we can assume the arguments are always non-NULL
 1.160 01-Jun-2017  ozaki-r Return a return value of key_senderror as usual
 1.159 31-May-2017  ozaki-r Split the timer handler into small functions (NFC)
 1.158 31-May-2017  ozaki-r Introduce key_fill_replymsg to dedup some routines
 1.157 31-May-2017  ozaki-r Convert some sanity checks to CTASSERT
 1.156 31-May-2017  ozaki-r Move key_init_spidx_bymsghdr to just before spidx is used (NFC)
 1.155 31-May-2017  ozaki-r Use key_getsah more (NFCI)
 1.154 31-May-2017  ozaki-r Avoid using variable newsp for an existing SP (NFC)
 1.153 31-May-2017  ozaki-r Simplify; assignment just works for spidx (NFC)
 1.152 31-May-2017  ozaki-r Sanity-check and return on error early

And delay initializing local variables until they're actually used.
 1.151 31-May-2017  ozaki-r Hide details of the sadb message format (NFCI)

Especially src0 + 1 and dst0 + 1 shouldn't be exposed.
 1.150 30-May-2017  ozaki-r Use key_senderror
 1.149 30-May-2017  ozaki-r Send up an error message on error as well as others
 1.148 30-May-2017  ozaki-r Make refcnt operations of SA and SP atomic

Using atomic opeartions isn't optimal and should be optimized somehow
in the future though, the change allows a kernel with NET_MPSAFE to
run out a benchmark, which is useful to know performance improvement
and degradation by code changes.
 1.147 29-May-2017  ozaki-r Don't make isr->sav NULL

We assume it's always non-NULL.
 1.146 28-May-2017  mlelstv release key_mtx on return path.
 1.145 26-May-2017  ozaki-r Replace "cmp" of function names to "match" and make their return value consistent

Function names with cmp are expected to return the order of two comparees
like memcmp and strcmp. The functions in question just answer if matched
or not, so don't use cmp and use match instead.

Consistently return 1 on matched and 0 otherwise.
 1.144 26-May-2017  ozaki-r Make key_cmpspidx_exactly and key_cmpspidx_withmask static
 1.143 26-May-2017  ozaki-r Comment out unused key_freesp_so and key_freeso
 1.142 26-May-2017  ozaki-r Simplify; we can assume the arguments are always non-NULL
 1.141 25-May-2017  ozaki-r Protect acqtree and regtree with a mutex (key_mtx)

The data structures aren't used in any performance-sensitive paths,
so just using a mutex to protect them is good enough.
 1.140 23-May-2017  ozaki-r Use __arraycount (NFC)
 1.139 23-May-2017  ozaki-r Disable secspacq stuffs that are now unused

The stuffs are used only if sp->policy == IPSEC_POLICY_IPSEC
&& sp->req == NULL (see ipsec{4,6}_checkpolicy). However, in the
current implementation, sp->req never be NULL (except for the
moments of SP allocation and deallocation) if sp->policy is
IPSEC_POLICY_IPSEC.

It seems that the facility was partially implemented in the KAME
era and wasn't completed. Make it clear that the facility is
unused for now by #ifdef notyet. Eventually we should complete
the implementation or remove it entirely.
 1.138 23-May-2017  ozaki-r Prepare to retire __LIST_CHAINED

We shouldn't relpy on the band-aid and instead use a lock or
refcnt to maintain chains properly. Before removing them,
replace conditionals with KASSERTs and see what will happen.
 1.137 22-May-2017  ozaki-r KNF

And avoid calling a function, assigning a result to a variable, and
comparing it all together in one condition expression.
 1.136 22-May-2017  ozaki-r Replace remaining DPRINTF with IPSECLOG
 1.135 19-May-2017  ozaki-r Remove unnecessary MALLOC_DEFINE(M_SECA)
 1.134 19-May-2017  ozaki-r Use IPSECLOG instead of ipseclog
 1.133 19-May-2017  ozaki-r Use kmem_intr_free in key_freesaval which can be called in softint
 1.132 17-May-2017  ozaki-r Replace malloc/free with kmem(9) and kill KMALLOC/KFREE macros
 1.131 17-May-2017  ozaki-r Fix memory leaks of allocated data to sav on key_update

key_setsaval NULL-clears member variables of sav at the beginning
of the function regardless of the states of the variables. When
key_setsaval is called by key_update, member variables sav->replay,
sav->key_* and sav->lft_* may have data allocated by malloc. In
that case they will leak. Free them before NULL-clear to avoid
memory leaks.
 1.130 16-May-2017  ozaki-r Replace kmem_alloc + memset with kmem_zalloc

Suggested by kamil@
 1.129 16-May-2017  ozaki-r Fix memory leaks of sah->idents and sah->identd

Originally fixed by the SEIL team of IIJ
 1.128 16-May-2017  ozaki-r Fix diagnostic assertion failure in ipsec_init_policy

panic: kernel diagnostic assertion "!cpu_softintr_p()" failed: file "../../../../netipsec/ipsec.c", line 1277
cpu7: Begin traceback...
vpanic() at netbsd:vpanic+0x140
ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
ipsec_init_policy() at netbsd:ipsec_init_policy+0x149
in_pcballoc() at netbsd:in_pcballoc+0x1c5
tcp_attach_wrapper() at netbsd:tcp_attach_wrapper+0x1e1
sonewconn() at netbsd:sonewconn+0x1ea
syn_cache_get() at netbsd:syn_cache_get+0x15f
tcp_input() at netbsd:tcp_input+0x1689
ipintr() at netbsd:ipintr+0xa88
softint_dispatch() at netbsd:softint_dispatch+0xd3
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe811d337ff0
Xsoftintr() at netbsd:Xsoftintr+0x4f

Reported by msaitoh@
 1.127 16-May-2017  ozaki-r Use kmem(9) instead of malloc/free

Some of non-sleepable allocations can be replaced with sleepable ones.
To make it clear that the replacements are possible, some assertions
are addded.
 1.126 16-May-2017  ozaki-r Run key_timehandler in thread context (workqueue)

The handler involves object deallocations so we want to not run
it in softint.
 1.125 15-May-2017  ozaki-r Show __func__ instead of __FILE__ in debug log messages

__func__ is shorter and more useful than __FILE__.
 1.124 15-May-2017  ozaki-r Fix a debug log message
 1.123 15-May-2017  ozaki-r Kill useless IPSEC_DEBUG2 (NFC)
 1.122 09-May-2017  ozaki-r Fix kernel build with IPSEC
 1.121 09-May-2017  ozaki-r Add debugging facilities for refcnt of SA/SP
 1.120 09-May-2017  ozaki-r Provide foreach macros for SA states (NFCI)
 1.119 09-May-2017  ozaki-r Use LIST_* functions (NFC)
 1.118 26-Apr-2017  ozaki-r branches: 1.118.2;
Replace leading whitespaces with tabs and tweak some indentations
 1.117 26-Apr-2017  ozaki-r Remove unnecessary LIST_FOREACH definition
 1.116 20-Apr-2017  ozaki-r Use IPSEC_DIR_IS_INOROUT (NFC)
 1.115 20-Apr-2017  ozaki-r Provide IPSEC_DIR_* validation macros
 1.114 19-Apr-2017  ozaki-r Reduce return points (NFC)
 1.113 19-Apr-2017  ozaki-r Return early, reduce identation (NFCI)
 1.112 19-Apr-2017  ozaki-r Use KASSERT for sanity checks of function arguments
 1.111 19-Apr-2017  ozaki-r Tweak KEYDEBUG macros

Let's avoid passing statements to a macro.
 1.110 19-Apr-2017  ozaki-r Change panic if DIAGNOSTIC to KASSERT

One can be changed to CTASSERT.
 1.109 19-Apr-2017  ozaki-r Retire ipsec_osdep.h

We don't need to care other OSes (FreeBSD) anymore.

Some macros are alive in ipsec_private.h.
 1.108 18-Apr-2017  ozaki-r Convert IPSEC_ASSERT to KASSERT or KASSERTMSG

IPSEC_ASSERT just discarded specified message...
 1.107 18-Apr-2017  ozaki-r Remove __FreeBSD__ and __NetBSD__ switches

No functional changes (except for a debug printf).

Note that there remain some __FreeBSD__ for sysctl knobs which counerparts
to NetBSD don't exist. And ipsec_osdep.h isn't touched yet; tidying it up
requires actual code changes.
 1.106 17-Apr-2017  ozaki-r Accept AH with NULL algorithm of zero-length key
 1.105 10-Apr-2017  ozaki-r Fix assertion failure in in6_lookup_multi via key_ismyaddr

in6_lookup_multi was forgotten to be migrated to in6_multi_group.
Also psz should be changed to psz/psref because in6_multi_group is
sleepable.

Fix PR kern/52151
 1.104 06-Apr-2017  ozaki-r Prepare netipsec for rump-ification

- Include "opt_*.h" only if _KERNEL_OPT is defined
- Allow encapinit to be called twice (by ifinit and ipe4_attach)
- ifinit didn't call encapinit if IPSEC is enabled (ipe4_attach called
it instead), however, on a rump kernel ipe4_attach may not be called
even if IPSEC is enabled. So we need to allow ifinit to call it anyway
- Setup sysctls in ipsec_attach explicitly instead of using SYSCTL_SETUP
- Call ip6flow_invalidate_all in key_spdadd only if in6_present
- It's possible that a rump kernel loads the ipsec library but not
the inet6 library
 1.103 23-Feb-2017  ozaki-r Remove mkludge stuffs

For unknown reasons, IPv6 multicast addresses are linked to a first
IPv6 address assigned to an interface. Due to the design, when removing
a first address having multicast addresses, we need to save them to
somewhere and later restore them once a new IPv6 address is activated.
mkludge stuffs support the operations.

This change links multicast addresses to an interface directly and
throws the kludge away.

Note that as usual some obsolete member variables remain for kvm(3)
users. And also sysctl net.inet6.multicast_kludge remains to avoid
breaking old ifmcstat.

TODO: currently ifnet has a list of in6_multi but obviously the list
should be protocol independent. Provide a common structure (if_multi
or something) to handle in6_multi and in_multi together as well as
ifaddr does for in_ifaddr and in6_ifaddr.
 1.102 22-Feb-2017  ozaki-r Stop using useless IN6_*_MULTI macros
 1.101 20-Jul-2016  ozaki-r branches: 1.101.2;
Apply pserialize to some iterations of IP address lists
 1.100 07-Jul-2016  ozaki-r branches: 1.100.2;
Restore const qualifier dropped due to switching to IN_ADDRLIST_READER_FOREACH

IN_ADDRLIST_READER_FOREACH (pslist(9)) now allows const variables.
 1.99 06-Jul-2016  ozaki-r Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later if we confirmed
nobody does actually.
 1.98 04-Jul-2016  ozaki-r Use pslist(9) for the global in6_ifaddr list

psz and psref will be applied in another commit.

No functional change intended.
 1.97 07-Mar-2016  christos PR/50905: Henning Petersen: Fix useless comparison (from FreeBSD)
 1.96 06-Mar-2016  christos Simplify the port comparison code further.
 1.95 05-Mar-2016  christos kill stray &
 1.94 05-Mar-2016  christos Fix port matching; we need to ignore ports when they are 0 not only in
the second saidx but the first one too. Fixes NAT-T issue with NetBSD
being the host behind NAT.
 1.93 05-Mar-2016  christos gather more information from mbuf.
 1.92 05-Mar-2016  christos Add more debugging, no functional change.
 1.91 16-Jun-2014  christos branches: 1.91.2; 1.91.4;
cleanup debugging printfs and fix port endianness printing issue.
 1.90 05-Jun-2014  christos CID 1220169: Reverse NULL
 1.89 05-Jun-2014  christos CID 274353: Forward NULL
 1.88 30-May-2014  christos Introduce 2 new variables: ipsec_enabled and ipsec_used.
Ipsec enabled is controlled by sysctl and determines if is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.87 19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.86 01-Mar-2014  joerg branches: 1.86.2;
Remove modification of an unused uninitialized variable.
 1.85 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.84 03-Nov-2013  mrg - apply some __diagused
- remove unused variables
- move some variables inside their relevant use #ifdef
 1.83 19-Sep-2013  christos make debugging code use __func__
remove stray printf
 1.82 24-Jun-2013  riastradh branches: 1.82.2;
Replace consttime_bcmp/explicit_bzero by consttime_memequal/explicit_memset.

consttime_memequal is the same as the old consttime_bcmp.
explicit_memset is to memset as explicit_bzero was to bcmp.

Passes amd64 release and i386/ALL, but I'm sure I missed some spots,
so please let me know.
 1.81 05-Jun-2013  christos IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.80 04-Jun-2013  christos PR/47886: Dr. Wolfgang Stukenbrock: IPSEC_NAT_T enabled kernels may access
outdated pointers and pass ESP data to UPD-sockets.
While here, simplify the code and remove the IPSEC_NAT_T option; always
compile nat-traversal in so that it does not bitrot.
 1.79 20-Sep-2012  gdt Fix whitespace (mostly removing trailing).

This commit changes only whitespace (trailing, tabs vs spaces,
removing spurious newlines). From Bev Schwartz of BBN.
 1.78 30-Aug-2012  drochner branches: 1.78.2;
Add "consttime_bcmp" and "explicit_bzero" functions for both kernel
abd userland, as proposed on tech-security, with explicit_bzero using
a volatile function pointer as suggested by Alan Barrett.
Both do what the name says. For userland, both are prefixed by "__"
to keep them out of the user namespace.
Change some memset/memcmp uses to the new functions where it makes
sense -- these are just some examples, more to come.
 1.77 29-Aug-2012  drochner g/c unused struct member
 1.76 09-Jan-2012  drochner branches: 1.76.2; 1.76.4;
allow the ESP fragment length in the NAT-T case to be reported back
through the pfkey interface, kernel part of PR kern/44952
by Wolfgang Stukenbrock
 1.75 19-Dec-2011  drochner as in netkey/key.c, just use cprng_fast() to get a random number
(which is used to choose an SPI), kill the dummy seeding code
 1.74 17-Jul-2011  joerg branches: 1.74.2; 1.74.6;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.73 09-Jun-2011  drochner more "const"
 1.72 06-Jun-2011  drochner remove a limitation that inner and outer IP version must be equal
for an ESP tunnel, and add some fixes which make v4-in-v6 work
(v6 as inner protocol isn't ready, even v6-in-v6 can never have worked)

being here, fix a statistics counter and kill an unused variable
 1.71 23-May-2011  drochner branches: 1.71.2;
g/c remainders of IV handling in pfkey code -- this is done in
opencrypto now
 1.70 18-May-2011  drochner include the SHA2 hashs into the proposal which goes out with
SADB_ACQUIRE -- this doesn't change much because racoon ignores
the proposal from the kernel anyway and applies its own configuration,
but having MD5 and SHA1 in the list but SHA2 not looks strange
 1.69 18-May-2011  drochner use monotonic time rather than wall time for lifetime related timestamps,
to make key expiration robust against time changes
 1.68 17-May-2011  drochner cleanup some error handling to avoid memory leaks and doube frees,
from Wolfgang Stukenbrock per PR kern/44948, and part of kern/44952
 1.67 17-May-2011  drochner fix lookup of SAs for outgoing packets in the !prefered_oldsa case,
as done in KAME and FAST_IPSEC after NetBSD imported the code
(The default differs: KAME uses the oldest valid SA while FAST_IPSEC
in NetBSD uses the newest one. I'm not changing this -- there is a lack
of specification and behavior can be changed with the "oldsa" sysctl.)
For incoming packets it shouldn't matter but I made it look similar
just to avoid unnecessary differences.
 1.66 21-Feb-2011  drochner treat "struct secpolicyindex" and "struct secasindex" as "const" once
they are initialized -- during lifetime, no changes are expected
plus some constification of input to comparision functions etc
mostly required by the former
 1.65 18-Feb-2011  drochner more "const"
 1.64 05-Sep-2010  spz branches: 1.64.2; 1.64.4;
fix two bugs in the PFKEY interface:

1) RFC2367 says in 2.3.3 Address Extension: "All non-address
information in the sockaddrs, such as sin_zero for AF_INET sockaddrs,
and sin6_flowinfo for AF_INET6 sockaddrs, MUST be zeroed out."
the IPSEC_NAT_T code was expecting the port information it needs
to be conveyed in the sockaddr instead of exclusively by
SADB_X_EXT_NAT_T_SPORT and SADB_X_EXT_NAT_T_DPORT,
and was not zeroing out the port information in the non-nat-traversal
case.
Since it was expecting the port information to reside in the sockaddr
it could get away with (re)setting the ports after starting to use them.
-> Set the natt ports before setting the SA mature.

2) RFC3947 has two Original Address fields, initiator and responder,
so we need SADB_X_EXT_NAT_T_OAI and SADB_X_EXT_NAT_T_OAR and not just
SADB_X_EXT_NAT_T_OA

The change has been created using vanhu's patch for FreeBSD as reference.

Note that establishing actual nat-t sessions has not yet been tested.

Likely fixes the following:
PR bin/41757
PR net/42592
PR net/42606
 1.63 31-Jan-2010  hubertf branches: 1.63.2; 1.63.4;
Replace more printfs with aprint_normal / aprint_verbose
Makes "boot -z" go mostly silent for me.
 1.62 18-Mar-2009  cegger bcmp -> memcmp
 1.61 18-Mar-2009  cegger Ansify function definitions w/o arguments. Generated with sed.
 1.60 14-Feb-2009  christos remove 2038 comment.
 1.59 09-Feb-2009  skd Back out my previous change. The problem I'm chasgin is with the
initialization of ports in saidx's when IPSEC_NAT_T is defined but the
association connection is not using nat traversal. Stay tuned.
 1.58 28-Jan-2009  skd branches: 1.58.2;
These comparison functions return 0 on match. Fix sense of test.
 1.57 25-Jul-2008  dsl branches: 1.57.2; 1.57.4; 1.57.10;
Comment out the 'do' and 'while (0)' from KEY_CHKSASTATE().
The expansion contains a 'continue' which is expected to continue
a loop in the callling code, not just abort the #define.
 1.56 01-Jul-2008  mlelstv branches: 1.56.2;
Ignore freed rtcache entries.
 1.55 04-May-2008  thorpej branches: 1.55.2; 1.55.4;
Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577
 1.54 03-May-2008  degroote In key_do_allocsa_policy, fix a bad usage of key_setsadbmsg. The third argument
is an SADB_SATYPE_*, not an IPPROTO_* .

Fix PR/38405. Thanks for the report
 1.53 24-Apr-2008  ad branches: 1.53.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.52 23-Apr-2008  thorpej PF_KEY stats for IPSEC and FAST_IPSEC are now per-CPU.
 1.51 07-Dec-2007  elad branches: 1.51.12; 1.51.14;
Let this code compile.

Hi, liamjfoy@. :)
 1.50 09-Jul-2007  ad branches: 1.50.6; 1.50.8; 1.50.14; 1.50.16;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.49 07-Jul-2007  degroote Ansify
Remove useless extern
bzero -> memset, bcopy -> memcpy

No functionnal changes
 1.48 27-Jun-2007  degroote Add support for options IPSEC_NAT_T (RFC 3947 and 3948) for fast_ipsec(4).

No objection on tech-net@
 1.47 08-May-2007  degroote Increase the refcount for the default ipv6 policy so nobody can reclaim it
 1.46 11-Apr-2007  degroote When we construct an answer for SADB_X_SPDGET, don't use an hardcoded 0 for seq but
the seq used by the request. It will improve consistency with the answer of SADB_GET
request and helps some applications which relies both on seq and pid.

Reported by Karl Knutsson by pr/36119.
 1.45 11-Apr-2007  degroote In spddelete2, if we can't find the sp by this id, return after sending an error message,
don't process the following code with the NULL sp.

Spotted by Matthew Grooms on freebsd-net ML
 1.44 09-Apr-2007  degroote Fix a memleak in key_spdget.

Problem was reported by Karl Knutsson by pr/36119.
 1.43 21-Mar-2007  degroote Call key_checkspidup with spi in network bit order in order to make correct
comparaison with spi stored into the sadb.

Reported by Karl Knutsson in kern/36038 .
 1.42 09-Mar-2007  liamjfoy branches: 1.42.2; 1.42.4; 1.42.6;
Allow to build without INET6

Submitted by: Jukka Salmi
 1.41 07-Mar-2007  liamjfoy Add IPv6 Fast Forward:

Add call to ip6flow_invalidate_all()

ok christos, matt, dyoung and joerg
 1.40 04-Mar-2007  degroote Remove useless cast
Use NULL instead of (void*) 0
 1.39 04-Mar-2007  degroote Fix fallout from caddr_t changes
 1.38 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.37 18-Feb-2007  degroote Constify the code following the dyoung change ( the "bug" was hidden by the
extern declaration ).
While here, remove a Kame ifdef which is useless in netipsec code
 1.36 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.35 11-Feb-2007  degroote branches: 1.35.2;
fixed a unexpected addr/port matching failure in SA management
From cvs rev 1.127 of netkey/key.c
 1.34 11-Feb-2007  degroote reqid (for unique policy) is u_int16_t quantity.
from rev 1.125 of netkey/key.c
 1.33 10-Feb-2007  degroote Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic
 1.32 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.31 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.30 16-Nov-2006  christos branches: 1.30.2;
__unused removal on arguments; approved by core.
 1.29 13-Oct-2006  christos more __unused
 1.28 23-Jul-2006  ad branches: 1.28.4; 1.28.6;
Use the LWP cached credentials where sane.
 1.27 24-Dec-2005  perry branches: 1.27.4; 1.27.8;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.26 11-Dec-2005  christos merge ktrace-lwp.
 1.25 10-Jun-2005  christos branches: 1.25.2;
constify and unshadow.
 1.24 08-May-2005  christos Panic strings should not end with \n.
 1.23 28-Feb-2005  jonathan Repair references to nonexistent structs in sys/netipsec/key.c after
NAT-T changes. Matches changes to reference non-nonexistent structs in
sys/netkey.

I have no clue if this is correct, but it matches the style in
sys/netkey, and (unlike the previous two revisions) it actually compiles...
 1.22 26-Feb-2005  perry nuke trailing whitespace
 1.21 12-Feb-2005  manu Add support for IPsec Network Address Translator traversal (NAT-T), as
described by RFC 3947 and 3948.
 1.20 10-Jun-2004  jonathan branches: 1.20.2; 1.20.6; 1.20.8;
Fix oversight from re-using reworked sysctl() code for unicast SPD,SADB dump:
because the sysctl() code wasn't setting the requestor-pid field in dump
responses, the reworked unicast dump wasn't setting the requestor pid, either.
More exaclty, the pid field was set to 0.

No problem for setkey(8), but racoon reportedly ignores SADB dump-responses
with any pid (including 0) which doesn't match its own pid. A private bug
report says the 0-valued pid field broke racoon code which attempts to recover
from death of a prior racoon process, by dumping the SADB at startup.

Fix by revising sys/netipsec, so that both the new unicast PF_KEY dump
responses and the sysctl code set the requestor pid field in all
response mesages to DUMP requests.
 1.19 27-May-2004  jonathan Rework to make FAST_IPSEC PF_KEY dumps unicast and reliable:

Introduce new socket-layer function sbappendaddrchain() to
sys/kern/uipc_socket2.c: like sbappendaddr(), only takes a chain of
records and appends the entire chain in one pass. sbappendaddrchain()
also takes an `sbprio' argument, which indicates the caller requires
special `reliable' handling of the socket-buffer. `sbprio' is
described in sys/sys/socketvar.h, although (for now) the different
levels are not yet implemented.

Rework sys/netipsec/key.c PF_KEY DUMP responses to build a chain of
mbuf records, one record per dump response. Unicast the entire chain
to the requestor, with all-or-none semantics.

Changed files;
sys/socketvar.h kern/uipc_socket2.c netipsec/key.c
Reviewed by:
Jason Thorpe, Thor Lancelot Simon, post to tech-kern.

Todo: request pullup to 2.0 branch. Post-2.0, rework sysctl() API for
dumps to use new record-chain constructors. Actually implement
the distinct service levels in sbappendaddrchain() so we can use them
to make PF_KEY ACQUIRE messages more reliable.
 1.18 26-May-2004  jonathan Fix bugs in SPD refcounts due to PCBpolicy cache, by backporting the
KAME sys/netkey/key.c rev 1.119 ke_sp_unlink()/key_sp_dead() logic.

I have been running a similar version for about 10 days now, and it
fixes the PCB-cache refcount problems for me.

Checked in as a candidate for pullup to the 2.0 branch.
 1.17 26-May-2004  jonathan Thanks to Andrew Brown for the heads-up that fast_ipsec still had
key_prefered_oldsa, defaulted to 1 (on): preferring old SAs, based on
the ill-concieved Jenkins I-D, is broken by design. For now, just
turn it off, as the simplest way to fix this in the 2.0 branch.

Next step is to rip it out entirely: it was always a bad idea.
 1.16 25-May-2004  atatat The FAST_IPSEC code actually supports KEYCTL_PREFERED_OLDSA, so export
it via sysctl.
 1.15 30-Apr-2004  jonathan Fix for setkey(8) to dump SPD and SAdb via sysctl:

#1. Fix an off-by-one error in sysctl_net_key_dumpsa(), which was
passing sysctl argument name[1] to a helper. According to Andrew
Brown's revised dynamic sysctl schmea, it must instead pass name[0].

2. There is a naming glitch in using sysctl() for setkey(8): setkey
queries the same sysctl MIB numbers to dump IPsec database state,
irrepesctive of the underlying IPsec is KAME or FAST_IPSEC.
For this to work as expected, sys/netipsec must export net.key.dumpsa
and net.key.dumpsp via the identical MIB numbers used by sys/netkey.
``Make it so''. For now, renumber the sys/netipsec/key.c nodes;
post-2.0 we can use sysctl aliases.

3. For as-yet-unexplained reasons, the PF_KEY_V2 nodes are never
shown (or queried?) by sysctl(8). For 2.0, I am following an earlier
suggestion from Andrew Brown, and renumbering allthe FAST_IPSEC sysctl
nodes to appear under net.key at MIB number { CTL_NET, PF_KEY }. Since
the renumbering may change, the renumbering is done via a level of
indirection in the C preprocessor.

The nett result is that setkey(8) can find the nodes it needs for
setkey -D and setkey -PD: and that sysctl(8) finds all the FAST_IPSEC
sysctl nodes relatedy to IPsec keying, under net.key. Andrew Brown
has reviewed this patch and tentatively approved the changes, though
we may rework some of the changes in -current in the near future.
 1.14 27-Apr-2004  jonathan Update sys/netipsec/key.c to check for attempts to add IPv6-related
SPDs, and to warn about and reject any such attempts.

Addresses a security concern, that the (eas-yet incomplete, experimental)
FAST_IPSEC+INET6 does not honour IPv6 SPDs. The security risk is that
Naive users may not realize this, and their data may get leaked in
cleartext, rather than IPsec'ed, if they use IPv6.

Security issue raised by: Thor Lancelot Simon
reviewed and OKed by: Thor Lancelot Simon

2.0 Pullup request after: 24 hours for further public comment.
 1.13 26-Apr-2004  matt Remove #else of #if __STDC__
 1.12 25-Apr-2004  jonathan Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do see any code
that verifies recieved TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c
,and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
 1.11 24-Mar-2004  atatat branches: 1.11.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.10 17-Mar-2004  jonathan Fix key_ismyaddr6() multicast test, as per sys/netkey/key.c NetBSD rev 1.112.
 1.9 02-Mar-2004  thorpej Bring the PCB policy cache over from KAME IPsec, including the "hint"
used to short-circuit IPsec processing in other places.

This is enabled only for NetBSD at the moment; in order for it to function
correctly, ipsec_pcbconn() must be called as appropriate.
 1.8 01-Mar-2004  thorpej Merge netkey/key.c rev 1.51 (wiz):

va_{start,end} audit:
Make sure that each va_start has one and only one matching va_end,
especially in error cases.
If the va_list is used multiple times, do multiple va_starts/va_ends.
If a function gets va_list as argument, don't let it use va_end (since
it's the callers responsibility).

Improved by comments from enami and christos -- thanks!

Heimdal/krb4/KAME changes already fed back, rest to follow.

Inspired by, but not not based on, OpenBSD.
 1.7 24-Feb-2004  wiz occured -> occurred. From Peter Postma.
 1.6 12-Dec-2003  scw Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.
 1.5 12-Dec-2003  scw Add KEYCTL_DUMPSA/KEYCTL_DUMPSP support.
setkey(8)'s -D and -P options now work as expected with fast-ipsec.
 1.4 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.3 06-Oct-2003  tls Reversion of "netkey merge", part 2 (replacement of removed files in the
repository by christos was part 1). netipsec should now be back as it
was on 2003-09-11, with some very minor changes:

1) Some residual platform-dependent code was moved from ipsec.h to
ipsec_osdep.h; without this, IPSEC_ASSERT() was multiply defined. ipsec.h
now includes ipsec_osdep.h

2) itojun's renaming of netipsec/files.ipsec to netipsec/files.netipsec has
been left in place (it's arguable which name is less confusing but the
rename is pretty harmless).

3) Some #endif TOKEN has been replaced by #endif /* TOKEN */; #endif TOKEN
is invalid and GCC 3 won't compile it.

An i386 kernel with "options FAST_IPSEC" and "options OPENCRYPTO" now
gets through "make depend" but fails to build with errors in ip_input.c.
But it's better than it was (thank heaven for small favors).
 1.2 20-Aug-2003  jonathan opt_inet6.h is FreeBSD-specific, so wrap it with #ifdef __FreeBSD__/#endif.
 1.1 13-Aug-2003  jonathan Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.11.2.7 17-Jun-2004  tron Pull up revision 1.20 (requested by jonathan in ticket #504):
Fix oversight from re-using reworked sysctl() code for unicast SPD,SADB dump:
because the sysctl() code wasn't setting the requestor-pid field in dump
responses, the reworked unicast dump wasn't setting the requestor pid, either.
More exaclty, the pid field was set to 0.
No problem for setkey(8), but racoon reportedly ignores SADB dump-responses
with any pid (including 0) which doesn't match its own pid. A private bug
report says the 0-valued pid field broke racoon code which attempts to recover
from death of a prior racoon process, by dumping the SADB at startup.
Fix by revising sys/netipsec, so that both the new unicast PF_KEY dump
responses and the sysctl code set the requestor pid field in all
response mesages to DUMP requests.
 1.11.2.6 30-May-2004  tron Pull up revision 1.19 (requested by jonathan in ticket #405):
Rework to make FAST_IPSEC PF_KEY dumps unicast and reliable:
Introduce new socket-layer function sbappendaddrchain() to
sys/kern/uipc_socket2.c: like sbappendaddr(), only takes a chain of
records and appends the entire chain in one pass. sbappendaddrchain()
also takes an `sbprio' argument, which indicates the caller requires
special `reliable' handling of the socket-buffer. `sbprio' is
described in sys/sys/socketvar.h, although (for now) the different
levels are not yet implemented.
Rework sys/netipsec/key.c PF_KEY DUMP responses to build a chain of
mbuf records, one record per dump response. Unicast the entire chain
to the requestor, with all-or-none semantics.
Changed files;
sys/socketvar.h kern/uipc_socket2.c netipsec/key.c
Reviewed by:
Jason Thorpe, Thor Lancelot Simon, post to tech-kern.
Todo: request pullup to 2.0 branch. Post-2.0, rework sysctl() API for
dumps to use new record-chain constructors. Actually implement
the distinct service levels in sbappendaddrchain() so we can use them
to make PF_KEY ACQUIRE messages more reliable.
 1.11.2.5 29-May-2004  tron Pull up revision 1.18 (requested by jonathan in ticket #402):
Fix bugs in SPD refcounts due to PCBpolicy cache, by backporting the
KAME sys/netkey/key.c rev 1.119 ke_sp_unlink()/key_sp_dead() logic.
I have been running a similar version for about 10 days now, and it
fixes the PCB-cache refcount problems for me.
Checked in as a candidate for pullup to the 2.0 branch.
 1.11.2.4 29-May-2004  tron Pull up revision 1.17 (requested by jonathan in ticket #401):
Thanks to Andrew Brown for the heads-up that fast_ipsec still had
key_prefered_oldsa, defaulted to 1 (on): preferring old SAs, based on
the ill-concieved Jenkins I-D, is broken by design. For now, just
turn it off, as the simplest way to fix this in the 2.0 branch.
Next step is to rip it out entirely: it was always a bad idea.
 1.11.2.3 25-May-2004  jmc Pullup rev 1.16 (requested by atatat in ticket #386)

The FAST_IPSEC code actually supports KEYCTL_PREFERED_OLDSA, so export
it via sysctl.
 1.11.2.2 10-May-2004  tron Pull up revision 1.15 (requested by jonathan in ticket #281):
Fix for setkey(8) to dump SPD and SAdb via sysctl:
passing sysctl argument name[1] to a helper. According to Andrew
Brown's revised dynamic sysctl schmea, it must instead pass name[0].
2. There is a naming glitch in using sysctl() for setkey(8): setkey
queries the same sysctl MIB numbers to dump IPsec database state,
irrepesctive of the underlying IPsec is KAME or FAST_IPSEC.
For this to work as expected, sys/netipsec must export net.key.dumpsa
and net.key.dumpsp via the identical MIB numbers used by sys/netkey.
``Make it so''. For now, renumber the sys/netipsec/key.c nodes;
post-2.0 we can use sysctl aliases.
3. For as-yet-unexplained reasons, the PF_KEY_V2 nodes are never
shown (or queried?) by sysctl(8). For 2.0, I am following an earlier
suggestion from Andrew Brown, and renumbering allthe FAST_IPSEC sysctl
nodes to appear under net.key at MIB number { CTL_NET, PF_KEY }. Since
the renumbering may change, the renumbering is done via a level of
indirection in the C preprocessor.
The nett result is that setkey(8) can find the nodes it needs for
setkey -D and setkey -PD: and that sysctl(8) finds all the FAST_IPSEC
sysctl nodes relatedy to IPsec keying, under net.key. Andrew Brown
has reviewed this patch and tentatively approved the changes, though
we may rework some of the changes in -current in the near future.
 1.11.2.1 30-Apr-2004  jmc Pullup rev 1.14 (requested by jonathan in ticket #235)

Update sys/netipsec/key.c to check for attempts to add IPv6-related
SPDs, and to warn about and reject any such attempts.
 1.20.8.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.20.8.1 12-Feb-2005  yamt sync with head.
 1.20.6.1 29-Apr-2005  kent sync with -current
 1.20.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.20.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.20.2.5 15-Feb-2005  skrll Sync with HEAD.
 1.20.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.20.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.20.2.2 03-Aug-2004  skrll Sync with HEAD
 1.20.2.1 10-Jun-2004  skrll file key.c was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.25.2.5 21-Jan-2008  yamt sync with head
 1.25.2.4 03-Sep-2007  yamt sync with head.
 1.25.2.3 26-Feb-2007  yamt sync with head.
 1.25.2.2 30-Dec-2006  yamt sync with head.
 1.25.2.1 21-Jun-2006  yamt sync with head.
 1.27.8.1 11-Aug-2006  yamt sync with head
 1.27.4.1 09-Sep-2006  rpaulo sync with head
 1.28.6.3 18-Dec-2006  yamt sync with head.
 1.28.6.2 10-Dec-2006  yamt sync with head.
 1.28.6.1 22-Oct-2006  yamt sync with head
 1.28.4.2 12-Jan-2007  ad Sync with head.
 1.28.4.1 18-Nov-2006  ad Sync with head.
 1.30.2.3 11-May-2008  jdc Pull up revision 1.54 (requested by degroote in ticket #1137).

In key_do_allocsa_policy, fix a bad usage of key_setsadbmsg. The third argument
is an SADB_SATYPE_*, not an IPPROTO_* .

Fix PR/38405. Thanks for the report
 1.30.2.2 24-May-2007  pavel Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the good default policy, depending of the adress family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us to choose the right default policy (based on the adress family
requested).
While here, fix an error message

Use dynamic array instead of an static array to decompress. It lets us to
decompress any data, whatever is the radio decompressed data / compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.30.2.1 12-May-2007  pavel branches: 1.30.2.1.2;
Pull up following revision(s) (requested by degroote in ticket #630):
sys/netipsec/key.c: revision 1.43-1.46
sys/netinet6/ipsec.c: revision 1.116
sys/netipsec/ipsec.c: revision 1.29 via patch
sys/netkey/key.c: revision 1.154-1.155
Call key_checkspidup with spi in network bit order in order to make
comparaison with spi stored into the sadb.
Reported by Karl Knutsson in kern/36038 .

Make an exact match when we are looking for a cached sp for an unconnected
socket. If we don't make an exact match, we may use a cached rule which
has lower priority than a rule that would otherwise have matched the
packet.
Code submitted by Karl Knutsson in PR/36051

Fix a memleak in key_spdget.
Problem was reported by Karl Knutsson by pr/36119.

In spddelete2, if we can't find the sp by this id, return after sending an
error message, don't process the following code with the NULL sp.
Spotted by Matthew Grooms on freebsd-net ML

When we construct an answer for SADB_X_SPDGET, don't use an hardcoded 0 for seq but
the seq used by the request. It will improve consistency with the answer of SADB_GET
request and helps some applications which relies both on seq and pid.
Reported by Karl Knutsson by pr/36119.
 1.30.2.1.2.2 03-Jun-2008  skrll Sync with netbsd-4.
 1.30.2.1.2.1 04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.35.2.5 17-May-2007  yamt sync with head.
 1.35.2.4 15-Apr-2007  yamt sync with head.
 1.35.2.3 24-Mar-2007  yamt sync with head.
 1.35.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.35.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.42.6.2 09-Dec-2007  reinoud Pullup to HEAD
 1.42.6.1 29-Mar-2007  reinoud Pullup to -current
 1.42.4.1 11-Jul-2007  mjf Sync with head.
 1.42.2.4 15-Jul-2007  ad Sync with head.
 1.42.2.3 01-Jul-2007  ad Adapt to callout API change.
 1.42.2.2 08-Jun-2007  ad Sync with head.
 1.42.2.1 10-Apr-2007  ad Sync with head.
 1.50.16.1 08-Dec-2007  ad Sync with head.
 1.50.14.1 08-Dec-2007  mjf Sync with HEAD.
 1.50.8.1 09-Jan-2008  matt sync with HEAD
 1.50.6.1 09-Dec-2007  jmcneill Sync with HEAD.
 1.51.14.1 18-May-2008  yamt sync with head.
 1.51.12.3 28-Sep-2008  mjf Sync with HEAD.
 1.51.12.2 02-Jul-2008  mjf Sync with HEAD.
 1.51.12.1 02-Jun-2008  mjf Sync with HEAD.
 1.53.2.4 09-Oct-2010  yamt sync with head
 1.53.2.3 11-Mar-2010  yamt sync with head
 1.53.2.2 04-May-2009  yamt sync with head.
 1.53.2.1 16-May-2008  yamt sync with head.
 1.55.4.2 28-Jul-2008  simonb Sync with head.
 1.55.4.1 03-Jul-2008  simonb Sync with head.
 1.55.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.56.2.1 19-Oct-2008  haad Sync with HEAD.
 1.57.10.1 21-Apr-2010  matt sync to netbsd-5
 1.57.4.1 14-Feb-2010  bouyer Pull up following revision(s) (requested by hubertf in ticket #1290):
sys/kern/kern_ksyms.c: revision 1.53
sys/dev/pci/agp_via.c: revision 1.18
sys/netipsec/key.c: revision 1.63
sys/arch/x86/x86/x86_autoconf.c: revision 1.49
sys/kern/init_main.c: revision 1.415
sys/kern/cnmagic.c: revision 1.11
sys/netipsec/ipsec.c: revision 1.47
sys/arch/x86/x86/pmap.c: revision 1.100
sys/netkey/key.c: revision 1.176
Replace more printfs with aprint_normal / aprint_verbose
Makes "boot -z" go mostly silent for me.
 1.57.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.57.2.1 03-Mar-2009  skrll Sync with HEAD.
 1.58.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.63.4.3 12-Jun-2011  rmind sync with head
 1.63.4.2 31-May-2011  rmind sync with head
 1.63.4.1 05-Mar-2011  rmind sync with head
 1.63.2.1 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.64.4.1 05-Mar-2011  bouyer Sync with HEAD
 1.64.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.71.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.74.6.1 18-Feb-2012  mrg merge to -current.
 1.74.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.74.2.2 30-Oct-2012  yamt sync with head
 1.74.2.1 17-Apr-2012  yamt sync with head
 1.76.4.1 01-Nov-2012  matt sync with netbsd-6-0-RELEASE.
 1.76.2.1 03-Sep-2012  riz Apply patch requested by msaitoh in pullup-6 ticket #538:

* add TAILQ satailq and sptailq
- these queues are referenced from kernfs/ipsecsa, kernfs/ipsecsp
as a weak_symbol.
- KAME netkey has the two queues, but FAST-IPsec netkey doen't.
This cause a panic. To prevent this panic, make a empty tailq.
- The tailq doen't work, because there are no implementation yet...
 1.78.2.4 03-Dec-2017  jdolecek update from HEAD
 1.78.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.78.2.2 23-Jun-2013  tls resync from head
 1.78.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.82.2.2 18-May-2014  rmind sync with head
 1.82.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.86.2.1 10-Aug-2014  tls Rebase.
 1.91.4.4 28-Aug-2017  skrll Sync with HEAD
 1.91.4.3 05-Oct-2016  skrll Sync with HEAD
 1.91.4.2 09-Jul-2016  skrll Sync with HEAD
 1.91.4.1 19-Mar-2016  skrll Sync with HEAD
 1.91.2.1 13-Mar-2016  martin Pull up following revision(s) (requested by christos in ticket #1136):
sys/netipsec/key.c: revision 1.92-1.97
sys/netipsec/key_debug.h: revision 1.7

Add more debugging, no functional change.

Gather more information from mbuf.

Fix port matching; we need to ignore ports when they are 0 not only in
the second saidx but the first one too. Fixes NAT-T issue with NetBSD
being the host behind NAT.

Kill stray &

Simplify the port comparison code further.
PR/50905: Henning Petersen: Fix useless comparison (from FreeBSD)
 1.100.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.100.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.100.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.101.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.118.2.2 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.118.2.1 11-May-2017  pgoyette Sync with HEAD
 1.163.2.15 13-Mar-2020  martin Pull up following revision(s) (requested by knakahara in ticket #1520):

sys/netipsec/key.c: revision 1.271
sys/net/if_ipsec.c: revision 1.28
sys/net/if_ipsec.c: revision 1.29

Fix ipsecif(4) SPDADD pfkey message has garbage. Pointed out by ohishi@IIJ.

"setkey -x" output is the following.
 1.163.2.14 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.163.2.13 10-Sep-2019  martin Pull up following revision(s) (requested by maxv in ticket #1372):

sys/netipsec/key.c: revision 1.266

Fix info leaks.
 1.163.2.12 25-Jul-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1306):

crypto/dist/ipsec-tools/src/setkey/parse.y: revision 1.23
sys/netipsec/key.c: revision 1.265
crypto/dist/ipsec-tools/src/setkey/token.l: revision 1.23
tests/net/ipsec/t_ipsec_misc.sh: revision 1.23

ipsec: fix a regression of the update API

The update API updates an SA by creating a new SA and removing an existing SA.
The previous change removed a newly added SA wrongly if an existing SA had been
created by the getspi API.

setkey: enable to use the getspi API

If a specified SPI is not zero, tell the kernel to use the SPI by using
SADB_EXT_SPIRANGE. Otherwise, the kernel picks a random SPI.

It enables to mimic racoon.

tests: add tests for getspi and udpate
 1.163.2.11 22-Jul-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1303):

sys/netipsec/key.c: revision 1.264

Avoid a race condition between SA (sav) manipulations

An sav can be removed from belonging list(s) twice resulting in an assertion
failure of pslist. It can occur if the following two operations interleave:

(i) a deletion or a update of an SA via the API, and
(ii) a state change (key_sa_chgstate) of the same SA by the timer.

Note that even (ii) removes an sav once from its list(s) on a update.
The cause of the race condition is that the two operations are not serialized
and (i) doesn't get and remove an sav from belonging list(s) atomically. So
(ii) can be inserted between an acquisition and a removal of (i).

Avoid the race condition by making (i) atomic.
 1.163.2.10 25-Aug-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #986):

sys/netipsec/key.c: revision 1.257

Don't call key_ismyaddr, which may sleep, in a pserialize read section

Use mutex here instead of pserialize because using mutex is simpler than
using psz+ref, which is another solution, and key_checkspidup isn't called in
any performance-sensitive paths.
 1.163.2.9 18-Apr-2018  martin Pull up following revision(s) (requested by yamaguchi in ticket #776):

sys/netipsec/key.c: revision 1.251-1.253
sys/netipsec/keydb.h: revision 1.22

Introduced a hash table to sahlist

An saidx of sah included in the list is unique so that
the search can use a hash list whose hash is calculated by
the saidx to find an sah quickly.

The hash list of the sahlits is used in FreeBSD, too.
reviewed by ozaki-r@n.o, thanks.

Added a lookup table to find an sav quickly
key_sad.sahlists doesn't work well for inbound packets because
its key includes source address. For the reason, the
look-up-table for the inbound packets is newly added.
The table has all sav whose state is MATURE or DYING and uses a
key calculated by destination address, protocol, and spi instead
of saidx.

reviewd ozaki-r@n.o, thanks.

Fix panic of SADB when the state of sav is changed in timeout
pointed out by ozaki-r@n.o, thanks
 1.163.2.8 16-Apr-2018  martin Pull up following revision(s) (requested by yamaguchi in ticket #766):

sys/netipsec/key.c: revision 1.250

Removed the unnecessary order check of key_lookup_sa

key_prefered_oldsa flag can change the sa to use if an sah
has multiple sav. However the multiple saves whose protocol
is ah, esp, or tcp cannot exist because their duplications
are checked by the spi value. Although the multiple saves
can exist in the case of ipcomp, the values using in the
post processing are same between the saves.

For those reasons, it is no need to select an sav by its
lifetime.

In addition, FreeBSD has already remove this.
reviewed by ozaki-r@n.o, thanks.
 1.163.2.7 07-Mar-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #609):
sys/netipsec/key.c: revision 1.249
sys/netipsec/keydb.h: revision 1.21
Avoid data races on lifetime counters by using percpu(9)
We don't make them percpu(9) directly because the structure is exposed to
userland and we don't want to break ABI. So we add another member variable
for percpu(9) and use it internally. When we export them to userland, they
are converted to the original format.
 1.163.2.6 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.163.2.5 01-Dec-2017  martin Pull up following revision(s) (requested by christos in ticket #415):
sys/netipsec/key.c: revision 1.244
sys/netipsec/key.c: revision 1.245
Use KDASSERT for mutex_ownable
Because mutex_ownable is not cheap.
Fix a deadlock happening if !NET_MPSAFE
If NET_MPSAFE isn't set, key_timehandler_work is executed with holding
softnet_lock. This means that localcount_drain can be called with holding
softnet_lock resulting in a deadlock that localcount_drain waits for packet
processing to release a reference to SP/SA while network processing is prevented
by softnet_lock.
Fix the deadlock by not taking softnet_lock in key_timehandler_work. It's okay
because IPsec is MP-safe even if !NET_MPSAFE. Note that the change also needs
to enable pserialize_perform because the IPsec code can be run in parallel now.
Reported by christos@
 1.163.2.4 30-Nov-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #407):
sys/compat/linux32/common/linux32_socket.c: revision 1.28
sys/net/if.c: revision 1.400
sys/netipsec/key.c: revision 1.243
sys/compat/linux/common/linux_socket.c: revision 1.139
sys/netinet/ip_carp.c: revision 1.93
sys/netinet6/in6.c: revision 1.252
sys/netinet6/in6.c: revision 1.253
sys/netinet6/in6.c: revision 1.254
sys/net/if_spppsubr.c: revision 1.173
sys/net/if_spppsubr.c: revision 1.174
sys/compat/common/uipc_syscalls_40.c: revision 1.14
Protect IFADDR_READER_FOREACH and obtained ifa with psz/psref
Fix usage of FOREACH macro
key_sad.lock is held there so SAVLIST_WRITER_FOREACH is enough.
Protect IFADDR_READER_FOREACH and obtained ifa with psz/psref
Protect IFADDR_READER_FOREACH and obtained ifa with psz/psref (more)
Fix and make consistent of usages of psz/psref in ifconf variants
Remove unnecessary goto because there is no cleanup code to share (NFC)
Tweak a condition; we don't need to care ifacount to be negative
Fix a race condition of in6_ifinit
in6_ifinit checks the number of IPv6 addresses on a given interface and
if it's zero (i.e., an IPv6 address being assigned to the interface
is the first one), call if_addr_init. However, the actual assignment of
the address (ifa_insert) is out of in6_ifinit. The check and the
assignment must be done atomically.
Fix it by holding in6_ifaddr_lock during in6_ifinit and ifa_insert.
And also add missing pserialize to IFADDR_READER_FOREACH.
 1.163.2.3 30-Nov-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #406):
sys/netipsec/key.c: revision 1.239
sys/netipsec/key.c: revision 1.240
sys/netipsec/key.c: revision 1.241
sys/netipsec/key.c: revision 1.242
sys/netipsec/key.h: revision 1.33
sys/netipsec/ipsec.c: revision 1.123
sys/netipsec/key.c: revision 1.236
sys/netipsec/key.c: revision 1.237
sys/netipsec/key.c: revision 1.238
Provide a function to call MGETHDR and MCLGET
The change fixes two usages of MGETHDR that don't check whether a mbuf is really
allocated before passing it to MCLGET.
Fix error handling of MCLGET in key_alloc_mbuf
Add missing splx to key_spdexpire
Use M_WAITOK to allocate mbufs wherever sleepable
Further changes will get rid of unnecessary NULL checks then.
Get rid of unnecessary NULL checks that are obsoleted by M_WAITOK
Simply the code by avoiding unnecessary error checks
- Remove unnecessary m_pullup for self-allocated mbufs
- Replace some if-fails-return sanity checks with KASSERT
Call key_sendup_mbuf immediately unless key_acquire is called in softint
We need to defer it only if it's called in softint to avoid deadlock.
 1.163.2.2 21-Nov-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #360):
tests/net/ipsec/t_ipsec_misc.sh: revision 1.21
tests/net/ipsec/t_ipsec_misc.sh: revision 1.22
sys/netipsec/key.c: revision 1.235
Mark key_timehandler_ch callout as MP-safe (just forgot to do so)
"Mark key_timehandler_ch callout as MP-safe" change needs one more sec to make lifetime tests stable
Dedup some checks
And the change a bit optimizes checks of SA expirations, which
may shorten testing time.
 1.163.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.188.2.2 18-Jul-2017  ozaki-r 3199078
 1.188.2.1 18-Jul-2017  ozaki-r file key.c was added on branch perseant-stdc-iso10646 on 2017-07-18 02:10:34 +0000
 1.249.2.7 18-Jan-2019  pgoyette Synch with HEAD
 1.249.2.6 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.249.2.5 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.249.2.4 28-Jul-2018  pgoyette Sync with HEAD
 1.249.2.3 02-May-2018  pgoyette Synch with HEAD
 1.249.2.2 22-Apr-2018  pgoyette Sync with HEAD
 1.249.2.1 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.255.2.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.255.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.255.2.1 10-Jun-2019  christos Sync with HEAD
 1.265.2.3 13-Mar-2020  martin Pull up following revision(s) (requested by knakahara in ticket #780):

sys/netipsec/key.c: revision 1.271
sys/net/if_ipsec.c: revision 1.28
sys/net/if_ipsec.c: revision 1.29

Fix ipsecif(4) SPDADD pfkey message has garbage. Pointed out by ohishi@IIJ.

"setkey -x" output is the following.
 1.265.2.2 14-Nov-2019  martin Pull up following revision(s) (requested by knakahara in ticket #423):

sys/netipsec/key.c: revision 1.268
sys/netipsec/key.c: revision 1.269

Fix SA can be expaired wrongly when there are many SPs.

When key_timehandler_spd() spent over one second, the "now" argument of
key_timehandler_sad() could be older than sav->created. That caused SA
was expired immediately.

-

Reduce load for IKE negotiations when the system has many IPv6 addresses.
e.g. the system has many vlan(4), gif(4) or ipsecif(4) with link local address.
 1.265.2.1 01-Sep-2019  martin Pull up following revision(s) (requested by maxv in ticket #127):

sys/netipsec/key.c: revision 1.266

Fix info leaks.
 1.269.2.1 29-Feb-2020  ad Sync with head.
 1.280.2.1 02-Oct-2023  martin Pull up following revision(s) (requested by knakahara in ticket #378):

tests/net/if_ipsec/t_ipsec_unnumbered.sh: revision 1.2
sys/net/if_ipsec.c: revision 1.35
sys/netipsec/key.c: revision 1.281

Use kmem_free instead of kmem_intr_free, as key_freesaval() is not called in softint after key.c:r1.223.
E.g. key_freesaval() was called the following call path before SAD MP-ify.
esp_input_cb()
KEY_FREESAV()
key_freesav()
key_delsav()
key_freesaval()
ok'ed by ozaki-r@n.o.

Use unit id instead of if_index to reduce fixed_reqid space.

Update for sys/net/if_ipsec.c:r1.35
 1.283.2.1 02-Aug-2025  perseant Sync with HEAD
 1.38 08-Dec-2022  knakahara Fix: update lastused of ipsecif(4) IPv6 out SP.
 1.37 09-Aug-2021  andvar fix various typos in compatibility, mainly in comments.
 1.36 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.35 18-Apr-2018  maxv Style, and remove unused MALLOC_DECLARE.
 1.34 10-Jan-2018  knakahara branches: 1.34.2;
add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.33 21-Nov-2017  ozaki-r Use M_WAITOK to allocate mbufs wherever sleepable

Further changes will get rid of unnecessary NULL checks then.
 1.32 03-Oct-2017  ozaki-r Constify isr at many places (NFC)
 1.31 03-Oct-2017  ozaki-r Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
 1.30 03-Oct-2017  ozaki-r Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
 1.29 09-Aug-2017  ozaki-r MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
 1.28 08-Aug-2017  ozaki-r Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
 1.27 03-Aug-2017  ozaki-r Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
 1.26 02-Aug-2017  ozaki-r Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
 1.25 26-Jul-2017  ozaki-r Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
 1.24 21-Jul-2017  ozaki-r Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
 1.23 14-Jul-2017  ozaki-r Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
 1.22 14-Jul-2017  ozaki-r Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
 1.21 13-Jul-2017  ozaki-r Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
 1.20 07-Jul-2017  ozaki-r Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
 1.19 30-May-2017  ozaki-r branches: 1.19.2;
Make refcnt operations of SA and SP atomic

Using atomic opeartions isn't optimal and should be optimized somehow
in the future though, the change allows a kernel with NET_MPSAFE to
run out a benchmark, which is useful to know performance improvement
and degradation by code changes.
 1.18 26-May-2017  ozaki-r Make key_cmpspidx_exactly and key_cmpspidx_withmask static
 1.17 26-May-2017  ozaki-r Comment out unused key_freesp_so and key_freeso
 1.16 16-May-2017  ozaki-r Run key_timehandler in thread context (workqueue)

The handler involves object deallocations so we want to not run
it in softint.
 1.15 15-May-2017  ozaki-r Show __func__ instead of __FILE__ in debug log messages

__func__ is shorter and more useful than __FILE__.
 1.14 30-Mar-2015  ozaki-r branches: 1.14.8;
Tidy up opt_ipsec.h inclusions

Some inclusions of opt_ipsec.h were for IPSEC_NAT_T and are now unnecessary.
Add inclusions to some C files for IPSEC_DEBUG.
 1.13 30-May-2014  christos branches: 1.13.4;
Introduce 2 new variables: ipsec_enabled and ipsec_used.
Ipsec enabled is controlled by sysctl and determines if is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.12 04-Jun-2013  christos branches: 1.12.6;
PR/47886: Dr. Wolfgang Stukenbrock: IPSEC_NAT_T enabled kernels may access
outdated pointers and pass ESP data to UPD-sockets.
While here, simplify the code and remove the IPSEC_NAT_T option; always
compile nat-traversal in so that it does not bitrot.
 1.11 09-Jun-2011  drochner branches: 1.11.2; 1.11.8; 1.11.10; 1.11.12;
more "const"
 1.10 23-May-2011  drochner branches: 1.10.2;
g/c remainders of IV handling in pfkey code -- this is done in
opencrypto now
 1.9 21-Feb-2011  drochner treat "struct secpolicyindex" and "struct secasindex" as "const" once
they are initialized -- during lifetime, no changes are expected
plus some constification of input to comparision functions etc
mostly required by the former
 1.8 07-Jul-2007  degroote branches: 1.8.56; 1.8.62; 1.8.64;
Ansify
Remove useless extern
bzero -> memset, bcopy -> memcpy

No functionnal changes
 1.7 27-Jun-2007  degroote Add support for options IPSEC_NAT_T (RFC 3947 and 3948) for fast_ipsec(4).

No objection on tech-net@
 1.6 04-Mar-2007  christos branches: 1.6.2; 1.6.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.5 18-Feb-2007  degroote Constify the code following the dyoung change ( the "bug" was hidden by the
extern declaration ).
While here, remove a Kame ifdef which is useless in netipsec code
 1.4 10-Dec-2005  elad branches: 1.4.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.3 26-Feb-2005  perry branches: 1.3.4;
nuke trailing whitespace
 1.2 02-Mar-2004  thorpej branches: 1.2.4; 1.2.10; 1.2.12;
Bring the PCB policy cache over from KAME IPsec, including the "hint"
used to short-circuit IPsec processing in other places.

This is enabled only for NetBSD at the moment; in order for it to function
correctly, ipsec_pcbconn() must be called as appropriate.
 1.1 13-Aug-2003  jonathan Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.2.12.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.2.10.1 29-Apr-2005  kent sync with -current
 1.2.4.6 11-Dec-2005  christos Sync with head.
 1.2.4.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.2.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.2.4.2 03-Aug-2004  skrll Sync with HEAD
 1.2.4.1 02-Mar-2004  skrll file key.h was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.3.4.3 03-Sep-2007  yamt sync with head.
 1.3.4.2 26-Feb-2007  yamt sync with head.
 1.3.4.1 21-Jun-2006  yamt sync with head.
 1.4.26.2 12-Mar-2007  rmind Sync with HEAD.
 1.4.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.6.4.1 11-Jul-2007  mjf Sync with head.
 1.6.2.1 15-Jul-2007  ad Sync with head.
 1.8.64.1 05-Mar-2011  bouyer Sync with HEAD
 1.8.62.1 06-Jun-2011  jruoho Sync with HEAD.
 1.8.56.3 12-Jun-2011  rmind sync with head
 1.8.56.2 31-May-2011  rmind sync with head
 1.8.56.1 05-Mar-2011  rmind sync with head
 1.10.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.11.12.3 03-Dec-2017  jdolecek update from HEAD
 1.11.12.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.11.12.1 23-Jun-2013  tls resync from head
 1.11.10.1 01-Nov-2012  matt sync with netbsd-6-0-RELEASE.
 1.11.8.1 03-Sep-2012  riz Apply patch requested by msaitoh in pullup-6 ticket #538:

* add TAILQ satailq and sptailq
- these queues are referenced from kernfs/ipsecsa, kernfs/ipsecsp
as a weak_symbol.
- KAME netkey has the two queues, but FAST-IPsec netkey doen't.
This cause a panic. To prevent this panic, make a empty tailq.
- The tailq doen't work, because there are no implementation yet...
 1.11.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.12.6.1 10-Aug-2014  tls Rebase.
 1.13.4.2 28-Aug-2017  skrll Sync with HEAD
 1.13.4.1 06-Apr-2015  skrll Sync with HEAD
 1.14.8.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.19.2.3 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.19.2.2 30-Nov-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #406):
sys/netipsec/key.c: revision 1.239
sys/netipsec/key.c: revision 1.240
sys/netipsec/key.c: revision 1.241
sys/netipsec/key.c: revision 1.242
sys/netipsec/key.h: revision 1.33
sys/netipsec/ipsec.c: revision 1.123
sys/netipsec/key.c: revision 1.236
sys/netipsec/key.c: revision 1.237
sys/netipsec/key.c: revision 1.238
Provide a function to call MGETHDR and MCLGET
The change fixes two usages of MGETHDR that don't check whether a mbuf is really
allocated before passing it to MCLGET.
Fix error handling of MCLGET in key_alloc_mbuf
Add missing splx to key_spdexpire
Use M_WAITOK to allocate mbufs wherever sleepable
Further changes will get rid of unnecessary NULL checks then.
Get rid of unnecessary NULL checks that are obsoleted by M_WAITOK
Simply the code by avoiding unnecessary error checks
- Remove unnecessary m_pullup for self-allocated mbufs
- Replace some if-fails-return sanity checks with KASSERT
Call key_sendup_mbuf immediately unless key_acquire is called in softint
We need to defer it only if it's called in softint to avoid deadlock.
 1.19.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.34.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.25 11-Oct-2022  knakahara Add sadb_x_policy_flags to inform SP origination.

This extension(struct sadb_x_policy) is *not* defined by RFC2367.

OpenBSD does not have reserved fields in struct sadb_x_policy.
Linux does not use this field yet.
FreeBSD uses this field as "sadb_x_policy_scope"; the value range is
from 0x00 to 0x04.

We use from most significant bit to avoid the above usage.
 1.24 18-May-2022  christos PR/56841: Andrew Cagney: debug-log IPcomp CPI lookups:
- debug-logs why an SPI is rejected
- adds missing __VA_OPT__(,) to some printf macros
- debug-log SPI+proto when adding/updating entry
 1.23 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.22 19-Apr-2018  maxv branches: 1.22.2;
Remove extra long file paths from the headers.
 1.21 28-Sep-2017  christos branches: 1.21.2;
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
 1.20 08-Aug-2017  ozaki-r Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
 1.19 26-Jul-2017  ozaki-r Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
 1.18 21-Jul-2017  ozaki-r Remove ipsecrequest#sav
 1.17 26-Apr-2017  ozaki-r branches: 1.17.4;
Correct the length of the SADB_EXT header in debug outputs

The length is shifted 3 bits in PF_KEY protocol.

Originally fixed by hsuenaga@IIJ
 1.16 18-Apr-2017  ozaki-r Use %zu for size_t (fix build of 32bit archs)
 1.15 18-Apr-2017  ozaki-r Remove __FreeBSD__ and __NetBSD__ switches

No functional changes (except for a debug printf).

Note that there remain some __FreeBSD__ for sysctl knobs which counerparts
to NetBSD don't exist. And ipsec_osdep.h isn't touched yet; tidying it up
requires actual code changes.
 1.14 06-Apr-2017  ozaki-r Prepare netipsec for rump-ification

- Include "opt_*.h" only if _KERNEL_OPT is defined
- Allow encapinit to be called twice (by ifinit and ipe4_attach)
- ifinit didn't call encapinit if IPSEC is enabled (ipe4_attach called
it instead), however, on a rump kernel ipe4_attach may not be called
even if IPSEC is enabled. So we need to allow ifinit to call it anyway
- Setup sysctls in ipsec_attach explicitly instead of using SYSCTL_SETUP
- Call ip6flow_invalidate_all in key_spdadd only if in6_present
- It's possible that a rump kernel loads the ipsec library but not
the inet6 library
 1.13 10-Jun-2016  ozaki-r branches: 1.13.2; 1.13.4;
Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of an
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places where are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.12 30-Mar-2015  ozaki-r Tidy up opt_ipsec.h inclusions

Some inclusions of opt_ipsec.h were for IPSEC_NAT_T and are now unnecessary.
Add inclusions to some C files for IPSEC_DEBUG.
 1.11 23-May-2011  drochner branches: 1.11.14; 1.11.32;
g/c remainders of IV handling in pfkey code -- this is done in
opencrypto now
 1.10 21-Feb-2011  drochner declare input to kdebug_*() functions which dump structures
to stdout in human readable form as "const"
 1.9 07-Jul-2007  degroote branches: 1.9.56; 1.9.62; 1.9.64;
Ansify
Remove useless extern
bzero -> memset, bcopy -> memcpy

No functionnal changes
 1.8 04-Mar-2007  degroote branches: 1.8.2; 1.8.4;
Remove useless cast
Use NULL instead of (void*) 0
 1.7 04-Mar-2007  degroote Fix fallout from caddr_t changes
 1.6 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.5 11-Dec-2005  christos branches: 1.5.26;
merge ktrace-lwp.
 1.4 08-May-2005  christos branches: 1.4.2;
Panic strings should not end with \n.
 1.3 06-Oct-2003  tls branches: 1.3.4;
Reversion of "netkey merge", part 2 (replacement of removed files in the
repository by christos was part 1). netipsec should now be back as it
was on 2003-09-11, with some very minor changes:

1) Some residual platform-dependent code was moved from ipsec.h to
ipsec_osdep.h; without this, IPSEC_ASSERT() was multiply defined. ipsec.h
now includes ipsec_osdep.h

2) itojun's renaming of netipsec/files.ipsec to netipsec/files.netipsec has
been left in place (it's arguable which name is less confusing but the
rename is pretty harmless).

3) Some #endif TOKEN has been replaced by #endif /* TOKEN */; #endif TOKEN
is invalid and GCC 3 won't compile it.

An i386 kernel with "options FAST_IPSEC" and "options OPENCRYPTO" now
gets through "make depend" but fails to build with errors in ip_input.c.
But it's better than it was (thank heaven for small favors).
 1.2 20-Aug-2003  jonathan opt_inet6.h is FreeBSD-specific, so wrap it with #ifdef __FreeBSD__/#endif.
 1.1 13-Aug-2003  jonathan Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.3.4.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.3.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.3.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.3.4.2 03-Aug-2004  skrll Sync with HEAD
 1.3.4.1 06-Oct-2003  skrll file key_debug.c was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.4.2.1 03-Sep-2007  yamt sync with head.
 1.5.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.8.4.1 11-Jul-2007  mjf Sync with head.
 1.8.2.1 15-Jul-2007  ad Sync with head.
 1.9.64.1 05-Mar-2011  bouyer Sync with HEAD
 1.9.62.1 06-Jun-2011  jruoho Sync with HEAD.
 1.9.56.2 31-May-2011  rmind sync with head
 1.9.56.1 05-Mar-2011  rmind sync with head
 1.11.32.3 28-Aug-2017  skrll Sync with HEAD
 1.11.32.2 09-Jul-2016  skrll Sync with HEAD
 1.11.32.1 06-Apr-2015  skrll Sync with HEAD
 1.11.14.1 03-Dec-2017  jdolecek update from HEAD
 1.13.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.13.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.17.4.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.21.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.22.2.1 10-Jun-2019  christos Sync with HEAD
 1.11 18-May-2022  christos PR/56841: Andrew Cagney: debug-log IPcomp CPI lookups:
- debug-logs why an SPI is rejected
- adds missing __VA_OPT__(,) to some printf macros
- debug-log SPI+proto when adding/updating entry
 1.10 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.9 28-Sep-2017  christos branches: 1.9.2;
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
 1.8 19-Apr-2017  ozaki-r branches: 1.8.4;
Tweak KEYDEBUG macros

Let's avoid passing statements to a macro.
 1.7 05-Mar-2016  christos branches: 1.7.2; 1.7.4;
Add more debugging, no functional change.
 1.6 21-Feb-2011  drochner branches: 1.6.14; 1.6.30; 1.6.32;
declare input to kdebug_*() functions which dump structures
to stdout in human readable form as "const"
 1.5 07-Jul-2007  degroote branches: 1.5.56; 1.5.62; 1.5.64;
Ansify
Remove useless extern
bzero -> memset, bcopy -> memcpy

No functionnal changes
 1.4 04-Mar-2007  degroote branches: 1.4.2; 1.4.4;
Fix fallout from caddr_t changes
 1.3 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.2 10-Dec-2005  elad branches: 1.2.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.1 13-Aug-2003  jonathan branches: 1.1.4; 1.1.18;
Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.1.18.2 03-Sep-2007  yamt sync with head.
 1.1.18.1 21-Jun-2006  yamt sync with head.
 1.1.4.5 11-Dec-2005  christos Sync with head.
 1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.4.1 13-Aug-2003  skrll file key_debug.h was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.2.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.4.4.1 11-Jul-2007  mjf Sync with head.
 1.4.2.1 15-Jul-2007  ad Sync with head.
 1.5.64.1 05-Mar-2011  bouyer Sync with HEAD
 1.5.62.1 06-Jun-2011  jruoho Sync with HEAD.
 1.5.56.1 05-Mar-2011  rmind sync with head
 1.6.32.2 28-Aug-2017  skrll Sync with HEAD
 1.6.32.1 19-Mar-2016  skrll Sync with HEAD
 1.6.30.1 13-Mar-2016  martin Pull up following revision(s) (requested by christos in ticket #1136):
sys/netipsec/key.c: revision 1.92-1.97
sys/netipsec/key_debug.h: revision 1.7

Add more debugging, no functional change.

Gather more information from mbuf.

Fix port matching; we need to ignore ports when they are 0 not only in
the second saidx but the first one too. Fixes NAT-T issue with NetBSD
being the host behind NAT.

Kill stray &

Simplify the port comparison code further.
PR/50905: Henning Petersen: Fix useless comparison (from FreeBSD)
 1.6.14.1 03-Dec-2017  jdolecek update from HEAD
 1.7.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.7.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.8.4.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.9.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.6 09-Aug-2022  knakahara Add sysctl entry to improve interconnectivity to some VPN appliances, pointed out by seil-team@IIJ.

If we want to allow different identifier types on IDii and IDir, set
net.key.allow_different_idtype=1. Default(=0) is the same as before.
 1.5 28-Apr-2018  maxv Remove unused macros.
 1.4 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.3 10-Dec-2005  elad branches: 1.3.162;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.2 12-Dec-2003  scw branches: 1.2.4; 1.2.18;
Add KEYCTL_DUMPSA/KEYCTL_DUMPSP support.
setkey(8)'s -D and -P options now work as expected with fast-ipsec.
 1.1 13-Aug-2003  jonathan Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.2.18.1 21-Jun-2006  yamt sync with head.
 1.2.4.5 11-Dec-2005  christos Sync with head.
 1.2.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.2.4.2 03-Aug-2004  skrll Sync with HEAD
 1.2.4.1 12-Dec-2003  skrll file key_var.h was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.3.162.2 02-May-2018  pgoyette Synch with HEAD
 1.3.162.1 22-Apr-2018  pgoyette Sync with HEAD
 1.24 10-Nov-2021  msaitoh s/assocciation/association/ in comment.
 1.23 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.22 16-Apr-2018  yamaguchi Added a lookup table to find an sav quickly

key_sad.sahlists doesn't work well for inbound packets because
its key includes source address. For the reason, the
look-up-table for the inbound packets is newly added.
The table has all sav whose state is MATURE or DYING and uses a
key calculated by destination address, protocol, and spi instead
of saidx.

reviewd ozaki-r@n.o, thanks.
 1.21 02-Mar-2018  ozaki-r branches: 1.21.2;
Avoid data races on lifetime counters by using percpu(9)

We don't make them percpu(9) directly because the structure is exposed to
userland and we don't want to break ABI. So we add another member variable
for percpu(9) and use it internally. When we export them to userland, they
are converted to the original format.
 1.20 09-Aug-2017  ozaki-r MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
 1.19 08-Aug-2017  ozaki-r MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
 1.18 07-Aug-2017  ozaki-r Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
 1.17 03-Aug-2017  ozaki-r Use pslist(9) for sah->savtree
 1.16 03-Aug-2017  ozaki-r Use pslist(9) for sahtree
 1.15 17-May-2017  ozaki-r branches: 1.15.2;
Replace malloc/free with kmem(9) and kill KMALLOC/KFREE macros
 1.14 30-Mar-2015  ozaki-r branches: 1.14.8;
Tidy up opt_ipsec.h inclusions

Some inclusions of opt_ipsec.h were for IPSEC_NAT_T and are now unnecessary.
Add inclusions to some C files for IPSEC_DEBUG.
 1.13 04-Jun-2013  christos branches: 1.13.10;
PR/47886: Dr. Wolfgang Stukenbrock: IPSEC_NAT_T enabled kernels may access
outdated pointers and pass ESP data to UPD-sockets.
While here, simplify the code and remove the IPSEC_NAT_T option; always
compile nat-traversal in so that it does not bitrot.
 1.12 29-Aug-2012  drochner branches: 1.12.2;
g/c unused struct member
 1.11 11-Jan-2012  drochner protect "union sockaddr_union" from being defined twice by a CPP symbol
(copied from FreeBSD), allows coexistence of (FAST_)IPSEC and pf
 1.10 23-May-2011  drochner branches: 1.10.4; 1.10.8;
g/c remainders of IV handling in pfkey code -- this is done in
opencrypto now
 1.9 16-May-2011  drochner use time_t rather than long for timestamps
 1.8 18-Feb-2011  drochner more "const"
 1.7 28-Aug-2010  spz branches: 1.7.2; 1.7.4;
trivial comment typo
 1.6 07-Jul-2007  degroote branches: 1.6.32; 1.6.54; 1.6.56;
Ansify
Remove useless extern
bzero -> memset, bcopy -> memcpy

No functionnal changes
 1.5 27-Jun-2007  degroote Add support for options IPSEC_NAT_T (RFC 3947 and 3948) for fast_ipsec(4).

No objection on tech-net@
 1.4 04-Mar-2007  degroote branches: 1.4.2; 1.4.4;
Fix fallout from caddr_t changes
 1.3 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.2 10-Dec-2005  elad branches: 1.2.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.1 13-Aug-2003  jonathan branches: 1.1.4; 1.1.18;
Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.1.18.2 03-Sep-2007  yamt sync with head.
 1.1.18.1 21-Jun-2006  yamt sync with head.
 1.1.4.5 11-Dec-2005  christos Sync with head.
 1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.4.1 13-Aug-2003  skrll file keydb.h was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.2.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.4.4.1 11-Jul-2007  mjf Sync with head.
 1.4.2.1 15-Jul-2007  ad Sync with head.
 1.6.56.2 31-May-2011  rmind sync with head
 1.6.56.1 05-Mar-2011  rmind sync with head
 1.6.54.1 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.6.32.1 09-Oct-2010  yamt sync with head
 1.7.4.1 05-Mar-2011  bouyer Sync with HEAD
 1.7.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.10.8.1 18-Feb-2012  mrg merge to -current.
 1.10.4.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.10.4.2 30-Oct-2012  yamt sync with head
 1.10.4.1 17-Apr-2012  yamt sync with head
 1.12.2.2 03-Dec-2017  jdolecek update from HEAD
 1.12.2.1 23-Jun-2013  tls resync from head
 1.13.10.2 28-Aug-2017  skrll Sync with HEAD
 1.13.10.1 06-Apr-2015  skrll Sync with HEAD
 1.14.8.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.15.2.3 18-Apr-2018  martin Pull up following revision(s) (requested by yamaguchi in ticket #776):

sys/netipsec/key.c: revision 1.251-1.253
sys/netipsec/keydb.h: revision 1.22

Introduced a hash table to sahlist

An saidx of sah included in the list is unique so that
the search can use a hash list whose hash is calculated by
the saidx to find an sah quickly.

The hash list of the sahlits is used in FreeBSD, too.
reviewed by ozaki-r@n.o, thanks.

Added a lookup table to find an sav quickly
key_sad.sahlists doesn't work well for inbound packets because
its key includes source address. For the reason, the
look-up-table for the inbound packets is newly added.
The table has all sav whose state is MATURE or DYING and uses a
key calculated by destination address, protocol, and spi instead
of saidx.

reviewd ozaki-r@n.o, thanks.

Fix panic of SADB when the state of sav is changed in timeout
pointed out by ozaki-r@n.o, thanks
 1.15.2.2 07-Mar-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #609):
sys/netipsec/key.c: revision 1.249
sys/netipsec/keydb.h: revision 1.21
Avoid data races on lifetime counters by using percpu(9)
We don't make them percpu(9) directly because the structure is exposed to
userland and we don't want to break ABI. So we add another member variable
for percpu(9) and use it internally. When we export them to userland, they
are converted to the original format.
 1.15.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.21.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.72 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.71 29-Jun-2024  riastradh branches: 1.71.2;
netipsec: Use _NET_STAT* API instead of direct array access.

PR kern/58380
 1.70 12-Jun-2019  christos make DPRINTF use varyadic cpp macros, and merge with IPSECLOG.
 1.69 26-Feb-2019  maxv Fix locking: it is fine if the lock is already key_so_mtx, this can happen
in socketpair. In that case don't take it.

Ok ozaki-r@

Reported-by: syzbot+901e2e5edaaaed21c069@syzkaller.appspotmail.com
 1.68 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.67 24-Dec-2018  maxv Remove unused function.
 1.66 08-Nov-2018  roy Don't call soroverflow when we return the error to the sender.

Thanks to thorpej@ for a sanity check.
 1.65 26-Apr-2018  maxv branches: 1.65.2;
Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.
 1.64 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.63 19-Mar-2018  roy socket: report receive buffer overflows

Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().

This allows userland to detect route(4) overflows so it can re-sync
with the current state.
 1.62 28-Sep-2017  christos branches: 1.62.2;
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
 1.61 25-Sep-2017  ozaki-r Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
 1.60 08-Aug-2017  ozaki-r Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
 1.59 27-Jul-2017  ozaki-r Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
 1.58 25-May-2017  ozaki-r branches: 1.58.2;
Support SO_OVERFLOWED on PF_KEY sockets

The original author is hsuenaga@IIJ
 1.57 25-May-2017  ozaki-r Warn if failed to send up a PF_KEY message
 1.56 25-May-2017  ozaki-r KNF: remove extra leading whitespaces
 1.55 16-May-2017  ozaki-r Use kmem(9) instead of malloc/free

Some of non-sleepable allocations can be replaced with sleepable ones.
To make it clear that the replacements are possible, some assertions
are addded.
 1.54 27-Apr-2017  ozaki-r Fix KASSERT; restore a lost statement
 1.53 21-Apr-2017  ozaki-r branches: 1.53.2;
Use KASSERT
 1.52 19-Apr-2017  ozaki-r Tweak KEYDEBUG macros

Let's avoid passing statements to a macro.
 1.51 19-Apr-2017  ozaki-r Retire ipsec_osdep.h

We don't need to care other OSes (FreeBSD) anymore.

Some macros are alive in ipsec_private.h.
 1.50 10-Jun-2016  ozaki-r branches: 1.50.2; 1.50.4;
Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.49 20-Jan-2016  riastradh Eliminate struct protosw::pr_output.

You can't use this unless you know what it is a priori: the formal
prototype is variadic, and the different instances (e.g., ip_output,
route_output) have different real prototypes.

Convert the only user of it, raw_send in net/raw_cb.c, to take an
explicit callback argument. Convert the only instances of it,
route_output and key_output, to such explicit callbacks for raw_send.
Use assertions to make sure the conversion to explicit callbacks is
warranted.

Discussed on tech-net with no objections:
https://mail-index.netbsd.org/tech-net/2016/01/16/msg005484.html
 1.48 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from buf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.47 26-Apr-2015  rtr remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@
 1.46 24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.45 03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.44 30-Mar-2015  ozaki-r Tidy up opt_ipsec.h inclusions

Some inclusions of opt_ipsec.h were for IPSEC_NAT_T and are now unnecessary.
Add inclusions to some C files for IPSEC_DEBUG.
 1.43 09-Aug-2014  rtr branches: 1.43.4;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect() into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@
 1.42 08-Aug-2014  rtr split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()
 1.41 05-Aug-2014  rtr split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind
 1.40 05-Aug-2014  rtr revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
 1.39 31-Jul-2014  rtr split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind
 1.38 30-Jul-2014  rtr split PRU_CONNECT function out of pr_generic() usrreq switches and put
into seaparate functions

xxx_listen(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existin {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind
 1.37 24-Jul-2014  rtr split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48
 1.36 23-Jul-2014  rtr split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind
 1.35 09-Jul-2014  rtr * split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind
 1.34 09-Jul-2014  rtr * split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47
 1.33 07-Jul-2014  rtr * sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.
 1.32 07-Jul-2014  rtr backout change that made pr_stat return EOPNOTSUPP for protocols that
were not filling in struct stat.

decision made after further discussion with rmind and investigation of
how other operating systems behave. soo_stat() is doing just enough to
be able to call what gets returned valid and thus justifys a return of
success.

additional review will be done to determine of the pr_stat functions
that were already returning EOPNOTSUPP can be considered successful with
what soo_stat() is doing.
 1.31 07-Jul-2014  rtr * have pr_stat return EOPNOTSUPP consistently for all protocols that do
not fill in struct stat instead of returning success.

* in pr_stat remove all checks for non-NULL so->so_pcb except where the
pcb is actually used (i.e. cases where we don't return EOPNOTSUPP).

proposed on tech-net@
 1.30 06-Jul-2014  rtr * split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind
 1.29 01-Jul-2014  rtr fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@
 1.28 22-Jun-2014  rtr * split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@
 1.27 05-Jun-2014  christos CID 1220167: NULL Deref
 1.26 21-May-2014  rmind G/C __FreeBSD__
 1.25 21-May-2014  rmind raw_detach: rawpcb may be embedded, free using the real size (saved in rcb).
 1.24 20-May-2014  rmind Adjust PR_WRAP_USRREQS() to include the attach/detach functions.
We still need the kernel-lock for some corner cases.
 1.23 19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.22 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.21 17-Jul-2011  joerg branches: 1.21.12; 1.21.16; 1.21.26;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.20 16-May-2011  drochner remove a useless m_freem() call where the argument is known to be NULL
 1.19 08-Feb-2010  joerg branches: 1.19.2; 1.19.4;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.
 1.18 18-Mar-2009  cegger branches: 1.18.2;
bzero -> memset
 1.17 14-Mar-2009  dsl Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.16 24-Apr-2008  ad branches: 1.16.2; 1.16.10; 1.16.16;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.15 23-Apr-2008  thorpej PF_KEY stats for IPSEC and FAST_IPSEC are now per-CPU.
 1.14 07-Jul-2007  degroote branches: 1.14.28; 1.14.30;
Ansify
Remove useless extern
bzero -> memset, bcopy -> memcpy

No functionnal changes
 1.13 04-Mar-2007  degroote branches: 1.13.2; 1.13.4;
Remove useless cast
Use NULL instead of (void*) 0
 1.12 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.11 13-Oct-2006  christos branches: 1.11.4;
more __unused
 1.10 31-Aug-2006  matt branches: 1.10.2; 1.10.4;
Make this compile again (hi xtos!). Switch to C99 structure initializations.
 1.9 11-Dec-2005  christos branches: 1.9.4; 1.9.8;
merge ktrace-lwp.
 1.8 08-May-2005  christos branches: 1.8.2;
Panic strings should not end with \n.
 1.7 26-Feb-2005  perry nuke trailing whitespace
 1.6 23-Jan-2005  matt branches: 1.6.2;
Change initialzie of domains to use link sets. Switch to using STAILQ.
Add a convenience macro DOMAIN_FOREACH to interate through the domain.
 1.5 10-Jun-2004  jonathan branches: 1.5.2; 1.5.6;
Commit changes to make ACQUIRE messages -- actually, all messages
to ``registered'' sockets -- be treated ``specially'', as suggested
by RFC-2367.

The "special" treatment sys/netipsec now gives such messages is that
we use sbappendaddrchain() to deliver the (single) kernel-generated
message to each registered PF_KEY socket, with an sbprio argument of
SB_PRIO_BESTEFFORT, thus by-passing

For now, we check for registered messages, set a local `sbprio'
argument, and call sbappendaddrchain() (as opposed to sbappendaddr())
if and only if sbprio is non-NULL. As noted, we can rework
key_sendup_mbuf(), and all its callers, to pass the sbprio argument;
pending consensus (and hopeful KAME buy-back).
 1.4 26-Apr-2004  matt Remove #else of #if __STDC__
 1.3 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.2 06-Oct-2003  tls Reversion of "netkey merge", part 2 (replacement of removed files in the
repository by christos was part 1). netipsec should now be back as it
was on 2003-09-11, with some very minor changes:

1) Some residual platform-dependent code was moved from ipsec.h to
ipsec_osdep.h; without this, IPSEC_ASSERT() was multiply defined. ipsec.h
now includes ipsec_osdep.h

2) itojun's renaming of netipsec/files.ipsec to netipsec/files.netipsec has
been left in place (it's arguable which name is less confusing but the
rename is pretty harmless).

3) Some #endif TOKEN has been replaced by #endif /* TOKEN */; #endif TOKEN
is invalid and GCC 3 won't compile it.

An i386 kernel with "options FAST_IPSEC" and "options OPENCRYPTO" now
gets through "make depend" but fails to build with errors in ip_input.c.
But it's better than it was (thank heaven for small favors).
 1.1 13-Aug-2003  jonathan Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.5.6.1 29-Apr-2005  kent sync with -current
 1.5.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.5.2.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.5.2.6 24-Jan-2005  skrll Adapt to branch.
 1.5.2.5 24-Jan-2005  skrll Sync with HEAD.
 1.5.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.5.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.5.2.2 03-Aug-2004  skrll Sync with HEAD
 1.5.2.1 10-Jun-2004  skrll file keysock.c was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.6.2.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.8.2.3 03-Sep-2007  yamt sync with head.
 1.8.2.2 30-Dec-2006  yamt sync with head.
 1.8.2.1 21-Jun-2006  yamt sync with head.
 1.9.8.1 03-Sep-2006  yamt sync with head.
 1.9.4.1 09-Sep-2006  rpaulo sync with head
 1.10.4.1 22-Oct-2006  yamt sync with head
 1.10.2.1 18-Nov-2006  ad Sync with head.
 1.11.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.13.4.1 11-Jul-2007  mjf Sync with head.
 1.13.2.1 15-Jul-2007  ad Sync with head.
 1.14.30.1 18-May-2008  yamt sync with head.
 1.14.28.1 02-Jun-2008  mjf Sync with HEAD.
 1.16.16.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.16.10.1 28-Apr-2009  skrll Sync with HEAD.
 1.16.2.2 11-Mar-2010  yamt sync with head
 1.16.2.1 04-May-2009  yamt sync with head.
 1.18.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.19.4.1 06-Jun-2011  jruoho Sync with HEAD.
 1.19.2.1 31-May-2011  rmind sync with head
 1.21.26.1 10-Aug-2014  tls Rebase.
 1.21.16.2 18-May-2014  rmind sync with head
 1.21.16.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for old the pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.21.12.2 03-Dec-2017  jdolecek update from HEAD
 1.21.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.43.4.5 28-Aug-2017  skrll Sync with HEAD
 1.43.4.4 09-Jul-2016  skrll Sync with HEAD
 1.43.4.3 19-Mar-2016  skrll Sync with HEAD
 1.43.4.2 06-Jun-2015  skrll Sync with HEAD
 1.43.4.1 06-Apr-2015  skrll Sync with HEAD
 1.50.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.50.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.53.2.2 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.53.2.1 02-May-2017  pgoyette Sync with HEAD - tag prg-localcount2-base1
 1.58.2.4 15-Jul-2019  martin Pull up following revision(s) (requested by maxv in ticket #1287):

sys/netipsec/keysock.c: revision 1.69

Fix locking: it is fine if the lock is already key_so_mtx, this can happen
in socketpair. In that case don't take it.

Ok ozaki-r@
 1.58.2.3 12-Nov-2018  martin Pull up following revision(s) (requested by roy in ticket #1092):

sys/netipsec/keysock.c: revision 1.66
sys/kern/uipc_usrreq.c: revision 1.187

Don't call soroverflow when we return the error to the sender.

Thanks to thorpej@ for a sanity check.
 1.58.2.2 09-Apr-2018  bouyer Pull up following revision(s) (requested by roy in ticket #724):
tests/net/icmp/t_ping.c: revision 1.19
sys/netinet6/raw_ip6.c: revision 1.166
sys/netinet6/ip6_input.c: revision 1.195
sys/net/raw_usrreq.c: revision 1.59
sys/sys/socketvar.h: revision 1.151
sys/kern/uipc_socket2.c: revision 1.128
tests/lib/libc/sys/t_recvmmsg.c: revision 1.2
lib/libc/sys/recv.2: revision 1.38
sys/net/rtsock.c: revision 1.239
sys/netinet/udp_usrreq.c: revision 1.246
sys/netinet6/icmp6.c: revision 1.224
tests/net/icmp/t_ping.c: revision 1.20
sys/netipsec/keysock.c: revision 1.63
sys/netinet/raw_ip.c: revision 1.172
sys/kern/uipc_socket.c: revision 1.260
tests/net/icmp/t_ping.c: revision 1.22
sys/kern/uipc_socket.c: revision 1.261
tests/net/icmp/t_ping.c: revision 1.23
sys/netinet/ip_mroute.c: revision 1.155
sbin/route/route.c: revision 1.159
sys/netinet6/ip6_mroute.c: revision 1.123
sys/netatalk/ddp_input.c: revision 1.31
sys/netcan/can.c: revision 1.3
sys/kern/uipc_usrreq.c: revision 1.184
sys/netinet6/udp6_usrreq.c: revision 1.138
tests/net/icmp/t_ping.c: revision 1.18
socket: report receive buffer overflows
Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().
This allows userland to detect route(4) overflows so it can re-sync
with the current state.
socket: clear error even when peeking
The error has already been reported and it's pointless requiring another
recv(2) call just to clear it.
socket: remove now incorrect comment that so_error is only udp
As it can be affected by route(4) sockets which are raw.
rtsock: log dropped messages that we cannot report to userland
Handle ENOBUFS when receiving messages.
Don't send messages if the receiver has died.
Sprinkle more soroverflow().
Handle ENOBUFS in recv
Handle ENOBUFS in sendto
Note value received. Harden another sendto for ENOBUFS.
Handle the routing socket overflowing gracefully.
Allow a valid sendto .... duh
Handle errors better.
Fix test for checking we sent all the data we asked to.
 1.58.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.62.2.5 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.62.2.4 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.62.2.3 02-May-2018  pgoyette Synch with HEAD
 1.62.2.2 22-Apr-2018  pgoyette Sync with HEAD
 1.62.2.1 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.65.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.65.2.1 10-Jun-2019  christos Sync with HEAD
 1.71.2.1 02-Aug-2025  perseant Sync with HEAD
 1.13 13-Feb-2022  andvar fix few typos in comments and log message.
 1.12 24-Dec-2018  maxv Remove unused function.
 1.11 19-Apr-2018  maxv branches: 1.11.2;
Remove extra long file paths from the headers.
 1.10 27-Jul-2017  ozaki-r branches: 1.10.2;
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
 1.9 02-Jun-2017  ozaki-r branches: 1.9.2;
Tweak header file inclusions
 1.8 20-Jan-2016  riastradh Eliminate struct protosw::pr_output.

You can't use this unless you know what it is a priori: the formal
prototype is variadic, and the different instances (e.g., ip_output,
route_output) have different real prototypes.

Convert the only user of it, raw_send in net/raw_cb.c, to take an
explicit callback argument. Convert the only instances of it,
route_output and key_output, to such explicit callbacks for raw_send.
Use assertions to make sure the conversion to explicit callbacks is
warranted.

Discussed on tech-net with no objections:
https://mail-index.netbsd.org/tech-net/2016/01/16/msg005484.html
 1.7 18-May-2014  rmind branches: 1.7.4;
Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.6 23-Apr-2008  thorpej branches: 1.6.46; 1.6.52; 1.6.62;
PF_KEY stats for IPSEC and FAST_IPSEC are now per-CPU.
 1.5 07-Jul-2007  degroote branches: 1.5.28; 1.5.30;
Ansify
Remove useless extern
bzero -> memset, bcopy -> memcpy

No functionnal changes
 1.4 11-Dec-2005  christos branches: 1.4.30; 1.4.32;
merge ktrace-lwp.
 1.3 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.2 04-Dec-2003  atatat branches: 1.2.4; 1.2.18;
Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.1 13-Aug-2003  jonathan Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.2.18.2 03-Sep-2007  yamt sync with head.
 1.2.18.1 21-Jun-2006  yamt sync with head.
 1.2.4.6 11-Dec-2005  christos Sync with head.
 1.2.4.5 24-Jan-2005  skrll Adapt to branch.
 1.2.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.2.4.2 03-Aug-2004  skrll Sync with HEAD
 1.2.4.1 04-Dec-2003  skrll file keysock.h was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.4.32.1 11-Jul-2007  mjf Sync with head.
 1.4.30.1 15-Jul-2007  ad Sync with head.
 1.5.30.1 18-May-2008  yamt sync with head.
 1.5.28.1 02-Jun-2008  mjf Sync with HEAD.
 1.6.62.1 10-Aug-2014  tls Rebase.
 1.6.52.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for old the pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.6.46.2 03-Dec-2017  jdolecek update from HEAD
 1.6.46.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.4.2 28-Aug-2017  skrll Sync with HEAD
 1.7.4.1 19-Mar-2016  skrll Sync with HEAD
 1.9.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.10.2.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.10.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.11.2.1 10-Jun-2019  christos Sync with HEAD
 1.22 22-May-2022  riastradh netipsec: Nothing uses xf_zeroize return value. Nix it.
 1.21 01-Nov-2019  knakahara Fix ipsecif(4) IPV6_MINMTU does not work correctly.
 1.20 30-May-2018  maxv branches: 1.20.2;
Introduce ah_authsiz, which computes the length of the ICV only. Use it in
esp_hdrsiz, and clarify.

Until now we were using ah_hdrsiz, and were relying on the fact that the
size of the AH header happens to be equal to that of the ESP trailer.

Now the size of the ESP trailer is added manually. This also fixes one
branch in esp_hdrsiz: we always append an ESP trailer, so it must always
be taken into account, and not just when an ICV is here.
 1.19 07-May-2018  maxv Remove now unused 'isr', 'skip' and 'protoff' arguments from ipip_output.
 1.18 07-May-2018  maxv Remove unused 'mp' argument from all the xf_output functions. Also clean
up xform.h a bit.
 1.17 07-May-2018  maxv Clarify IPIP: ipe4_xformsw is not allowed to call ipip_output, so replace
the pointer by ipe4_output, which just panics. Group the ipe4_* functions
together. Localify other functions.

ok ozaki-r@
 1.16 01-May-2018  maxv Remove unused.
 1.15 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.14 16-Feb-2018  maxv branches: 1.14.2;
Style, remove unused and misleading macros and comments, localify, and
reduce the diff between similar functions. No functional change.
 1.13 15-Nov-2017  knakahara Add argument to encapsw->pr_input() instead of m_tag.
 1.12 03-Oct-2017  ozaki-r Constify isr at many places (NFC)
 1.11 14-Jul-2017  ozaki-r Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
 1.10 14-Jul-2017  ozaki-r Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
 1.9 05-Jul-2017  ozaki-r Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
 1.8 26-Jan-2016  knakahara branches: 1.8.10;
eliminate variable argument in encapsw
 1.7 25-Feb-2011  drochner branches: 1.7.14; 1.7.32;
make the use of SHA2-HMAC by FAST_IPSEC compliant to current standards:
-RFC2104 says that the block size of the hash algorithm must be used
for key/ipad/opad calculations. While formerly all ciphers used a block
length of 64, SHA384 and SHA512 use 128 bytes. So we can't use the
HMAC_BLOCK_LEN constant anymore. Add a new field to "struct auth_hash"
for the per-cipher blocksize.
-Due to this, there can't be a single "CRYPTO_SHA2_HMAC" external name
anymore. Replace this by 3 for the 3 different keysizes.
This was done by Open/FreeBSD before.
-Also fix the number of authenticator bits used tor ESP and AH to
conform to RFC4868, and remove uses of AH_HMAC_HASHLEN which did
assume a fixed authenticator size of 12 bytes.

FAST_IPSEC will not interoperate with KAME IPSEC anymore if sha2 is used,
because the latter doesn't implement these standards. It should
interoperate with at least modern Free/OpenBSD now.
(I've only tested with NetBSD-current/FAST_IPSEC on both ends.)
 1.6 18-Feb-2011  drochner more "const"
 1.5 18-Feb-2011  drochner sprinkle some "const", documenting that the SA is not supposed to
change during an xform operation
 1.4 04-Mar-2007  christos branches: 1.4.64; 1.4.70; 1.4.72;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.3 10-Dec-2005  elad branches: 1.3.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.2 10-Jun-2005  christos branches: 1.2.2;
constify and unshadow.
 1.1 13-Aug-2003  jonathan branches: 1.1.4;
Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.1.4.6 11-Dec-2005  christos Sync with head.
 1.1.4.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.4.1 13-Aug-2003  skrll file xform.h was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.2.2.2 03-Sep-2007  yamt sync with head.
 1.2.2.1 21-Jun-2006  yamt sync with head.
 1.3.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.4.72.1 05-Mar-2011  bouyer Sync with HEAD
 1.4.70.1 06-Jun-2011  jruoho Sync with HEAD.
 1.4.64.1 05-Mar-2011  rmind sync with head
 1.7.32.2 28-Aug-2017  skrll Sync with HEAD
 1.7.32.1 19-Mar-2016  skrll Sync with HEAD
 1.7.14.1 03-Dec-2017  jdolecek update from HEAD
 1.8.10.2 10-Dec-2017  snj Pull up following revision(s) (requested by knakahara in ticket #419):
sys/net/if_stf.c: revision 1.103
sys/net/if_stf.h: revision 1.8
sys/netinet/in_gif.c: revision 1.89
sys/netinet/in_gif.h: revision 1.17
sys/netinet/in_l2tp.c: revision 1.4
sys/netinet/ip_encap.c: revision 1.66
sys/netinet/ip_encap.h: revision 1.23
sys/netinet/ip_mroute.c: revision 1.148
sys/netinet6/in6_gif.c: revision 1.87
sys/netinet6/in6_gif.h: revision 1.16
sys/netinet6/in6_l2tp.c: revision 1.7
sys/netipsec/xform.h: revision 1.13
sys/netipsec/xform_ipip.c: revision 1.55
Add argument to encapsw->pr_input() instead of m_tag.
 1.8.10.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.14.2.4 25-Jun-2018  pgoyette Sync with HEAD
 1.14.2.3 21-May-2018  pgoyette Sync with HEAD
 1.14.2.2 02-May-2018  pgoyette Synch with HEAD
 1.14.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.20.2.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.115 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.114 22-May-2022  riastradh branches: 1.114.10;
opencrypto: crypto_dispatch never fails now. Make it return void.

Same with crypto_kdispatch.
 1.113 22-May-2022  riastradh opencrypto: Rip out EAGAIN logic when unregistering crypto drivers.

I'm pretty sure this never worked reliably based on code inspection,
and it's unlikely to have ever been tested because it only applies
when unregistering a driver -- but we have no crypto drivers for
removable devices, so it would only apply if we went out of our way
to trigger detach with drvctl.

Instead, just make the operation fail with ENODEV, and remove all the
callback logic to resubmit the request on EAGAIN. (Maybe this should
be ENXIO, but crypto_kdispatch already does ENODEV.)
 1.112 22-May-2022  riastradh opencrypto: Make crypto_freesession return void.

No callers use the return value. It is not sensible to allow this to
fail.
 1.111 22-May-2022  riastradh netipsec: Nothing uses xf_zeroize return value. Nix it.
 1.110 22-May-2022  riastradh opencrypto: Make crp_callback, krp_callback return void.

Nothing uses the return values inside opencrypto, so let's stop
making users return them.
 1.109 01-Nov-2019  knakahara Fix ipsecif(4) IPV6_MINMTU does not work correctly.
 1.108 12-Jun-2019  christos make DPRINTF use varyadic cpp macros, and merge with IPSECLOG.
 1.107 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.106 31-May-2018  maxv branches: 1.106.2;
Constify ipseczeroes, and remove one use of it.
 1.105 30-May-2018  maxv Correctly handle the padding for IPv6-AH, as specified by RFC4302. Seen in
a FreeBSD bug report, by Jason Mader.

The RFC specifies that under IPv6 the complete AH header must be 64bit-
aligned, and under IPv4 32bit-aligned. That's a rule we've never respected.
The other BSDs and MacOS never have either.

So respect it now.

This makes it possible to set up IPv6-AH between Linux and NetBSD, and also
probably between Windows and NetBSD.

Until now all the tests I made were between two *BSD hosts, and everything
worked "correctly" since both hosts were speaking the same non-standard
AHv6, so they could understand each other.

Tested with Fedora<->NetBSD, hmac-sha2-384.
 1.104 30-May-2018  maxv Introduce ah_authsiz, which computes the length of the ICV only. Use it in
esp_hdrsiz, and clarify.

Until now we were using ah_hdrsiz, and were relying on the fact that the
size of the AH header happens to be equal to that of the ESP trailer.

Now the size of the ESP trailer is added manually. This also fixes one
branch in esp_hdrsiz: we always append an ESP trailer, so it must always
be taken into account, and not just when an ICV is here.
 1.103 29-May-2018  maxv Strengthen and simplify, once more.
 1.102 29-May-2018  ozaki-r Fix non-INET6 builds
 1.101 18-May-2018  maxv IP6_EXTHDR_GET -> M_REGION_GET, no functional change.
 1.100 13-May-2018  maxv Remove unused calls to nat_t_ports_get.
 1.99 11-May-2018  maxv ENOBUFS -> EACCES when updating the replay counter.
 1.98 07-May-2018  maxv Remove unused 'mp' argument from all the xf_output functions. Also clean
up xform.h a bit.
 1.97 01-May-2018  maxv When IP6_EXTHDR_GET fails, return ENOBUFS, and don't log an error (HDROPS
is not supposed to be used here).
 1.96 01-May-2018  maxv When the replay check fails, return EACCES instead of ENOBUFS.
 1.95 28-Apr-2018  maxv Remove IPSEC_SPLASSERT_SOFTNET, it has always been a no-op.
 1.94 24-Apr-2018  maxv Remove the M_AUTHIPDGM flag. It is equivalent to M_AUTHIPHDR, both
are set in IPsec-AH, and they are always handled together.
 1.93 23-Apr-2018  maxv Remove the kernel RH0 code. RH0 is deprecated by RFC5095, for security
reasons. RH0 was already removed in the kernel's input path, but some
parts were still present in the output path: they are now removed.

Sent on tech-net@ a few days ago.
 1.92 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.91 19-Apr-2018  maxv cosmetic
 1.90 18-Apr-2018  maxv Simplify the IPv4 parser. Get the option length in 'optlen', and sanitize
it earlier. A new check is added (off + optlen > skip).

In the IPv6 parser we reuse 'optlen', and remove 'ad' as a result.
 1.89 16-Apr-2018  maxv Remove dead code.

ok ozaki-r@
 1.88 13-Apr-2018  maxv Remove duplicate, to better show that this place doesn't make a lot of
sense. The code should probably be removed, it's a leftover from when we
had #ifdef __FreeBSD__.
 1.87 26-Feb-2018  maxv branches: 1.87.2;
Reinforce this area, make sure the length field fits the option. Normally
it always does because the options were already sanitized earlier.
 1.86 16-Feb-2018  maxv Add [ah/esp/ipcomp]_enable sysctls, and remove the FreeBSD #ifdefs.
Discussed with ozaki-r@.
 1.85 16-Feb-2018  maxv Remove some more FreeBSD sysctl declarations that already have NetBSD
counterparts. Discussed with ozaki-r@.
 1.84 15-Feb-2018  ozaki-r Fix buffer overflow on sending an IPv6 packet with large options

If an IPv6 packet has large options, a necessary space for evacuation can
exceed the expected size (ah_pool_item_size). Give up using the pool_cache
if it happens.

Pointed out by maxv@
 1.83 15-Feb-2018  ozaki-r Commonalize error paths (NFC)
 1.82 15-Feb-2018  maxv style
 1.81 15-Feb-2018  maxv Make sure the Authentication Header fits the mbuf chain, otherwise panic.
 1.80 15-Feb-2018  maxv Fix use-after-free, 'ah' may not be valid after m_makewritable and
ah_massage_headers.
 1.79 15-Feb-2018  ozaki-r Fix kernel panic (assertion failure) on receiving an IPv6 packet with large options

If an IPv6 packet has large options, a necessary space for evacuation can
exceed the expected size (ah_pool_item_size). Give up using the pool_cache
if it happens.

Pointed out by maxv@
 1.78 15-Feb-2018  ozaki-r Don't relook up an SP/SA in opencrpyto callbacks

We don't need to do so because we have a reference to it. And also
relooking-up one there may return an sp/sav that has different
parameters from an original one.
 1.77 24-Jan-2018  maxv Reinforce and clarify.
 1.76 24-Jan-2018  maxv Fix a vulnerability in IPsec-IPv6-AH, that allows an attacker to remotely
crash the kernel with a single packet.

In this loop we need to increment 'ad' by two, because the length field
of the option header does not count the size of the option header itself.

If the length is zero, then 'count' is incremented by zero, and there's
an infinite loop. Beyond that, this code was written with the assumption
that since the IPv6 packet already went through the generic IPv6 option
parser, several fields are guaranteed to be valid; but this assumption
does not hold because of the missing '+2', and there's as a result a
triggerable buffer overflow (write zeros after the end of the mbuf,
potentially to the next mbuf in memory since it's a pool).

Add the missing '+2', this place will be reinforced in separate commits.
 1.75 24-Jan-2018  maxv Revert a part of rev1.49 (six months ago). The pointer given to memcpy
was correct.

Discussed with Christos and Ryota.
 1.74 03-Oct-2017  ozaki-r Constify isr at many places (NFC)
 1.73 10-Aug-2017  ozaki-r Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
 1.72 09-Aug-2017  ozaki-r MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
 1.71 03-Aug-2017  ozaki-r Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
 1.70 02-Aug-2017  ozaki-r Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
 1.69 27-Jul-2017  ozaki-r Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
 1.68 20-Jul-2017  ozaki-r Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
 1.67 20-Jul-2017  ozaki-r Dedup error paths (NFC)
 1.66 20-Jul-2017  ozaki-r Fix a debug message
 1.65 19-Jul-2017  ozaki-r Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
 1.64 19-Jul-2017  ozaki-r Don't bother the case of crp->crp_buf == NULL in callbacks
 1.63 19-Jul-2017  ozaki-r Don't release sav if calling crypto_dispatch again
 1.62 18-Jul-2017  ozaki-r branches: 1.62.2;
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
 1.61 14-Jul-2017  ozaki-r Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
 1.60 14-Jul-2017  ozaki-r Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
 1.59 13-Jul-2017  ozaki-r Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
 1.58 10-Jul-2017  ozaki-r Use explicit_memset to surely zero-clear key_auth and key_enc
 1.57 07-Jul-2017  ozaki-r Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
 1.56 05-Jul-2017  ozaki-r Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
 1.55 29-Jun-2017  ozaki-r Apply C99-style struct initialization to xformsw
 1.54 11-May-2017  ryo branches: 1.54.2;
Make ipsec_address() and ipsec_logsastr() mpsafe.
 1.53 19-Apr-2017  ozaki-r branches: 1.53.2;
Retire ipsec_osdep.h

We don't need to care other OSes (FreeBSD) anymore.

Some macros are alive in ipsec_private.h.
 1.52 18-Apr-2017  ozaki-r Convert IPSEC_ASSERT to KASSERT or KASSERTMSG

IPSEC_ASSERT just discarded specified message...
 1.51 18-Apr-2017  ozaki-r Remove __FreeBSD__ and __NetBSD__ switches

No functional changes (except for a debug printf).

Note that there remain some __FreeBSD__ for sysctl knobs which counerparts
to NetBSD don't exist. And ipsec_osdep.h isn't touched yet; tidying it up
requires actual code changes.
 1.50 15-Apr-2017  christos cosmetic fixes:
- __func__ in printfs
- no space after sizeof
- eliminate useless casts
- u_intX_t -> uintX_t
 1.49 14-Apr-2017  christos - fix old refactoring which zeroed the wrong part of the buffer.
- simplify.
 1.48 14-Apr-2017  christos change into __func__
 1.47 13-Apr-2017  christos Redo the statistics through an indirection array and put the definitions
of the arrays in pfkeyv2.h so that they are next to the index definitions.
Remove "bogus" comment about compressing the statistics which is now fixed.
 1.46 13-Apr-2017  ozaki-r Fix that ah_algorithm_lookup and esp_algorithm_lookup don't handle some algorithms

Unrelated upper limit values, AH_ALG_MAX and ESP_ALG_MAX, prevented some
algorithms from being looked up.
 1.45 06-Apr-2017  ozaki-r Prepare netipsec for rump-ification

- Include "opt_*.h" only if _KERNEL_OPT is defined
- Allow encapinit to be called twice (by ifinit and ipe4_attach)
- ifinit didn't call encapinit if IPSEC is enabled (ipe4_attach called
it instead), however, on a rump kernel ipe4_attach may not be called
even if IPSEC is enabled. So we need to allow ifinit to call it anyway
- Setup sysctls in ipsec_attach explicitly instead of using SYSCTL_SETUP
- Call ip6flow_invalidate_all in key_spdadd only if in6_present
- It's possible that a rump kernel loads the ipsec library but not
the inet6 library
 1.44 30-Mar-2015  ozaki-r branches: 1.44.2; 1.44.4;
Tidy up opt_ipsec.h inclusions

Some inclusions of opt_ipsec.h were for IPSEC_NAT_T and are now unnecessary.
Add inclusions to some C files for IPSEC_DEBUG.
 1.43 27-Mar-2015  ozaki-r KNF
 1.42 03-Nov-2013  mrg branches: 1.42.4; 1.42.6; 1.42.8; 1.42.12;
- apply some __diagused
- remove unused variables
- move some variables inside their relevant use #ifdef
 1.41 28-Aug-2013  riastradh Fix sense of consttime_memequal and update all callers.

Now it returns true (nonzero) to mean equal and false (zero) to mean
inequal, as the name suggests.

As promised on tech-userlevel back in June:

https://mail-index.netbsd.org/tech-userlevel/2013/06/24/msg007843.html
 1.40 24-Jun-2013  riastradh branches: 1.40.2;
Replace consttime_bcmp/explicit_bzero by consttime_memequal/explicit_memset.

consttime_memequal is the same as the old consttime_bcmp.
explicit_memset is to memset as explicit_bzero was to bcmp.

Passes amd64 release and i386/ALL, but I'm sure I missed some spots,
so please let me know.
 1.39 04-Jun-2013  christos PR/47886: Dr. Wolfgang Stukenbrock: IPSEC_NAT_T enabled kernels may access
outdated pointers and pass ESP data to UPD-sockets.
While here, simplify the code and remove the IPSEC_NAT_T option; always
compile nat-traversal in so that it does not bitrot.
 1.38 30-Aug-2012  drochner branches: 1.38.2;
Add "consttime_bcmp" and "explicit_bzero" functions for both kernel
abd userland, as proposed on tech-security, with explicit_bzero using
a volatile function pointer as suggested by Alan Barrett.
Both do what the name says. For userland, both are prefixed by "__"
to keep them out of the user namespace.
Change some memset/memcmp uses to the new functions where it makes
sense -- these are just some examples, more to come.
 1.37 26-Jan-2012  drochner branches: 1.37.2; 1.37.6; 1.37.8;
remove some DPRINTFs which are not just diagnostics but cause noise
even on regular operation
 1.36 25-Jan-2012  drochner Make sure the mbufs in the input path (only the parts which we are going
to modify in the AH case) are writable/non-shared.
This addresses PR kern/33162 by Jeff Rizzo, and replaces the insufficient
patch from that time by a radical solution.
(The PR's problem had been worked around by rev.1.3 of xennetback_xenbus.c,
so it needs a network driver modification to reproduce it.)
Being here, clarify a bit of ipcomp -- uncompression is done in-place,
the header must be removed explicitly.
 1.35 24-Jan-2012  drochner fix pointer/offset mistakes in handling of IPv4 options
 1.34 10-Jan-2012  drochner add patch from Arnaud Degroote to handle IPv6 extended options with
(FAST_)IPSEC, tested lightly with a DSTOPTS header consisting
of PAD1
 1.33 24-May-2011  drochner branches: 1.33.4; 1.33.8;
copy AES-XCBC-MAC support from KAME IPSEC to FAST_IPSEC
For this to fit, an API change in cryptosoft was adopted from OpenBSD
(addition of a "Setkey" method to hashes) which was done for GCM/GMAC
support there, so it might be useful in the future anyway.
tested against KAME IPSEC
AFAICT, FAST_IPSEC now supports as much as KAME.
 1.32 06-May-2011  drochner As a first step towards more fine-grained locking, don't require
crypto_{new.free}session() to be called with the "crypto_mtx"
spinlock held.
This doesn't change much for now because these functions acquire
the said mutex first on entry now, but at least it keeps the nasty
locks local to the opencrypto core.
 1.31 18-Feb-2011  drochner more "const"
 1.30 18-Feb-2011  drochner sprinkle some "const", documenting that the SA is not supposed to
change during an xform operation
 1.29 16-Feb-2011  drochner remove some unnecessary pointer typecasts
(one was wrong on BE systems, but was harmless here because the
result is effectively unused)
 1.28 14-Feb-2011  drochner change locking order, to make sure the cpu is at splsoftnet()
before the softnet_lock (adaptive) mutex is acquired, from
Wolfgang Stukenbrock, should fix a recursive lock panic
 1.27 10-Feb-2011  drochner -in opencrypto callbacks (which run in a kernel thread), pull softnet_lock
everywhere splsoftnet() was used before, to fix MP concurrency problems
-pull KERNEL_LOCK where ip(6)_output() is called, as this is what
the network stack (unfortunately) expects, in particular to avoid
races for packets in the interface send queues
From Wolfgang Stukenbrock per PR kern/44418, with the application
of KERNEL_LOCK to what I think are the essential points, tested
on a dual-core i386.
 1.26 18-Apr-2009  tsutsui branches: 1.26.4; 1.26.6; 1.26.8;
Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.25 18-Mar-2009  cegger bcopy -> memcpy
 1.24 18-Mar-2009  cegger bzero -> memset
 1.23 18-Mar-2009  cegger bcmp -> memcmp
 1.22 17-Dec-2008  cegger branches: 1.22.2;
kill MALLOC and FREE macros.
 1.21 23-Apr-2008  thorpej branches: 1.21.2; 1.21.10;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.20 04-Feb-2008  tls branches: 1.20.6; 1.20.8;
Rework opencrypto to use a spin mutex (crypto_mtx) instead of "splcrypto"
(actually splnet) and condvars instead of tsleep/wakeup. Fix a few
miscellaneous problems and add some debugging printfs while there.

Restore set of CRYPTO_F_DONE in crypto_done() which was lost at some
point after this code came from FreeBSD -- it made it impossible to wait
properly for a condition.

Add flags analogous to the "crp" flags to the key operation's krp struct.
Add a new flag, CRYPTO_F_ONRETQ which tells us a request finished before
the kthread had a chance to dequeue it and call its callback -- this was
letting requests stick on the queues before even though done and copied
out.

Callers of crypto_newsession() or crypto_freesession() must now take the
mutex. Change netipsec to do so. Dispatch takes the mutex itself as
needed.

This was tested fairly extensively with the cryptosoft backend and lightly
with a new hardware driver. It has not been tested with FAST_IPSEC; I am
unable to ascertain whether FAST_IPSEC currently works at all in our tree.

pjd@FreeBSD.ORG, ad@NetBSD.ORG, and darran@snark.us pointed me in the
right direction several times in the course of this. Remaining bugs
are mine alone.
 1.19 28-Oct-2007  adrianp branches: 1.19.2;
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.

Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.18 27-Jun-2007  degroote branches: 1.18.6; 1.18.8; 1.18.12;
Add support for options IPSEC_NAT_T (RFC 3947 and 3948) for fast_ipsec(4).

No objection on tech-net@
 1.17 25-Mar-2007  degroote Honor the ip4_ah_offsetmask bits (clear or not the ip->ip_off field for ah
processing).
 1.16 25-Mar-2007  degroote Use ip4_ah_cleartos instead of ah_cleartos for consistency
 1.15 04-Mar-2007  degroote branches: 1.15.2; 1.15.4; 1.15.6;
Remove useless cast
Use NULL instead of (void*) 0
 1.14 04-Mar-2007  degroote Fix fallout from caddr_t changes
 1.13 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.12 16-Nov-2006  christos branches: 1.12.2; 1.12.4; 1.12.8;
__unused removal on arguments; approved by core.
 1.11 13-Oct-2006  christos more __unused
 1.10 11-Apr-2006  rpaulo branches: 1.10.8; 1.10.10;
Add two new sysctls protected under IPSEC_DEBUG:

net.inet.ipsec.test_replay - When set to 1, IPsec will send packets with
the same sequence number. This allows to verify if the other side
has proper replay attacks detection.

net.inet.ipsec.test_integrity - When set 1, IPsec will send packets with
corrupted HMAC. This allows to verify if the other side properly
detects modified packets.

(a message will be printed indicating when these sysctls changed)

By Pawel Jakub Dawidek <pjd@FreeBSD.org>.
Discussed with Christos Zoulas and Jonathan Stone.
 1.9 11-Dec-2005  christos branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12;
merge ktrace-lwp.
 1.8 26-Feb-2005  perry branches: 1.8.2; 1.8.4; 1.8.12; 1.8.14;
nuke trailing whitespace
 1.7 01-May-2004  jonathan branches: 1.7.2; 1.7.6; 1.7.8;
Commit an old diff for AH which has been in my personal tree since
August 2003:

On NetBSD, when we get to ah_massage_headers(), ip->ip_len is in
network byte order and includes all bytes in the input packet.
Therefore we don't need to byte-swap it or to add `skip' back in,
before verifying the receive-side hash.

With this change, AH transport mode works against FreeBSD 4.9 fast-ipsec
(which also works against Win2k, &c., &c.).
 1.6 17-Mar-2004  jonathan branches: 1.6.2;
sys/netinet6/ip6_ecn.h is reportedly a FreeBSD-ism; NetBSD has
prototypes for the IPv6 ECN ingress/egress functions in sys/netinet/ip_ecn.h,
inside an #ifdef INET6 wrapper. So, wrap sys/netipsec ocurrences of
#include <netinet6/ip6_ecn.h>
in #ifdef __FreeBSD__/#endif, until both camps can agree on this
teensy little piece of namespace. Affects:
ipsec_output.c xform_ah.c xform_esp.c xform_ipip.c
 1.5 12-Dec-2003  thorpej Cast an expression with sizeof() to long.
 1.4 06-Oct-2003  tls Reversion of "netkey merge", part 2 (replacement of removed files in the
repository by christos was part 1). netipsec should now be back as it
was on 2003-09-11, with some very minor changes:

1) Some residual platform-dependent code was moved from ipsec.h to
ipsec_osdep.h; without this, IPSEC_ASSERT() was multiply defined. ipsec.h
now includes ipsec_osdep.h

2) itojun's renaming of netipsec/files.ipsec to netipsec/files.netipsec has
been left in place (it's arguable which name is less confusing but the
rename is pretty harmless).

3) Some #endif TOKEN has been replaced by #endif /* TOKEN */; #endif TOKEN
is invalid and GCC 3 won't compile it.

An i386 kernel with "options FAST_IPSEC" and "options OPENCRYPTO" now
gets through "make depend" but fails to build with errors in ip_input.c.
But it's better than it was (thank heaven for small favors).
 1.3 12-Sep-2003  itojun merge netipsec/key* into netkey/key*. no need for both.
change confusing filename
 1.2 20-Aug-2003  jonathan opt_inet6.h is FreeBSD-specific, so wrap it with #ifdef __FreeBSD__/#endif.
 1.1 13-Aug-2003  jonathan Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.6.2.2 01-Dec-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11395):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.6.2.1 11-May-2004  tron branches: 1.6.2.1.2; 1.6.2.1.4;
Pull up revision 1.7 (requested by jonathan in ticket #283):
Commit an old diff for AH which has been in my personal tree since
August 2003:
On NetBSD, when we get to ah_massage_headers(), ip->ip_len is in
network byte order and includes all bytes in the input packet.
Therefore we don't need to byte-swap it or to add `skip' back in,
before verifying the receive-side hash.
With this change, AH transport mode works against FreeBSD 4.9 fast-ipsec
(which also works against Win2k, &c., &c.).
 1.6.2.1.4.1 01-Dec-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11395):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.6.2.1.2.1 01-Dec-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11395):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.7.8.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.7.6.1 29-Apr-2005  kent sync with -current
 1.7.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.7.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.7.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.7.2.2 03-Aug-2004  skrll Sync with HEAD
 1.7.2.1 01-May-2004  skrll file xform_ah.c was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.8.14.1 22-Nov-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #1878):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.8.12.1 22-Nov-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #1878):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.8.4.5 04-Feb-2008  yamt sync with head.
 1.8.4.4 15-Nov-2007  yamt sync with head.
 1.8.4.3 03-Sep-2007  yamt sync with head.
 1.8.4.2 30-Dec-2006  yamt sync with head.
 1.8.4.1 21-Jun-2006  yamt sync with head.
 1.8.2.1 22-Nov-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #1878):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.9.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.9.10.1 19-Apr-2006  elad sync with head.
 1.9.8.1 24-May-2006  yamt sync with head.
 1.9.6.1 22-Apr-2006  simonb Sync with head.
 1.9.4.1 09-Sep-2006  rpaulo sync with head
 1.10.10.2 10-Dec-2006  yamt sync with head.
 1.10.10.1 22-Oct-2006  yamt sync with head
 1.10.8.1 18-Nov-2006  ad Sync with head.
 1.12.8.1 06-Jan-2008  wrstuden Catch up to netbsd-4.0 release.
 1.12.4.2 15-Apr-2007  yamt sync with head.
 1.12.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.12.2.1 31-Oct-2007  liamjfoy Pull up following revision(s) (requested by adrianp in ticket #964):
sys/netipsec/xform_ah.c: revision 1.19
sys/netipsec/ipsec.c: revision 1.34
sys/netipsec/xform_ipip.c: revision 1.18
sys/netipsec/ipsec_output.c: revision 1.23
sys/netipsec/ipsec_osdep.h: revision 1.21
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote&#64;
 1.15.6.1 29-Mar-2007  reinoud Pullup to -current
 1.15.4.1 11-Jul-2007  mjf Sync with head.
 1.15.2.2 15-Jul-2007  ad Sync with head.
 1.15.2.1 10-Apr-2007  ad Sync with head.
 1.18.12.1 13-Nov-2007  bouyer Sync with HEAD
 1.18.8.2 23-Mar-2008  matt sync with HEAD
 1.18.8.1 06-Nov-2007  matt sync with HEAD
 1.18.6.1 28-Oct-2007  joerg Sync with HEAD.
 1.19.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.20.8.1 18-May-2008  yamt sync with head.
 1.20.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.20.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.21.10.2 28-Apr-2009  skrll Sync with HEAD.
 1.21.10.1 19-Jan-2009  skrll Sync with HEAD.
 1.21.2.1 04-May-2009  yamt sync with head.
 1.22.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.26.8.2 05-Mar-2011  bouyer Sync with HEAD
 1.26.8.1 17-Feb-2011  bouyer Sync with HEAD
 1.26.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.26.4.2 31-May-2011  rmind sync with head
 1.26.4.1 05-Mar-2011  rmind sync with head
 1.33.8.1 18-Feb-2012  mrg merge to -current.
 1.33.4.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.33.4.2 30-Oct-2012  yamt sync with head
 1.33.4.1 17-Apr-2012  yamt sync with head
 1.37.8.4 13-Mar-2018  snj Pull up following revision(s) (requested by maxv in ticket #1532):
sys/netipsec/xform_ah.c: 1.77 via patch
sys/netipsec/xform_esp.c: 1.73 via patch
sys/netipsec/xform_ipip.c: 1.56-1.57 via patch
Reinforce and clarify.
--
Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.
--
Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:
218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr
Found by Mootja.
Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.
--
As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.
 1.37.8.3 15-Feb-2018  martin Fix previous (Ticket #1530)
 1.37.8.2 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1530):
sys/netipsec/xform_ah.c: revision 1.80-1.81 via patch

Fix use-after-free, 'ah' may not be valid after m_makewritable and
ah_massage_headers.

Make sure the Authentication Header fits the mbuf chain, otherwise panic.
 1.37.8.1 29-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1521):
sys/netipsec/xform_ah.c: revision 1.76
Fix a vulnerability in IPsec-IPv6-AH, that allows an attacker to remotely
crash the kernel with a single packet.
In this loop we need to increment 'ad' by two, because the length field
of the option header does not count the size of the option header itself.
If the length is zero, then 'count' is incremented by zero, and there's
an infinite loop. Beyond that, this code was written with the assumption
that since the IPv6 packet already went through the generic IPv6 option
parser, several fields are guaranteed to be valid; but this assumption
does not hold because of the missing '+2', and there's as a result a
triggerable buffer overflow (write zeros after the end of the mbuf,
potentially to the next mbuf in memory since it's a pool).
Add the missing '+2', this place will be reinforced in separate commits.
 1.37.6.4 13-Mar-2018  snj Pull up following revision(s) (requested by maxv in ticket #1532):
sys/netipsec/xform_ah.c: 1.77 via patch
sys/netipsec/xform_esp.c: 1.73 via patch
sys/netipsec/xform_ipip.c: 1.56-1.57 via patch
Reinforce and clarify.
--
Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.
--
Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:
218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr
Found by Mootja.
Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.
--
As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.
 1.37.6.3 15-Feb-2018  martin Fix previous (Ticket #1530)
 1.37.6.2 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1530):
sys/netipsec/xform_ah.c: revision 1.80-1.81 via patch

Fix use-after-free, 'ah' may not be valid after m_makewritable and
ah_massage_headers.

Make sure the Authentication Header fits the mbuf chain, otherwise panic.
 1.37.6.1 29-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1521):
sys/netipsec/xform_ah.c: revision 1.76
Fix a vulnerability in IPsec-IPv6-AH, that allows an attacker to remotely
crash the kernel with a single packet.
In this loop we need to increment 'ad' by two, because the length field
of the option header does not count the size of the option header itself.
If the length is zero, then 'count' is incremented by zero, and there's
an infinite loop. Beyond that, this code was written with the assumption
that since the IPv6 packet already went through the generic IPv6 option
parser, several fields are guaranteed to be valid; but this assumption
does not hold because of the missing '+2', and there's as a result a
triggerable buffer overflow (write zeros after the end of the mbuf,
potentially to the next mbuf in memory since it's a pool).
Add the missing '+2', this place will be reinforced in separate commits.
 1.37.2.4 13-Mar-2018  snj Pull up following revision(s) (requested by maxv in ticket #1532):
sys/netipsec/xform_ah.c: 1.77 via patch
sys/netipsec/xform_esp.c: 1.73 via patch
sys/netipsec/xform_ipip.c: 1.56-1.57 via patch
Reinforce and clarify.
--
Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.
--
Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:
218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr
Found by Mootja.
Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.
--
As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.
 1.37.2.3 15-Feb-2018  martin Fix previous (Ticket #1530)
 1.37.2.2 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1530):
sys/netipsec/xform_ah.c: revision 1.80-1.81 via patch

Fix use-after-free, 'ah' may not be valid after m_makewritable and
ah_massage_headers.

Make sure the Authentication Header fits the mbuf chain, otherwise panic.
 1.37.2.1 29-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1521):
sys/netipsec/xform_ah.c: revision 1.76
Fix a vulnerability in IPsec-IPv6-AH, that allows an attacker to remotely
crash the kernel with a single packet.
In this loop we need to increment 'ad' by two, because the length field
of the option header does not count the size of the option header itself.
If the length is zero, then 'count' is incremented by zero, and there's
an infinite loop. Beyond that, this code was written with the assumption
that since the IPv6 packet already went through the generic IPv6 option
parser, several fields are guaranteed to be valid; but this assumption
does not hold because of the missing '+2', and there's as a result a
triggerable buffer overflow (write zeros after the end of the mbuf,
potentially to the next mbuf in memory since it's a pool).
Add the missing '+2', this place will be reinforced in separate commits.
 1.38.2.3 03-Dec-2017  jdolecek update from HEAD
 1.38.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.38.2.1 23-Jun-2013  tls resync from head
 1.40.2.2 18-May-2014  rmind sync with head
 1.40.2.1 28-Aug-2013  rmind sync with head
 1.42.12.3 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1569):
sys/netipsec/xform_ah.c: revision 1.77, 1.81 (via patch)
sys/netipsec/xform_esp.c: revision 1.73 (via patch)
sys/netipsec/xform_ipip.c: revision 1.56, 1.57 (via patch)

Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:

218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr

Found by Mootja.

Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.

Reinforce and clarify.

Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.

As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.

Make sure the Authentication Header fits the mbuf chain, otherwise panic.
 1.42.12.2 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1568):
sys/netipsec/xform_ah.c: revision 1.80-1.81 via patch

Fix use-after-free, 'ah' may not be valid after m_makewritable and
ah_massage_headers.

Make sure the Authentication Header fits the mbuf chain, otherwise panic.
 1.42.12.1 29-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1557):
sys/netipsec/xform_ah.c: revision 1.76
Fix a vulnerability in IPsec-IPv6-AH, that allows an attacker to remotely
crash the kernel with a single packet.
In this loop we need to increment 'ad' by two, because the length field
of the option header does not count the size of the option header itself.
If the length is zero, then 'count' is incremented by zero, and there's
an infinite loop. Beyond that, this code was written with the assumption
that since the IPv6 packet already went through the generic IPv6 option
parser, several fields are guaranteed to be valid; but this assumption
does not hold because of the missing '+2', and there's as a result a
triggerable buffer overflow (write zeros after the end of the mbuf,
potentially to the next mbuf in memory since it's a pool).
Add the missing '+2', this place will be reinforced in separate commits.
 1.42.8.3 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1569):
sys/netipsec/xform_ah.c: revision 1.77, 1.81 (via patch)
sys/netipsec/xform_esp.c: revision 1.73 (via patch)
sys/netipsec/xform_ipip.c: revision 1.56, 1.57 (via patch)

Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:

218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr

Found by Mootja.

Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.

Reinforce and clarify.

Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.

As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.

Make sure the Authentication Header fits the mbuf chain, otherwise panic.
 1.42.8.2 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1568):
sys/netipsec/xform_ah.c: revision 1.80-1.81 via patch

Fix use-after-free, 'ah' may not be valid after m_makewritable and
ah_massage_headers.

Make sure the Authentication Header fits the mbuf chain, otherwise panic.
 1.42.8.1 29-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1557):
sys/netipsec/xform_ah.c: revision 1.76
Fix a vulnerability in IPsec-IPv6-AH, that allows an attacker to remotely
crash the kernel with a single packet.
In this loop we need to increment 'ad' by two, because the length field
of the option header does not count the size of the option header itself.
If the length is zero, then 'count' is incremented by zero, and there's
an infinite loop. Beyond that, this code was written with the assumption
that since the IPv6 packet already went through the generic IPv6 option
parser, several fields are guaranteed to be valid; but this assumption
does not hold because of the missing '+2', and there's as a result a
triggerable buffer overflow (write zeros after the end of the mbuf,
potentially to the next mbuf in memory since it's a pool).
Add the missing '+2', this place will be reinforced in separate commits.
 1.42.6.2 28-Aug-2017  skrll Sync with HEAD
 1.42.6.1 06-Apr-2015  skrll Sync with HEAD
 1.42.4.3 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1569):
sys/netipsec/xform_ah.c: revision 1.77, 1.81 (via patch)
sys/netipsec/xform_esp.c: revision 1.73 (via patch)
sys/netipsec/xform_ipip.c: revision 1.56, 1.57 (via patch)

Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:

218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr

Found by Mootja.

Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.

Reinforce and clarify.

Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.

As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.

Make sure the Authentication Header fits the mbuf chain, otherwise panic.
 1.42.4.2 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1568):
sys/netipsec/xform_ah.c: revision 1.80-1.81 via patch

Fix use-after-free, 'ah' may not be valid after m_makewritable and
ah_massage_headers.

Make sure the Authentication Header fits the mbuf chain, otherwise panic.
 1.42.4.1 29-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1557):
sys/netipsec/xform_ah.c: revision 1.76
Fix a vulnerability in IPsec-IPv6-AH, that allows an attacker to remotely
crash the kernel with a single packet.
In this loop we need to increment 'ad' by two, because the length field
of the option header does not count the size of the option header itself.
If the length is zero, then 'count' is incremented by zero, and there's
an infinite loop. Beyond that, this code was written with the assumption
that since the IPv6 packet already went through the generic IPv6 option
parser, several fields are guaranteed to be valid; but this assumption
does not hold because of the missing '+2', and there's as a result a
triggerable buffer overflow (write zeros after the end of the mbuf,
potentially to the next mbuf in memory since it's a pool).
Add the missing '+2', this place will be reinforced in separate commits.
 1.44.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.44.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.53.2.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.54.2.7 22-Jun-2018  martin Pull up following revision(s) (requested by maxv in ticket #889):

sys/netinet6/ip6_output.c: revision 1.205
sys/netipsec/xform_ah.c: revision 1.90,1.93,1.102,1.103

Simplify the IPv4 parser. Get the option length in 'optlen', and sanitize
it earlier. A new check is added (off + optlen > skip).

In the IPv6 parser we reuse 'optlen', and remove 'ad' as a result.

Remove the kernel RH0 code. RH0 is deprecated by RFC5095, for security
reasons. RH0 was already removed in the kernel's input path, but some
parts were still present in the output path: they are now removed.
Sent on tech-net@ a few days ago.

Fix non-INET6 builds

Strengthen and simplify, once more.
 1.54.2.6 01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #680):

sys/netipsec/xform_ah.c: revision 1.87
sys/netipsec/xform_ah.c: revision 1.77

Reinforce and clarify.

Reinforce this area, make sure the length field fits the option. Normally
it always does because the options were already sanitized earlier.
 1.54.2.5 26-Feb-2018  martin Pull up rev 1.78 of sys/netipsec/xform_ah.c for real, requested by
ozaki-r in ticket #587 (and already claimed to be part of previous
commit)
 1.54.2.4 26-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #587):
sys/netipsec/xform_ipcomp.c: revision 1.54-1.56
sys/netipsec/xform_ah.c: revision 1.78,1.79(patch),1.82-1.84
sys/netipsec/xform_esp.c: revision 1.74-1.76

Fix mbuf leaks on error paths

Dedup common codes in error paths (NFCI)

Don't relook up an SP/SA in opencrpyto callbacks
We don't need to do so because we have a reference to it. And also
relooking-up one there may return an sp/sav that has different
parameters from an original one.

Fix kernel panic (assertion failure) on receiving an IPv6 packet with large options
If an IPv6 packet has large options, a necessary space for evacuation can
exceed the expected size (ah_pool_item_size). Give up using the pool_cache
if it happens.

Style.

Commonalize error paths (NFC)

Fix buffer overflow on sending an IPv6 packet with large options
If an IPv6 packet has large options, a necessary space for evacuation can
exceed the expected size (ah_pool_item_size). Give up using the pool_cache
if it happens.
Pointed out by maxv@
 1.54.2.3 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #549):
sys/netipsec/xform_ah.c: revision 1.80-1.81 via patch

Fix use-after-free, 'ah' may not be valid after m_makewritable and
ah_massage_headers.

Make sure the Authentication Header fits the mbuf chain, otherwise panic.
 1.54.2.2 26-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #512):
sys/netipsec/xform_ah.c: revision 1.75
sys/netipsec/xform_ah.c: revision 1.76
Revert a part of rev1.49 (six months ago). The pointer given to memcpy
was correct.
Discussed with Christos and Ryota.
Fix a vulnerability in IPsec-IPv6-AH, that allows an attacker to remotely
crash the kernel with a single packet.
In this loop we need to increment 'ad' by two, because the length field
of the option header does not count the size of the option header itself.
If the length is zero, then 'count' is incremented by zero, and there's
an infinite loop. Beyond that, this code was written with the assumption
that since the IPv6 packet already went through the generic IPv6 option
parser, several fields are guaranteed to be valid; but this assumption
does not hold because of the missing '+2', and there's as a result a
triggerable buffer overflow (write zeros after the end of the mbuf,
potentially to the next mbuf in memory since it's a pool).
Add the missing '+2', this place will be reinforced in separate commits.
 1.54.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.62.2.2 18-Jul-2017  ozaki-r 3197901
 1.62.2.1 18-Jul-2017  ozaki-r file xform_ah.c was added on branch perseant-stdc-iso10646 on 2017-07-18 04:01:05 +0000
 1.87.2.5 25-Jun-2018  pgoyette Sync with HEAD
 1.87.2.4 21-May-2018  pgoyette Sync with HEAD
 1.87.2.3 02-May-2018  pgoyette Synch with HEAD
 1.87.2.2 22-Apr-2018  pgoyette Sync with HEAD
 1.87.2.1 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.106.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.106.2.1 10-Jun-2019  christos Sync with HEAD
 1.114.10.1 02-Aug-2025  perseant Sync with HEAD
 1.107 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.106 25-May-2022  ozaki-r branches: 1.106.10;
ipsec: don't assert for the format of incoming packets
 1.105 22-May-2022  riastradh opencrypto: crypto_dispatch never fails now. Make it return void.

Same with crypto_kdispatch.
 1.104 22-May-2022  riastradh opencrypto: Rip out EAGAIN logic when unregistering crypto drivers.

I'm pretty sure this never worked reliably based on code inspection,
and it's unlikely to have ever been tested because it only applies
when unregistering a driver -- but we have no crypto drivers for
removable devices, so it would only apply if we went out of our way
to trigger detach with drvctl.

Instead, just make the operation fail with ENODEV, and remove all the
callback logic to resubmit the request on EAGAIN. (Maybe this should
be ENXIO, but crypto_kdispatch already does ENODEV.)
 1.103 22-May-2022  riastradh netipsec: Nothing uses xf_zeroize return value. Nix it.
 1.102 22-May-2022  riastradh opencrypto: Make crp_callback, krp_callback return void.

Nothing uses the return values inside opencrypto, so let's stop
making users return them.
 1.101 05-Oct-2020  knakahara Make sequence number of esp header MP-safe for IPsec Tx side. reviewed by ozaki-r@n.o

In IPsec Tx side, one Security Association can be used by multiple CPUs.
On the other hand, in IPsec Rx side, one Security Association is used
by only one CPU.

XXX pullup-{8,9}
 1.100 30-Jun-2020  riastradh Rename enc_xform_rijndael128 -> enc_xform_aes.

Update netipsec dependency.
 1.99 01-Nov-2019  knakahara Fix ipsecif(4) IPV6_MINMTU does not work correctly.
 1.98 12-Jun-2019  christos branches: 1.98.2;
make DPRINTF use varyadic cpp macros, and merge with IPSECLOG.
 1.97 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.96 31-May-2018  maxv branches: 1.96.2;
Add a comment and a KASSERT. I remember wondering whether this check was a
problem, since ARC4 has a blocksize of one. Normally ARC4 can't be used in
IPsec.
 1.95 31-May-2018  maxv style
 1.94 30-May-2018  maxv Introduce ah_authsiz, which computes the length of the ICV only. Use it in
esp_hdrsiz, and clarify.

Until now we were using ah_hdrsiz, and were relying on the fact that the
size of the AH header happens to be equal to that of the ESP trailer.

Now the size of the ESP trailer is added manually. This also fixes one
branch in esp_hdrsiz: we always append an ESP trailer, so it must always
be taken into account, and not just when an ICV is here.
 1.93 30-May-2018  maxv Apply the previous change in esp_input too, same as esp_output.
 1.92 30-May-2018  maxv Remove dead code, 'espx' is never NULL and dereferenced earlier, so no need
to NULL-check all the time.
 1.91 30-May-2018  maxv Simplify the padding computation. Until now 'padlen' contained the ESP
Trailer (two bytes), and we were doing minus two all the time.

Declare 'tlen', which contains padlen+ESP_Trailer+ICV, and use 'struct
esptail' instead of hardcoding the construction of the trailer. 'padlen'
now indicates only the length of the padding, so no need to do -2.
 1.90 30-May-2018  maxv Rename padding -> padlen, pad -> tail, and clarify.
 1.89 18-May-2018  maxv IP6_EXTHDR_GET -> M_REGION_GET, no functional change.
 1.88 13-May-2018  maxv Remove unused calls to nat_t_ports_get.
 1.87 11-May-2018  maxv ENOBUFS -> EACCES when updating the replay counter.
 1.86 07-May-2018  maxv Remove unused 'mp' argument from all the xf_output functions. Also clean
up xform.h a bit.
 1.85 01-May-2018  maxv When IP6_EXTHDR_GET fails, return ENOBUFS, and don't log an error (HDROPS
is not supposed to be used here).
 1.84 01-May-2018  maxv When the replay check fails, return EACCES instead of ENOBUFS.
 1.83 01-May-2018  maxv Remove double include, opencrypto/xform.h is already included in
netipsec/xform.h.
 1.82 28-Apr-2018  maxv Remove IPSEC_SPLASSERT_SOFTNET, it has always been a no-op.
 1.81 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.80 19-Apr-2018  maxv Style, and remove meaningless XXX.
 1.79 16-Feb-2018  maxv branches: 1.79.2;
Add [ah/esp/ipcomp]_enable sysctls, and remove the FreeBSD #ifdefs.
Discussed with ozaki-r@.
 1.78 16-Feb-2018  maxv Remove some more FreeBSD sysctl declarations that already have NetBSD
counterparts. Discussed with ozaki-r@.
 1.77 15-Feb-2018  maxv Style a bit, and if we don't know the pad-filling policy use
SADB_X_EXT_PZERO by default.

There doesn't seem to be a sanity check in the keysock API to make sure
this place is never reached, and it's better to fill in with zeros than
not filling in at all (and leaking uninitialized mbuf data).
 1.76 15-Feb-2018  ozaki-r Don't relook up an SP/SA in opencrpyto callbacks

We don't need to do so because we have a reference to it. And also
relooking-up one there may return an sp/sav that has different
parameters from an original one.
 1.75 14-Feb-2018  ozaki-r Dedup common codes in error paths (NFCI)
 1.74 14-Feb-2018  ozaki-r Fix mbuf leaks on error paths

Pointed out by maxv@
 1.73 24-Jan-2018  maxv Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.
 1.72 03-Oct-2017  ozaki-r Constify isr at many places (NFC)
 1.71 10-Aug-2017  ozaki-r Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
 1.70 09-Aug-2017  ozaki-r MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
 1.69 03-Aug-2017  ozaki-r Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
 1.68 02-Aug-2017  ozaki-r Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
 1.67 27-Jul-2017  ozaki-r Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
 1.66 20-Jul-2017  ozaki-r Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
 1.65 19-Jul-2017  ozaki-r Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
 1.64 19-Jul-2017  ozaki-r Don't bother the case of crp->crp_buf == NULL in callbacks
 1.63 19-Jul-2017  ozaki-r Don't release sav if calling crypto_dispatch again
 1.62 14-Jul-2017  ozaki-r Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
 1.61 14-Jul-2017  ozaki-r Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
 1.60 13-Jul-2017  ozaki-r Fix header size calculation of esp where sav is NULL
 1.59 10-Jul-2017  ozaki-r Use explicit_memset to surely zero-clear key_auth and key_enc
 1.58 07-Jul-2017  ozaki-r Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
 1.57 05-Jul-2017  ozaki-r Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
 1.56 29-Jun-2017  ozaki-r Apply C99-style struct initialization to xformsw
 1.55 11-May-2017  ryo branches: 1.55.2;
Make ipsec_address() and ipsec_logsastr() mpsafe.
 1.54 19-Apr-2017  ozaki-r branches: 1.54.2;
Retire ipsec_osdep.h

We don't need to care other OSes (FreeBSD) anymore.

Some macros are alive in ipsec_private.h.
 1.53 18-Apr-2017  ozaki-r Convert IPSEC_ASSERT to KASSERT or KASSERTMSG

IPSEC_ASSERT just discarded specified message...
 1.52 18-Apr-2017  ozaki-r Remove __FreeBSD__ and __NetBSD__ switches

No functional changes (except for a debug printf).

Note that there remain some __FreeBSD__ for sysctl knobs which counerparts
to NetBSD don't exist. And ipsec_osdep.h isn't touched yet; tidying it up
requires actual code changes.
 1.51 15-Apr-2017  christos cosmetic fixes:
- __func__ in printfs
- no space after sizeof
- eliminate useless casts
- u_intX_t -> uintX_t
 1.50 13-Apr-2017  christos Redo the statistics through an indirection array and put the definitions
of the arrays in pfkeyv2.h so that they are next to the index definitions.
Remove "bogus" comment about compressing the statistics which is now fixed.
 1.49 13-Apr-2017  ozaki-r Fix that ah_algorithm_lookup and esp_algorithm_lookup don't handle some algorithms

Unrelated upper limit values, AH_ALG_MAX and ESP_ALG_MAX, prevented some
algorithms from being looked up.
 1.48 10-Apr-2017  christos PR/52150: Ryota Ozaki: ipsec: kernel panic on adding a key with an invalid
length.
 1.47 06-Apr-2017  ozaki-r Prepare netipsec for rump-ification

- Include "opt_*.h" only if _KERNEL_OPT is defined
- Allow encapinit to be called twice (by ifinit and ipe4_attach)
- ifinit didn't call encapinit if IPSEC is enabled (ipe4_attach called
it instead), however, on a rump kernel ipe4_attach may not be called
even if IPSEC is enabled. So we need to allow ifinit to call it anyway
- Setup sysctls in ipsec_attach explicitly instead of using SYSCTL_SETUP
- Call ip6flow_invalidate_all in key_spdadd only if in6_present
- It's possible that a rump kernel loads the ipsec library but not
the inet6 library
 1.46 30-Mar-2015  ozaki-r branches: 1.46.2; 1.46.4;
Tidy up opt_ipsec.h inclusions

Some inclusions of opt_ipsec.h were for IPSEC_NAT_T and are now unnecessary.
Add inclusions to some C files for IPSEC_DEBUG.
 1.45 03-Nov-2013  mrg branches: 1.45.4; 1.45.6; 1.45.8; 1.45.12;
- apply some __diagused
- remove unused variables
- move some variables inside their relevant use #ifdef
 1.44 28-Aug-2013  riastradh Fix sense of consttime_memequal and update all callers.

Now it returns true (nonzero) to mean equal and false (zero) to mean
inequal, as the name suggests.

As promised on tech-userlevel back in June:

https://mail-index.netbsd.org/tech-userlevel/2013/06/24/msg007843.html
 1.43 24-Jun-2013  riastradh branches: 1.43.2;
Replace consttime_bcmp/explicit_bzero by consttime_memequal/explicit_memset.

consttime_memequal is the same as the old consttime_bcmp.
explicit_memset is to memset as explicit_bzero was to bcmp.

Passes amd64 release and i386/ALL, but I'm sure I missed some spots,
so please let me know.
 1.42 04-Jun-2013  christos PR/47886: Dr. Wolfgang Stukenbrock: IPSEC_NAT_T enabled kernels may access
outdated pointers and pass ESP data to UPD-sockets.
While here, simplify the code and remove the IPSEC_NAT_T option; always
compile nat-traversal in so that it does not bitrot.
 1.41 30-Aug-2012  drochner branches: 1.41.2;
Add "consttime_bcmp" and "explicit_bzero" functions for both kernel
abd userland, as proposed on tech-security, with explicit_bzero using
a volatile function pointer as suggested by Alan Barrett.
Both do what the name says. For userland, both are prefixed by "__"
to keep them out of the user namespace.
Change some memset/memcmp uses to the new functions where it makes
sense -- these are just some examples, more to come.
 1.40 25-Jan-2012  drochner branches: 1.40.2; 1.40.6; 1.40.8;
Make sure the mbufs in the input path (only the parts which we are going
to modify in the AH case) are writable/non-shared.
This addresses PR kern/33162 by Jeff Rizzo, and replaces the insufficient
patch from that time by a radical solution.
(The PR's problem had been worked around by rev.1.3 of xennetback_xenbus.c,
so it needs a network driver modification to reproduce it.)
Being here, clarify a bit of ipcomp -- uncompression is done in-place,
the header must be removed explicitly.
 1.39 31-Aug-2011  plunky branches: 1.39.2; 1.39.6;
NULL does not need a cast
 1.38 26-May-2011  drochner pull in AES-GCM/GMAC support from OpenBSD
This is still somewhat experimental. Tested between 2 similar boxes
so far. There is much potential for performance improvement. For now,
I've changed the gmac code to accept any data alignment, as the "char *"
pointer suggests. As the code is practically used, 32-bit alignment
can be assumed, at the cost of data copies. I don't know whether
bytewise access or copies are worse performance-wise. For efficient
implementations using SSE2 instructions on x86, even stricter
alignment requirements might arise.
 1.37 23-May-2011  drochner g/c remainders of IV handling in pfkey code -- this is done in
opencrypto now
 1.36 23-May-2011  drochner allow ESP to use AES-CTR
(pfkey and userland tool support is already there because it has been
in KAME IPSEC all the time)
tested against KAME IPSEC
 1.35 23-May-2011  drochner -in the descriptor for encryption xforms, split the "blocksize" field
into "blocksize" and "IV size"
-add an "reinit" function pointer which, if set, means that the xform
does its IV handling itself and doesn't want the default CBC handling
by the framework (poor name, but left that way to avoid unecessary
differences)
This syncs with Open/FreeBSD, purpose is to allow non-CBC transforms.
Refer to ivsize instead of blocksize where appropriate.
(At this point, blocksize and ivsize are identical.)
 1.34 06-May-2011  drochner As a first step towards more fine-grained locking, don't require
crypto_{new.free}session() to be called with the "crypto_mtx"
spinlock held.
This doesn't change much for now because these functions acquire
the said mutex first on entry now, but at least it keeps the nasty
locks local to the opencrypto core.
 1.33 05-May-2011  drochner fix C&P botch in diagnostic printfs
 1.32 05-May-2011  drochner support camellia-cbc as ESP cipher
 1.31 27-Mar-2011  spz fix compiling with IPSEC_DEBUG:
it's authsize not authlen in struct auth_hash
 1.30 25-Feb-2011  drochner make the use of SHA2-HMAC by FAST_IPSEC compliant to current standards:
-RFC2104 says that the block size of the hash algorithm must be used
for key/ipad/opad calculations. While formerly all ciphers used a block
length of 64, SHA384 and SHA512 use 128 bytes. So we can't use the
HMAC_BLOCK_LEN constant anymore. Add a new field to "struct auth_hash"
for the per-cipher blocksize.
-Due to this, there can't be a single "CRYPTO_SHA2_HMAC" external name
anymore. Replace this by 3 for the 3 different keysizes.
This was done by Open/FreeBSD before.
-Also fix the number of authenticator bits used tor ESP and AH to
conform to RFC4868, and remove uses of AH_HMAC_HASHLEN which did
assume a fixed authenticator size of 12 bytes.

FAST_IPSEC will not interoperate with KAME IPSEC anymore if sha2 is used,
because the latter doesn't implement these standards. It should
interoperate with at least modern Free/OpenBSD now.
(I've only tested with NetBSD-current/FAST_IPSEC on both ends.)
 1.29 19-Feb-2011  degroote Fix a missing const in FAST_IPSEC && IPSEC_DEBUG
 1.28 18-Feb-2011  drochner more "const"
 1.27 18-Feb-2011  drochner sprinkle some "const", documenting that the SA is not supposed to
change during an xform operation
 1.26 14-Feb-2011  drochner one more botched statistics counter (could increment semi-random locations)
 1.25 14-Feb-2011  drochner fix output bytecount statcounter
 1.24 14-Feb-2011  drochner change locking order, to make sure the cpu is at splsoftnet()
before the softnet_lock (adaptive) mutex is acquired, from
Wolfgang Stukenbrock, should fix a recursive lock panic
 1.23 10-Feb-2011  drochner -in opencrypto callbacks (which run in a kernel thread), pull softnet_lock
everywhere splsoftnet() was used before, to fix MP concurrency problems
-pull KERNEL_LOCK where ip(6)_output() is called, as this is what
the network stack (unfortunately) expects, in particular to avoid
races for packets in the interface send queues
From Wolfgang Stukenbrock per PR kern/44418, with the application
of KERNEL_LOCK to what I think are the essential points, tested
on a dual-core i386.
 1.22 20-Mar-2009  cegger branches: 1.22.4; 1.22.6; 1.22.8;
Correct bungled bcopy() -> memcpy() conversion
 1.21 18-Mar-2009  cegger bcopy -> memcpy
 1.20 18-Mar-2009  cegger bzero -> memset
 1.19 18-Mar-2009  cegger bcmp -> memcmp
 1.18 23-Apr-2008  thorpej branches: 1.18.2; 1.18.10; 1.18.16;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.17 04-Feb-2008  tls branches: 1.17.6; 1.17.8;
Rework opencrypto to use a spin mutex (crypto_mtx) instead of "splcrypto"
(actually splnet) and condvars instead of tsleep/wakeup. Fix a few
miscellaneous problems and add some debugging printfs while there.

Restore set of CRYPTO_F_DONE in crypto_done() which was lost at some
point after this code came from FreeBSD -- it made it impossible to wait
properly for a condition.

Add flags analogous to the "crp" flags to the key operation's krp struct.
Add a new flag, CRYPTO_F_ONRETQ which tells us a request finished before
the kthread had a chance to dequeue it and call its callback -- this was
letting requests stick on the queues before even though done and copied
out.

Callers of crypto_newsession() or crypto_freesession() must now take the
mutex. Change netipsec to do so. Dispatch takes the mutex itself as
needed.

This was tested fairly extensively with the cryptosoft backend and lightly
with a new hardware driver. It has not been tested with FAST_IPSEC; I am
unable to ascertain whether FAST_IPSEC currently works at all in our tree.

pjd@FreeBSD.ORG, ad@NetBSD.ORG, and darran@snark.us pointed me in the
right direction several times in the course of this. Remaining bugs
are mine alone.
 1.16 27-Jun-2007  degroote branches: 1.16.8; 1.16.14;
Add support for options IPSEC_NAT_T (RFC 3947 and 3948) for fast_ipsec(4).

No objection on tech-net@
 1.15 04-Mar-2007  degroote branches: 1.15.2; 1.15.4;
Remove useless cast
Use NULL instead of (void*) 0
 1.14 04-Mar-2007  degroote Fix fallout from caddr_t changes
 1.13 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.12 16-Nov-2006  christos branches: 1.12.4;
__unused removal on arguments; approved by core.
 1.11 13-Oct-2006  christos more __unused
 1.10 28-Apr-2006  pavel branches: 1.10.8; 1.10.10;
The esp_input_cb function used m_copyback, which fails if the mbuf is
read-only. This can actually happen if the packet was received by the
xennet interface, see PR kern/33162. Change it to m_copyback_cow.

AH and IPCOMP probably need similar fixes.

Requested by Jeff Rizzo, tested on Xen with -current by him.
 1.9 11-Apr-2006  rpaulo Add two new sysctls protected under IPSEC_DEBUG:

net.inet.ipsec.test_replay - When set to 1, IPsec will send packets with
the same sequence number. This allows to verify if the other side
has proper replay attacks detection.

net.inet.ipsec.test_integrity - When set 1, IPsec will send packets with
corrupted HMAC. This allows to verify if the other side properly
detects modified packets.

(a message will be printed indicating when these sysctls changed)

By Pawel Jakub Dawidek <pjd@FreeBSD.org>.
Discussed with Christos Zoulas and Jonathan Stone.
 1.8 23-Mar-2006  rpaulo FreeBSD SA-06:11 and CVE-2006-0905: update the replay sequence number
or else the anti-reply technique won't work as expected.
 1.7 11-Dec-2005  christos branches: 1.7.4; 1.7.6; 1.7.8; 1.7.10; 1.7.12;
merge ktrace-lwp.
 1.6 27-May-2005  seanb branches: 1.6.2;
- Discrepency between malloc / free types with init vector (see free
in netipsec/key.c).
- Reviewed by christos.
 1.5 17-Mar-2004  jonathan branches: 1.5.2; 1.5.4; 1.5.8; 1.5.16; 1.5.18; 1.5.20;
sys/netinet6/ip6_ecn.h is reportedly a FreeBSD-ism; NetBSD has
prototypes for the IPv6 ECN ingress/egress functions in sys/netinet/ip_ecn.h,
inside an #ifdef INET6 wrapper. So, wrap sys/netipsec ocurrences of
#include <netinet6/ip6_ecn.h>
in #ifdef __FreeBSD__/#endif, until both camps can agree on this
teensy little piece of namespace. Affects:
ipsec_output.c xform_ah.c xform_esp.c xform_ipip.c
 1.4 06-Oct-2003  tls Reversion of "netkey merge", part 2 (replacement of removed files in the
repository by christos was part 1). netipsec should now be back as it
was on 2003-09-11, with some very minor changes:

1) Some residual platform-dependent code was moved from ipsec.h to
ipsec_osdep.h; without this, IPSEC_ASSERT() was multiply defined. ipsec.h
now includes ipsec_osdep.h

2) itojun's renaming of netipsec/files.ipsec to netipsec/files.netipsec has
been left in place (it's arguable which name is less confusing but the
rename is pretty harmless).

3) Some #endif TOKEN has been replaced by #endif /* TOKEN */; #endif TOKEN
is invalid and GCC 3 won't compile it.

An i386 kernel with "options FAST_IPSEC" and "options OPENCRYPTO" now
gets through "make depend" but fails to build with errors in ip_input.c.
But it's better than it was (thank heaven for small favors).
 1.3 12-Sep-2003  itojun merge netipsec/key* into netkey/key*. no need for both.
change confusing filename
 1.2 20-Aug-2003  jonathan opt_inet6.h is FreeBSD-specific, so wrap it with #ifdef __FreeBSD__/#endif.
 1.1 13-Aug-2003  jonathan Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.5.20.1 28-Mar-2006  riz Pull up following revision(s) (requested by rpaulo in ticket #1222):
sys/netipsec/xform_esp.c: revision 1.8
FreeBSD SA-06:11 and CVE-2006-0905: update the replay sequence number
or else the anti-reply technique won't work as expected.
 1.5.18.1 30-Mar-2006  riz Pull up following revision(s) (requested by rpaulo in ticket #10384):
sys/netipsec/xform_esp.c: revision 1.8
FreeBSD SA-06:11 and CVE-2006-0905: update the replay sequence number
or else the anti-reply technique won't work as expected.
 1.5.16.1 28-Mar-2006  riz Pull up following revision(s) (requested by rpaulo in ticket #1222):
sys/netipsec/xform_esp.c: revision 1.8
FreeBSD SA-06:11 and CVE-2006-0905: update the replay sequence number
or else the anti-reply technique won't work as expected.
 1.5.8.1 30-Mar-2006  riz Pull up following revision(s) (requested by rpaulo in ticket #10384):
sys/netipsec/xform_esp.c: revision 1.8
FreeBSD SA-06:11 and CVE-2006-0905: update the replay sequence number
or else the anti-reply technique won't work as expected.
 1.5.4.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.5.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.5.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.5.4.2 03-Aug-2004  skrll Sync with HEAD
 1.5.4.1 17-Mar-2004  skrll file xform_esp.c was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.5.2.1 30-Mar-2006  riz Pull up following revision(s) (requested by rpaulo in ticket #10384):
sys/netipsec/xform_esp.c: revision 1.8
FreeBSD SA-06:11 and CVE-2006-0905: update the replay sequence number
or else the anti-reply technique won't work as expected.
 1.6.2.4 04-Feb-2008  yamt sync with head.
 1.6.2.3 03-Sep-2007  yamt sync with head.
 1.6.2.2 30-Dec-2006  yamt sync with head.
 1.6.2.1 21-Jun-2006  yamt sync with head.
 1.7.12.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.7.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.7.10.2 11-May-2006  elad sync with head
 1.7.10.1 19-Apr-2006  elad sync with head.
 1.7.8.2 24-May-2006  yamt sync with head.
 1.7.8.1 01-Apr-2006  yamt sync with head.
 1.7.6.2 01-Jun-2006  kardel Sync with head.
 1.7.6.1 22-Apr-2006  simonb Sync with head.
 1.7.4.1 09-Sep-2006  rpaulo sync with head
 1.10.10.2 10-Dec-2006  yamt sync with head.
 1.10.10.1 22-Oct-2006  yamt sync with head
 1.10.8.1 18-Nov-2006  ad Sync with head.
 1.12.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.15.4.1 11-Jul-2007  mjf Sync with head.
 1.15.2.1 15-Jul-2007  ad Sync with head.
 1.16.14.1 18-Feb-2008  mjf Sync with HEAD.
 1.16.8.1 23-Mar-2008  matt sync with HEAD
 1.17.8.1 18-May-2008  yamt sync with head.
 1.17.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.18.16.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.18.10.1 28-Apr-2009  skrll Sync with HEAD.
 1.18.2.1 04-May-2009  yamt sync with head.
 1.22.8.2 05-Mar-2011  bouyer Sync with HEAD
 1.22.8.1 17-Feb-2011  bouyer Sync with HEAD
 1.22.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.22.4.3 31-May-2011  rmind sync with head
 1.22.4.2 21-Apr-2011  rmind sync with head
 1.22.4.1 05-Mar-2011  rmind sync with head
 1.39.6.1 18-Feb-2012  mrg merge to -current.
 1.39.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.39.2.2 30-Oct-2012  yamt sync with head
 1.39.2.1 17-Apr-2012  yamt sync with head
 1.40.8.1 13-Mar-2018  snj Pull up following revision(s) (requested by maxv in ticket #1532):
sys/netipsec/xform_ah.c: 1.77 via patch
sys/netipsec/xform_esp.c: 1.73 via patch
sys/netipsec/xform_ipip.c: 1.56-1.57 via patch
Reinforce and clarify.
--
Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.
--
Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:
218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr
Found by Mootja.
Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.
--
As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.
 1.40.6.1 13-Mar-2018  snj Pull up following revision(s) (requested by maxv in ticket #1532):
sys/netipsec/xform_ah.c: 1.77 via patch
sys/netipsec/xform_esp.c: 1.73 via patch
sys/netipsec/xform_ipip.c: 1.56-1.57 via patch
Reinforce and clarify.
--
Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.
--
Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:
218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr
Found by Mootja.
Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.
--
As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.
 1.40.2.1 13-Mar-2018  snj Pull up following revision(s) (requested by maxv in ticket #1532):
sys/netipsec/xform_ah.c: 1.77 via patch
sys/netipsec/xform_esp.c: 1.73 via patch
sys/netipsec/xform_ipip.c: 1.56-1.57 via patch
Reinforce and clarify.
--
Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.
--
Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:
218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr
Found by Mootja.
Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.
--
As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.
 1.41.2.3 03-Dec-2017  jdolecek update from HEAD
 1.41.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.41.2.1 23-Jun-2013  tls resync from head
 1.43.2.2 18-May-2014  rmind sync with head
 1.43.2.1 28-Aug-2013  rmind sync with head
 1.45.12.1 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1569):
sys/netipsec/xform_ah.c: revision 1.77, 1.81 (via patch)
sys/netipsec/xform_esp.c: revision 1.73 (via patch)
sys/netipsec/xform_ipip.c: revision 1.56, 1.57 (via patch)

Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:

218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr

Found by Mootja.

Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.

Reinforce and clarify.

Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.

As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.

Make sure the Authentication Header fits the mbuf chain, otherwise panic.
 1.45.8.1 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1569):
sys/netipsec/xform_ah.c: revision 1.77, 1.81 (via patch)
sys/netipsec/xform_esp.c: revision 1.73 (via patch)
sys/netipsec/xform_ipip.c: revision 1.56, 1.57 (via patch)

Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:

218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr

Found by Mootja.

Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.

Reinforce and clarify.

Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.

As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.

Make sure the Authentication Header fits the mbuf chain, otherwise panic.
 1.45.6.2 28-Aug-2017  skrll Sync with HEAD
 1.45.6.1 06-Apr-2015  skrll Sync with HEAD
 1.45.4.1 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1569):
sys/netipsec/xform_ah.c: revision 1.77, 1.81 (via patch)
sys/netipsec/xform_esp.c: revision 1.73 (via patch)
sys/netipsec/xform_ipip.c: revision 1.56, 1.57 (via patch)

Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:

218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr

Found by Mootja.

Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.

Reinforce and clarify.

Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.

As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.

Make sure the Authentication Header fits the mbuf chain, otherwise panic.
 1.46.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.46.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.54.2.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.55.2.4 08-Oct-2020  martin Pull up following revision(s) (requested by knakahara in ticket #1612):

sys/netipsec/xform_esp.c: revision 1.101

Make sequence number of esp header MP-safe for IPsec Tx side. reviewed by ozaki-r@n.o

In IPsec Tx side, one Security Association can be used by multiple CPUs.
On the other hand, in IPsec Rx side, one Security Association is used
by only one CPU.

XXX pullup-{8,9}
 1.55.2.3 30-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #670):

sys/netipsec/xform_esp.c: revision 1.73

Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.
 1.55.2.2 26-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #587):
sys/netipsec/xform_ipcomp.c: revision 1.54-1.56
sys/netipsec/xform_ah.c: revision 1.78,1.79(patch),1.82-1.84
sys/netipsec/xform_esp.c: revision 1.74-1.76

Fix mbuf leaks on error paths

Dedup common codes in error paths (NFCI)

Don't relook up an SP/SA in opencrpyto callbacks
We don't need to do so because we have a reference to it. And also
relooking-up one there may return an sp/sav that has different
parameters from an original one.

Fix kernel panic (assertion failure) on receiving an IPv6 packet with large options
If an IPv6 packet has large options, a necessary space for evacuation can
exceed the expected size (ah_pool_item_size). Give up using the pool_cache
if it happens.

Style.

Commonalize error paths (NFC)

Fix buffer overflow on sending an IPv6 packet with large options
If an IPv6 packet has large options, a necessary space for evacuation can
exceed the expected size (ah_pool_item_size). Give up using the pool_cache
if it happens.
Pointed out by maxv@
 1.55.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.79.2.4 25-Jun-2018  pgoyette Sync with HEAD
 1.79.2.3 21-May-2018  pgoyette Sync with HEAD
 1.79.2.2 02-May-2018  pgoyette Synch with HEAD
 1.79.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.96.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.96.2.1 10-Jun-2019  christos Sync with HEAD
 1.98.2.1 08-Oct-2020  martin Pull up following revision(s) (requested by knakahara in ticket #1103):

sys/netipsec/xform_esp.c: revision 1.101

Make sequence number of esp header MP-safe for IPsec Tx side. reviewed by ozaki-r@n.o

In IPsec Tx side, one Security Association can be used by multiple CPUs.
On the other hand, in IPsec Rx side, one Security Association is used
by only one CPU.

XXX pullup-{8,9}
 1.106.10.1 02-Aug-2025  perseant Sync with HEAD
 1.76 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.75 19-Oct-2022  christos branches: 1.75.8;
PR/56836: Andrew Cagney: IPv6 ESN tunneling IPcomp has corrupt header

Always always send / expect CPI in IPcomp header

Fixes kern/56836 where an IPsec interop combining compression and
ESP|AH would fail.

Since fast ipsec, the outgoing IPcomp header has contained the
compression algorithm instead of the CPI. Adding the
SADB_X_EXT_RAWCPI flag worked around this but ...

The IPcomp's SADB was unconditionally hashed using the compression
algorithm instead of the CPI. This meant that an incoming packet with
a valid CPI could never match its SADB.
 1.74 22-May-2022  riastradh opencrypto: crypto_dispatch never fails now. Make it return void.

Same with crypto_kdispatch.
 1.73 22-May-2022  riastradh opencrypto: Rip out EAGAIN logic when unregistering crypto drivers.

I'm pretty sure this never worked reliably based on code inspection,
and it's unlikely to have ever been tested because it only applies
when unregistering a driver -- but we have no crypto drivers for
removable devices, so it would only apply if we went out of our way
to trigger detach with drvctl.

Instead, just make the operation fail with ENODEV, and remove all the
callback logic to resubmit the request on EAGAIN. (Maybe this should
be ENXIO, but crypto_kdispatch already does ENODEV.)
 1.72 22-May-2022  riastradh opencrypto: Make crypto_freesession return void.

No callers use the return value. It is not sensible to allow this to
fail.
 1.71 22-May-2022  riastradh netipsec: Nothing uses xf_zeroize return value. Nix it.
 1.70 22-May-2022  riastradh opencrypto: Make crp_callback, krp_callback return void.

Nothing uses the return values inside opencrypto, so let's stop
making users return them.
 1.69 01-Nov-2019  knakahara Fix ipsecif(4) IPV6_MINMTU does not work correctly.
 1.68 12-Jun-2019  christos make DPRINTF use varyadic cpp macros, and merge with IPSECLOG.
 1.67 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.66 13-May-2018  maxv branches: 1.66.2;
Remove unused calls to nat_t_ports_get.
 1.65 07-May-2018  maxv Remove unused 'mp' argument from all the xf_output functions. Also clean
up xform.h a bit.
 1.64 01-May-2018  maxv Remove double include, opencrypto/xform.h is already included in
netipsec/xform.h.
 1.63 28-Apr-2018  maxv Remove IPSEC_SPLASSERT_SOFTNET, it has always been a no-op.
 1.62 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.61 19-Apr-2018  maxv Add a KASSERT (which is not triggerable since ipsec_common_input already
ensures 8 bytes are present), add an XXX (about the fact that it is
better to use m_copydata, because it is faster and less error-prone), and
improve two m_copybacks (remove useless casts).
 1.60 10-Mar-2018  maxv Fix the computation. Normally that's harmless since ip6_output recomputes
ip6_plen.
 1.59 16-Feb-2018  maxv branches: 1.59.2;
Add [ah/esp/ipcomp]_enable sysctls, and remove the FreeBSD #ifdefs.
Discussed with ozaki-r@.
 1.58 16-Feb-2018  maxv Remove some more FreeBSD sysctl declarations that already have NetBSD
counterparts. Discussed with ozaki-r@.
 1.57 15-Feb-2018  maxv Style and simplify.
 1.56 15-Feb-2018  ozaki-r Don't relook up an SP/SA in opencrpyto callbacks

We don't need to do so because we have a reference to it. And also
relooking-up one there may return an sp/sav that has different
parameters from an original one.
 1.55 14-Feb-2018  ozaki-r Dedup common codes in error paths (NFCI)
 1.54 14-Feb-2018  ozaki-r Fix mbuf leaks on error paths

Pointed out by maxv@
 1.53 03-Oct-2017  ozaki-r Constify isr at many places (NFC)
 1.52 10-Aug-2017  ozaki-r Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
 1.51 09-Aug-2017  ozaki-r MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
 1.50 03-Aug-2017  ozaki-r Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
 1.49 02-Aug-2017  ozaki-r Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
 1.48 27-Jul-2017  ozaki-r Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
 1.47 20-Jul-2017  ozaki-r Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
 1.46 19-Jul-2017  ozaki-r Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
 1.45 19-Jul-2017  ozaki-r Don't bother the case of crp->crp_buf == NULL in callbacks
 1.44 19-Jul-2017  ozaki-r Don't release sav if calling crypto_dispatch again
 1.43 14-Jul-2017  ozaki-r Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
 1.42 14-Jul-2017  ozaki-r Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
 1.41 07-Jul-2017  ozaki-r Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
 1.40 05-Jul-2017  ozaki-r Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
 1.39 29-Jun-2017  ozaki-r Apply C99-style struct initialization to xformsw
 1.38 11-May-2017  ryo branches: 1.38.2;
Make ipsec_address() and ipsec_logsastr() mpsafe.
 1.37 19-Apr-2017  ozaki-r branches: 1.37.2;
Retire ipsec_osdep.h

We don't need to care other OSes (FreeBSD) anymore.

Some macros are alive in ipsec_private.h.
 1.36 18-Apr-2017  ozaki-r Convert IPSEC_ASSERT to KASSERT or KASSERTMSG

IPSEC_ASSERT just discarded specified message...
 1.35 18-Apr-2017  ozaki-r Remove __FreeBSD__ and __NetBSD__ switches

No functional changes (except for a debug printf).

Note that there remain some __FreeBSD__ for sysctl knobs which counerparts
to NetBSD don't exist. And ipsec_osdep.h isn't touched yet; tidying it up
requires actual code changes.
 1.34 15-Apr-2017  christos cosmetic fixes:
- __func__ in printfs
- no space after sizeof
- eliminate useless casts
- u_intX_t -> uintX_t
 1.33 13-Apr-2017  christos Redo the statistics through an indirection array and put the definitions
of the arrays in pfkeyv2.h so that they are next to the index definitions.
Remove "bogus" comment about compressing the statistics which is now fixed.
 1.32 06-Apr-2017  ozaki-r Prepare netipsec for rump-ification

- Include "opt_*.h" only if _KERNEL_OPT is defined
- Allow encapinit to be called twice (by ifinit and ipe4_attach)
- ifinit didn't call encapinit if IPSEC is enabled (ipe4_attach called
it instead), however, on a rump kernel ipe4_attach may not be called
even if IPSEC is enabled. So we need to allow ifinit to call it anyway
- Setup sysctls in ipsec_attach explicitly instead of using SYSCTL_SETUP
- Call ip6flow_invalidate_all in key_spdadd only if in6_present
- It's possible that a rump kernel loads the ipsec library but not
the inet6 library
 1.31 03-Nov-2013  mrg branches: 1.31.6; 1.31.10; 1.31.14;
- apply some __diagused
- remove unused variables
- move some variables inside their relevant use #ifdef
 1.30 04-Jun-2013  christos branches: 1.30.2;
PR/47886: Dr. Wolfgang Stukenbrock: IPSEC_NAT_T enabled kernels may access
outdated pointers and pass ESP data to UPD-sockets.
While here, simplify the code and remove the IPSEC_NAT_T option; always
compile nat-traversal in so that it does not bitrot.
 1.29 25-Jan-2012  drochner branches: 1.29.6;
Make sure the mbufs in the input path (only the parts which we are going
to modify in the AH case) are writable/non-shared.
This addresses PR kern/33162 by Jeff Rizzo, and replaces the insufficient
patch from that time by a radical solution.
(The PR's problem had been worked around by rev.1.3 of xennetback_xenbus.c,
so it needs a network driver modification to reproduce it.)
Being here, clarify a bit of ipcomp -- uncompression is done in-place,
the header must be removed explicitly.
 1.28 06-May-2011  drochner branches: 1.28.4; 1.28.8;
As a first step towards more fine-grained locking, don't require
crypto_{new.free}session() to be called with the "crypto_mtx"
spinlock held.
This doesn't change much for now because these functions acquire
the said mutex first on entry now, but at least it keeps the nasty
locks local to the opencrypto core.
 1.27 05-May-2011  drochner fix C&P botch in diagnostic printfs
 1.26 01-Apr-2011  spz mitigation for CVE-2011-1547
 1.25 24-Feb-2011  drochner small modifications in dealing with the unknown result size of compression/
decompression:
-seperate the IPCOMP specific rule that compression must not grow the
data from general compression semantics: Introduce a special name
CRYPTO_DEFLATE_COMP_NOGROW/comp_algo_deflate_nogrow to describe
the IPCOMP semantics and use it there. (being here, fix the check
so that equal size is considered failure as well as required by
RFC2393)
Customers of CRYPTO_DEFLATE_COMP/comp_algo_deflate now always get
deflated data back, even if they are not smaller than the original.
-allow to pass a "size hint" to the DEFLATE decompression function
which is used for the initial buffer allocation. Due to the changes
done there, additional allocations and extra copies are avoided if the
initial allocation is sufficient. Set the size hint to MCLBYTES (=2k)
in IPCOMP which should be good for many use cases.
 1.24 18-Feb-2011  drochner more "const"
 1.23 18-Feb-2011  drochner sprinkle some "const", documenting that the SA is not supposed to
change during an xform operation
 1.22 14-Feb-2011  drochner change locking order, to make sure the cpu is at splsoftnet()
before the softnet_lock (adaptive) mutex is acquired, from
Wolfgang Stukenbrock, should fix a recursive lock panic
 1.21 10-Feb-2011  drochner -in opencrypto callbacks (which run in a kernel thread), pull softnet_lock
everywhere splsoftnet() was used before, to fix MP concurrency problems
-pull KERNEL_LOCK where ip(6)_output() is called, as this is what
the network stack (unfortunately) expects, in particular to avoid
races for packets in the interface send queues
From Wolfgang Stukenbrock per PR kern/44418, with the application
of KERNEL_LOCK to what I think are the essential points, tested
on a dual-core i386.
 1.20 21-Sep-2010  degroote branches: 1.20.2; 1.20.4;
Fix ipcomp input counter

Reported Wolfgang Stukenbrock in pr/43250.
 1.19 18-Mar-2009  cegger branches: 1.19.2; 1.19.4;
bzero -> memset
 1.18 23-Apr-2008  thorpej branches: 1.18.2; 1.18.10; 1.18.12; 1.18.16; 1.18.18; 1.18.22;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.17 04-Feb-2008  tls branches: 1.17.6; 1.17.8;
Rework opencrypto to use a spin mutex (crypto_mtx) instead of "splcrypto"
(actually splnet) and condvars instead of tsleep/wakeup. Fix a few
miscellaneous problems and add some debugging printfs while there.

Restore set of CRYPTO_F_DONE in crypto_done() which was lost at some
point after this code came from FreeBSD -- it made it impossible to wait
properly for a condition.

Add flags analogous to the "crp" flags to the key operation's krp struct.
Add a new flag, CRYPTO_F_ONRETQ which tells us a request finished before
the kthread had a chance to dequeue it and call its callback -- this was
letting requests stick on the queues before even though done and copied
out.

Callers of crypto_newsession() or crypto_freesession() must now take the
mutex. Change netipsec to do so. Dispatch takes the mutex itself as
needed.

This was tested fairly extensively with the cryptosoft backend and lightly
with a new hardware driver. It has not been tested with FAST_IPSEC; I am
unable to ascertain whether FAST_IPSEC currently works at all in our tree.

pjd@FreeBSD.ORG, ad@NetBSD.ORG, and darran@snark.us pointed me in the
right direction several times in the course of this. Remaining bugs
are mine alone.
 1.16 29-Dec-2007  degroote Add some statistics for case where compression is not useful
(when len(compressed packet) > len(initial packet))
 1.15 22-Sep-2007  degroote branches: 1.15.6; 1.15.12;
Fix my previous stupid caddr_t fix.
 1.14 27-Jun-2007  degroote branches: 1.14.6; 1.14.8;
Add support for options IPSEC_NAT_T (RFC 3947 and 3948) for fast_ipsec(4).

No objection on tech-net@
 1.13 04-Mar-2007  degroote branches: 1.13.2; 1.13.4;
Remove useless cast
Use NULL instead of (void*) 0
 1.12 04-Mar-2007  degroote Fix fallout from caddr_t changes
 1.11 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.10 23-Feb-2007  degroote Oops, I forgot to commit some bits last time

fast_ipsec and ipcomp works better now.
 1.9 10-Feb-2007  degroote branches: 1.9.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic
 1.8 16-Nov-2006  christos branches: 1.8.2; 1.8.4;
__unused removal on arguments; approved by core.
 1.7 13-Oct-2006  christos more __unused
 1.6 11-Dec-2005  christos branches: 1.6.20; 1.6.22;
merge ktrace-lwp.
 1.5 26-Feb-2005  perry branches: 1.5.4;
nuke trailing whitespace
 1.4 06-Oct-2003  tls branches: 1.4.4; 1.4.10; 1.4.12;
Reversion of "netkey merge", part 2 (replacement of removed files in the
repository by christos was part 1). netipsec should now be back as it
was on 2003-09-11, with some very minor changes:

1) Some residual platform-dependent code was moved from ipsec.h to
ipsec_osdep.h; without this, IPSEC_ASSERT() was multiply defined. ipsec.h
now includes ipsec_osdep.h

2) itojun's renaming of netipsec/files.ipsec to netipsec/files.netipsec has
been left in place (it's arguable which name is less confusing but the
rename is pretty harmless).

3) Some #endif TOKEN has been replaced by #endif /* TOKEN */; #endif TOKEN
is invalid and GCC 3 won't compile it.

An i386 kernel with "options FAST_IPSEC" and "options OPENCRYPTO" now
gets through "make depend" but fails to build with errors in ip_input.c.
But it's better than it was (thank heaven for small favors).
 1.3 12-Sep-2003  itojun merge netipsec/key* into netkey/key*. no need for both.
change confusing filename
 1.2 20-Aug-2003  jonathan opt_inet6.h is FreeBSD-specific, so wrap it with #ifdef __FreeBSD__/#endif.
 1.1 13-Aug-2003  jonathan Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.4.12.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.4.10.1 29-Apr-2005  kent sync with -current
 1.4.4.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.4.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.4.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.4.4.2 03-Aug-2004  skrll Sync with HEAD
 1.4.4.1 06-Oct-2003  skrll file xform_ipcomp.c was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.5.4.6 04-Feb-2008  yamt sync with head.
 1.5.4.5 21-Jan-2008  yamt sync with head
 1.5.4.4 27-Oct-2007  yamt sync with head.
 1.5.4.3 03-Sep-2007  yamt sync with head.
 1.5.4.2 26-Feb-2007  yamt sync with head.
 1.5.4.1 30-Dec-2006  yamt sync with head.
 1.6.22.2 10-Dec-2006  yamt sync with head.
 1.6.22.1 22-Oct-2006  yamt sync with head
 1.6.20.1 18-Nov-2006  ad Sync with head.
 1.8.4.1 04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.8.2.2 03-Apr-2011  riz Pull up following revision(s) (requested by spz in ticket #1425):
sys/netipsec/xform_ipcomp.c: revision 1.26
sys/netinet6/ipcomp_input.c: revision 1.37
mitigation for CVE-2011-1547
this should really be solved by counting nested headers (like in the
inet6 case) instead
mitigation for CVE-2011-1547
 1.8.2.1 24-May-2007  pavel branches: 1.8.2.1.4;
Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the good default policy, depending of the adress family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us to choose the right default policy (based on the adress family
requested).
While here, fix an error message

Use dynamic array instead of an static array to decompress. It lets us to
decompress any data, whatever is the radio decompressed data / compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.8.2.1.4.1 03-Apr-2011  riz Pull up following revision(s) (requested by spz in ticket #1425):
sys/netipsec/xform_ipcomp.c: revision 1.26
sys/netinet6/ipcomp_input.c: revision 1.37
mitigation for CVE-2011-1547
this should really be solved by counting nested headers (like in the
inet6 case) instead
mitigation for CVE-2011-1547
 1.9.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.9.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.13.4.1 11-Jul-2007  mjf Sync with head.
 1.13.2.2 09-Oct-2007  ad Sync with head.
 1.13.2.1 15-Jul-2007  ad Sync with head.
 1.14.8.3 23-Mar-2008  matt sync with HEAD
 1.14.8.2 09-Jan-2008  matt sync with HEAD
 1.14.8.1 06-Nov-2007  matt sync with HEAD
 1.14.6.1 02-Oct-2007  joerg Sync with HEAD.
 1.15.12.1 02-Jan-2008  bouyer Sync with HEAD
 1.15.6.1 18-Feb-2008  mjf Sync with HEAD.
 1.17.8.1 18-May-2008  yamt sync with head.
 1.17.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.18.22.1 03-Apr-2011  jdc Pull up:
src/sys/netinet6/ipcomp_input.c revision 1.37
src/sys/netipsec/xform_ipcomp.c revision 1.26

(requested by spz in ticket #1590).

mitigation for CVE-2011-1547
 1.18.18.1 03-Apr-2011  jdc Pull up:
src/sys/netinet6/ipcomp_input.c revision 1.37
src/sys/netipsec/xform_ipcomp.c revision 1.26

(requested by spz in ticket #1590).

mitigation for CVE-2011-1547
 1.18.16.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.18.12.1 03-Apr-2011  jdc Pull up:
src/sys/netinet6/ipcomp_input.c revision 1.37
src/sys/netipsec/xform_ipcomp.c revision 1.26

(requested by spz in ticket #1590).

mitigation for CVE-2011-1547
 1.18.10.1 28-Apr-2009  skrll Sync with HEAD.
 1.18.2.2 09-Oct-2010  yamt sync with head
 1.18.2.1 04-May-2009  yamt sync with head.
 1.19.4.3 31-May-2011  rmind sync with head
 1.19.4.2 21-Apr-2011  rmind sync with head
 1.19.4.1 05-Mar-2011  rmind sync with head
 1.19.2.1 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.20.4.2 05-Mar-2011  bouyer Sync with HEAD
 1.20.4.1 17-Feb-2011  bouyer Sync with HEAD
 1.20.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.28.8.1 18-Feb-2012  mrg merge to -current.
 1.28.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.28.4.1 17-Apr-2012  yamt sync with head
 1.29.6.3 03-Dec-2017  jdolecek update from HEAD
 1.29.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.29.6.1 23-Jun-2013  tls resync from head
 1.30.2.1 18-May-2014  rmind sync with head
 1.31.14.1 21-Apr-2017  bouyer Sync with HEAD
 1.31.10.1 26-Apr-2017  pgoyette Sync with HEAD
 1.31.6.1 28-Aug-2017  skrll Sync with HEAD
 1.37.2.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.38.2.2 26-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #587):
sys/netipsec/xform_ipcomp.c: revision 1.54-1.56
sys/netipsec/xform_ah.c: revision 1.78,1.79(patch),1.82-1.84
sys/netipsec/xform_esp.c: revision 1.74-1.76

Fix mbuf leaks on error paths

Dedup common codes in error paths (NFCI)

Don't relook up an SP/SA in opencrpyto callbacks
We don't need to do so because we have a reference to it. And also
relooking-up one there may return an sp/sav that has different
parameters from an original one.

Fix kernel panic (assertion failure) on receiving an IPv6 packet with large options
If an IPv6 packet has large options, a necessary space for evacuation can
exceed the expected size (ah_pool_item_size). Give up using the pool_cache
if it happens.

Style.

Commonalize error paths (NFC)

Fix buffer overflow on sending an IPv6 packet with large options
If an IPv6 packet has large options, a necessary space for evacuation can
exceed the expected size (ah_pool_item_size). Give up using the pool_cache
if it happens.
Pointed out by maxv@
 1.38.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.59.2.4 21-May-2018  pgoyette Sync with HEAD
 1.59.2.3 02-May-2018  pgoyette Synch with HEAD
 1.59.2.2 22-Apr-2018  pgoyette Sync with HEAD
 1.59.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.66.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.66.2.1 10-Jun-2019  christos Sync with HEAD
 1.75.8.1 02-Aug-2025  perseant Sync with HEAD
 1.80 11-Jun-2025  ozaki-r in: get rid of unused argument from ip_newid() and ip_newid_range()
 1.79 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.78 22-May-2022  riastradh branches: 1.78.4; 1.78.10;
netipsec: Nothing uses xf_zeroize return value. Nix it.
 1.77 01-Nov-2019  knakahara Fix ipsecif(4) IPV6_MINMTU does not work correctly.
 1.76 12-Jun-2019  christos make DPRINTF use varyadic cpp macros, and merge with IPSECLOG.
 1.75 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.74 07-May-2018  maxv branches: 1.74.2;
Remove a dummy reference to XF_IP4, explain briefly why we don't use
ipe4_xformsw, and remove unused includes.
 1.73 07-May-2018  maxv Remove now unused 'isr', 'skip' and 'protoff' arguments from ipip_output.
 1.72 07-May-2018  maxv Remove unused 'mp' argument from all the xf_output functions. Also clean
up xform.h a bit.
 1.71 07-May-2018  maxv Clarify IPIP: ipe4_xformsw is not allowed to call ipip_output, so replace
the pointer by ipe4_output, which just panics. Group the ipe4_* functions
together. Localify other functions.

ok ozaki-r@
 1.70 29-Apr-2018  maxv Remove obsolete/dead code, the IP-in-IP encapsulation doesn't work this
way anymore (XF_IP4 partly dropped by FAST_IPSEC).
 1.69 28-Apr-2018  maxv Remove IPSEC_SPLASSERT_SOFTNET, it has always been a no-op.
 1.68 24-Apr-2018  maxv Remove the M_AUTHIPDGM flag. It is equivalent to M_AUTHIPHDR, both
are set in IPsec-AH, and they are always handled together.
 1.67 22-Apr-2018  maxv Rename ipip_allow->ipip_spoofcheck, and add net.inet.ipsec.ipip_spoofcheck.
Makes it simpler, and also fixes PR/39919.
 1.66 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.65 19-Apr-2018  maxv Remove unused typedef, remove unused arguments from _ipip_input, sync
comment with reality, and change panic message.
 1.64 18-Apr-2018  maxv style
 1.63 15-Feb-2018  maxv branches: 1.63.2;
Remove broken MROUTING code, rename ipo->ip4, and simplify.
 1.62 15-Feb-2018  maxv Fix the IPIP_STAT_IBYTES stats; we did m_adj(m, iphlen) which substracted
iphlen, so no need to re-substract it again.
 1.61 15-Feb-2018  maxv dedup again
 1.60 15-Feb-2018  maxv dedup
 1.59 15-Feb-2018  maxv Style and remove dead code.
 1.58 24-Jan-2018  maxv style
 1.57 24-Jan-2018  maxv As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.
 1.56 14-Jan-2018  maxv Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:

218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr

Found by Mootja.

Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.
 1.55 15-Nov-2017  knakahara Add argument to encapsw->pr_input() instead of m_tag.
 1.54 03-Oct-2017  ozaki-r Constify isr at many places (NFC)
 1.53 14-Jul-2017  ozaki-r Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
 1.52 14-Jul-2017  ozaki-r Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
 1.51 12-Jul-2017  ozaki-r Omit unnecessary NULL checks for sav->sah
 1.50 29-Jun-2017  ozaki-r Apply C99-style struct initialization to xformsw
 1.49 11-May-2017  ryo branches: 1.49.2;
Make ipsec_address() and ipsec_logsastr() mpsafe.
 1.48 19-Apr-2017  ozaki-r branches: 1.48.2;
Retire ipsec_osdep.h

We don't need to care other OSes (FreeBSD) anymore.

Some macros are alive in ipsec_private.h.
 1.47 18-Apr-2017  ozaki-r Convert IPSEC_ASSERT to KASSERT or KASSERTMSG

IPSEC_ASSERT just discarded specified message...
 1.46 18-Apr-2017  ozaki-r Remove __FreeBSD__ and __NetBSD__ switches

No functional changes (except for a debug printf).

Note that there remain some __FreeBSD__ for sysctl knobs which counerparts
to NetBSD don't exist. And ipsec_osdep.h isn't touched yet; tidying it up
requires actual code changes.
 1.45 15-Apr-2017  christos cosmetic fixes:
- __func__ in printfs
- no space after sizeof
- eliminate useless casts
- u_intX_t -> uintX_t
 1.44 14-Apr-2017  christos PR/52161: Ryota Ozaki: Fix AH tunnel ipsec for ipv6. Compute plen right,
don't forget to subtract the ipv6 header length.
 1.43 06-Apr-2017  ozaki-r Prepare netipsec for rump-ification

- Include "opt_*.h" only if _KERNEL_OPT is defined
- Allow encapinit to be called twice (by ifinit and ipe4_attach)
- ifinit didn't call encapinit if IPSEC is enabled (ipe4_attach called
it instead), however, on a rump kernel ipe4_attach may not be called
even if IPSEC is enabled. So we need to allow ifinit to call it anyway
- Setup sysctls in ipsec_attach explicitly instead of using SYSCTL_SETUP
- Call ip6flow_invalidate_all in key_spdadd only if in6_present
- It's possible that a rump kernel loads the ipsec library but not
the inet6 library
 1.42 07-Jul-2016  ozaki-r branches: 1.42.2; 1.42.4;
Switch the address list of intefaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.41 04-Jul-2016  knakahara make encap_lock_{enter,exit} interruptable.
 1.40 04-Jul-2016  knakahara let gif(4) promise softint(9) contract (2/2) : ip_encap side

The last commit does not care encaptab. This commit fixes encaptab race which
is used not only gif(4).
 1.39 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of an
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places where are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.38 12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.37 26-Jan-2016  knakahara eliminate variable argument in encapsw
 1.36 26-Jan-2016  knakahara implement encapsw instead of protosw and uniform prototype.

suggested and advised by riastradh@n.o, thanks.

BTW, It seems in_stf_input() had bugs...
 1.35 22-Jan-2016  riastradh Back out previous change to introduce struct encapsw.

This change was intended, but Nakahara-san had already made a better
one locally! So I'll let him commit that one, and I'll try not to
step on anyone's toes again.
 1.34 22-Jan-2016  riastradh Don't abuse struct protosw for ip_encap -- introduce struct encapsw.

Mostly mechanical change to replace it, culling some now-needless
boilerplate around all the users.

This does not substantively change the ip_encap API or eliminate
abuse of sketchy pointer casts -- that will come later, and will be
easier now that it is not tangled up with struct protosw.
 1.33 20-Jan-2016  riastradh Eliminate struct protosw::pr_output.

You can't use this unless you know what it is a priori: the formal
prototype is variadic, and the different instances (e.g., ip_output,
route_output) have different real prototypes.

Convert the only user of it, raw_send in net/raw_cb.c, to take an
explicit callback argument. Convert the only instances of it,
route_output and key_output, to such explicit callbacks for raw_send.
Use assertions to make sure the conversion to explicit callbacks is
warranted.

Discussed on tech-net with no objections:
https://mail-index.netbsd.org/tech-net/2016/01/16/msg005484.html
 1.32 27-Mar-2015  ozaki-r Remove unnecessary ifdef IPSEC
 1.31 05-Jun-2014  rmind branches: 1.31.2; 1.31.4; 1.31.6; 1.31.10;
- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.30 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.29 05-Jun-2013  christos branches: 1.29.2; 1.29.6;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.28 17-Jul-2011  joerg branches: 1.28.2; 1.28.8; 1.28.12; 1.28.14; 1.28.22;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.27 06-Jun-2011  drochner remove a limitation that inner and outer IP version must be equal
for an ESP tunnel, and add some fixes which make v4-in-v6 work
(v6 as inner protocol isn't ready, even v6-in-v6 can never have worked)

being here, fix a statistics counter and kill an unused variable
 1.26 18-Feb-2011  drochner branches: 1.26.2;
more "const"
 1.25 18-Feb-2011  drochner sprinkle some "const", documenting that the SA is not supposed to
change during an xform operation
 1.24 27-Apr-2008  degroote branches: 1.24.22; 1.24.28; 1.24.30;
Fix some fallout from socket locking patch :

- {ah6,esp6}_ctlinput must return void*
- use correct wrapper for rip_usrreq
 1.23 24-Apr-2008  ad branches: 1.23.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.22 23-Apr-2008  thorpej Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.21 10-Feb-2008  degroote branches: 1.21.6; 1.21.8;
Fix build of FAST_IPSEC after the change of ip_newid prototype
 1.20 07-Dec-2007  elad Use struct initializers. No functional change.
 1.19 04-Dec-2007  dyoung Use IFNET_FOREACH() and IFADDR_FOREACH().
 1.18 28-Oct-2007  adrianp branches: 1.18.2; 1.18.4;
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.

Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.17 04-Mar-2007  degroote branches: 1.17.6; 1.17.14; 1.17.16; 1.17.20;
Remove useless cast
Use NULL instead of (void*) 0
 1.16 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.15 10-Feb-2007  degroote branches: 1.15.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic
 1.14 16-Nov-2006  christos branches: 1.14.2; 1.14.4;
__unused removal on arguments; approved by core.
 1.13 13-Oct-2006  christos more __unused
 1.12 11-Dec-2005  christos branches: 1.12.20; 1.12.22;
merge ktrace-lwp.
 1.11 06-Jun-2005  martin branches: 1.11.2;
Since we decided "const struct mbuf *" would not do the right thing (tm),
remove ~all const from mbuf pointers.
 1.10 26-Feb-2005  perry branches: 1.10.2; 1.10.4; 1.10.6;
nuke trailing whitespace
 1.9 17-Mar-2004  jonathan branches: 1.9.2; 1.9.4; 1.9.8; 1.9.10; 1.9.12; 1.9.16;
sys/netinet6/ip6_ecn.h is reportedly a FreeBSD-ism; NetBSD has
prototypes for the IPv6 ECN ingress/egress functions in sys/netinet/ip_ecn.h,
inside an #ifdef INET6 wrapper. So, wrap sys/netipsec ocurrences of
#include <netinet6/ip6_ecn.h>
in #ifdef __FreeBSD__/#endif, until both camps can agree on this
teensy little piece of namespace. Affects:
ipsec_output.c xform_ah.c xform_esp.c xform_ipip.c
 1.8 16-Jan-2004  scw Fix ipip_output() to always set *mp to NULL on failure, even if 'm'
is NULL, otherwise ipsec4_process_packet() may try to m_freem() a
bad pointer.

In ipsec4_process_packet(), don't try to m_freem() 'm' twice; ipip_output()
already did it.
 1.7 17-Nov-2003  jonathan Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_random-id()_IP_ID if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.
 1.6 14-Nov-2003  jonathan Use ip_randomid(), dependent on either __NetBSD__ preprocessor
token or FreeBSD RANDOM_IP_ID config option.
 1.5 06-Oct-2003  tls Reversion of "netkey merge", part 2 (replacement of removed files in the
repository by christos was part 1). netipsec should now be back as it
was on 2003-09-11, with some very minor changes:

1) Some residual platform-dependent code was moved from ipsec.h to
ipsec_osdep.h; without this, IPSEC_ASSERT() was multiply defined. ipsec.h
now includes ipsec_osdep.h

2) itojun's renaming of netipsec/files.ipsec to netipsec/files.netipsec has
been left in place (it's arguable which name is less confusing but the
rename is pretty harmless).

3) Some #endif TOKEN has been replaced by #endif /* TOKEN */; #endif TOKEN
is invalid and GCC 3 won't compile it.

An i386 kernel with "options FAST_IPSEC" and "options OPENCRYPTO" now
gets through "make depend" but fails to build with errors in ip_input.c.
But it's better than it was (thank heaven for small favors).
 1.4 12-Sep-2003  itojun merge netipsec/key* into netkey/key*. no need for both.
change confusing filename
 1.3 12-Sep-2003  itojun use ip_randomid
 1.2 20-Aug-2003  jonathan opt_inet6.h is FreeBSD-specific, so wrap it with #ifdef __FreeBSD__/#endif.
 1.1 13-Aug-2003  jonathan Initial import of Sam Leffler's `Fast-IPsec' from FreeBSD 4.
Fast-IPsec is a rework of the OpenBSD and KAME IPsec code, using the
OpenCryptoFramework (and thus hardware crypto accelerators) and
numerous detailed performance improvements.

This import is (aside from SPL-level names) the FreeBSD source,
imported ``as-is'' as a historical snapshot, for future maintenance
and comparison against the FreeBSD source. For now, several minor
kernel-API differences are hidden by macros a shim file, ipsec_osdep.h,
which (aside from SPL names) can be targeted at either NetBSD or FreeBSD.
 1.9.16.1 01-Dec-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11395):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.9.12.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.9.10.1 29-Apr-2005  kent sync with -current
 1.9.8.1 01-Dec-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11395):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.9.4.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.9.4.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.9.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.9.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.9.4.2 03-Aug-2004  skrll Sync with HEAD
 1.9.4.1 17-Mar-2004  skrll file xform_ipip.c was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.9.2.1 01-Dec-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11395):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.10.6.1 22-Nov-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #1878):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.10.4.1 22-Nov-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #1878):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.10.2.1 22-Nov-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #1878):
sys/netipsec/xform_ah.c: revision 1.19 via patch
sys/netipsec/ipsec.c: revision 1.34 via patch
sys/netipsec/xform_ipip.c: revision 1.18 via patch
sys/netipsec/ipsec_output.c: revision 1.23 via patch
sys/netipsec/ipsec_osdep.h: revision 1.21 via patch
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote@
 1.11.2.7 11-Feb-2008  yamt sync with head.
 1.11.2.6 21-Jan-2008  yamt sync with head
 1.11.2.5 07-Dec-2007  yamt sync with head
 1.11.2.4 15-Nov-2007  yamt sync with head.
 1.11.2.3 03-Sep-2007  yamt sync with head.
 1.11.2.2 26-Feb-2007  yamt sync with head.
 1.11.2.1 30-Dec-2006  yamt sync with head.
 1.12.22.2 10-Dec-2006  yamt sync with head.
 1.12.22.1 22-Oct-2006  yamt sync with head
 1.12.20.1 18-Nov-2006  ad Sync with head.
 1.14.4.2 06-Jan-2008  wrstuden Catch up to netbsd-4.0 release.
 1.14.4.1 04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.14.2.2 31-Oct-2007  liamjfoy Pull up following revision(s) (requested by adrianp in ticket #964):
sys/netipsec/xform_ah.c: revision 1.19
sys/netipsec/ipsec.c: revision 1.34
sys/netipsec/xform_ipip.c: revision 1.18
sys/netipsec/ipsec_output.c: revision 1.23
sys/netipsec/ipsec_osdep.h: revision 1.21
The function ipsec4_get_ulp assumes that ip_off is in host order. This results
in IPsec processing that is dependent on protocol and/or port can be bypassed.
Bug report, analysis and initial fix from Karl Knutsson.
Final patch and ok from degroote&#64;
 1.14.2.1 24-May-2007  pavel Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the good default policy, depending of the adress family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us to choose the right default policy (based on the adress family
requested).
While here, fix an error message

Use dynamic array instead of an static array to decompress. It lets us to
decompress any data, whatever is the radio decompressed data / compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.15.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.17.20.1 13-Nov-2007  bouyer Sync with HEAD
 1.17.16.3 23-Mar-2008  matt sync with HEAD
 1.17.16.2 09-Jan-2008  matt sync with HEAD
 1.17.16.1 06-Nov-2007  matt sync with HEAD
 1.17.14.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.17.14.1 28-Oct-2007  joerg Sync with HEAD.
 1.17.6.1 09-Dec-2007  reinoud Pullup to HEAD
 1.18.4.1 08-Dec-2007  ad Sync with head.
 1.18.2.2 18-Feb-2008  mjf Sync with HEAD.
 1.18.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.21.8.1 18-May-2008  yamt sync with head.
 1.21.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.23.2.1 16-May-2008  yamt sync with head.
 1.24.30.1 05-Mar-2011  bouyer Sync with HEAD
 1.24.28.1 06-Jun-2011  jruoho Sync with HEAD.
 1.24.22.2 12-Jun-2011  rmind sync with head
 1.24.22.1 05-Mar-2011  rmind sync with head
 1.26.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.28.22.2 13-Mar-2018  snj Pull up following revision(s) (requested by maxv in ticket #1532):
sys/netipsec/xform_ah.c: 1.77 via patch
sys/netipsec/xform_esp.c: 1.73 via patch
sys/netipsec/xform_ipip.c: 1.56-1.57 via patch
Reinforce and clarify.
--
Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.
--
Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:
218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr
Found by Mootja.
Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.
--
As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.
 1.28.22.1 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1529):
sys/netipsec/xform_ipip.c: revision 1.44 via patch

PR/52161: Ryota Ozaki: Fix AH tunnel ipsec for ipv6. Compute plen right,
don't forget to subtract the ipv6 header length.
 1.28.14.2 13-Mar-2018  snj Pull up following revision(s) (requested by maxv in ticket #1532):
sys/netipsec/xform_ah.c: 1.77 via patch
sys/netipsec/xform_esp.c: 1.73 via patch
sys/netipsec/xform_ipip.c: 1.56-1.57 via patch
Reinforce and clarify.
--
Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.
--
Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:
218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr
Found by Mootja.
Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.
--
As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.
 1.28.14.1 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1529):
sys/netipsec/xform_ipip.c: revision 1.44 via patch

PR/52161: Ryota Ozaki: Fix AH tunnel ipsec for ipv6. Compute plen right,
don't forget to subtract the ipv6 header length.
 1.28.12.3 03-Dec-2017  jdolecek update from HEAD
 1.28.12.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.28.12.1 23-Jun-2013  tls resync from head
 1.28.8.2 13-Mar-2018  snj Pull up following revision(s) (requested by maxv in ticket #1532):
sys/netipsec/xform_ah.c: 1.77 via patch
sys/netipsec/xform_esp.c: 1.73 via patch
sys/netipsec/xform_ipip.c: 1.56-1.57 via patch
Reinforce and clarify.
--
Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.
--
Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:
218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr
Found by Mootja.
Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.
--
As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.
 1.28.8.1 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1529):
sys/netipsec/xform_ipip.c: revision 1.44 via patch

PR/52161: Ryota Ozaki: Fix AH tunnel ipsec for ipv6. Compute plen right,
don't forget to subtract the ipv6 header length.
 1.28.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.29.6.1 10-Aug-2014  tls Rebase.
 1.29.2.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for old the pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.31.10.2 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1569):
sys/netipsec/xform_ah.c: revision 1.77, 1.81 (via patch)
sys/netipsec/xform_esp.c: revision 1.73 (via patch)
sys/netipsec/xform_ipip.c: revision 1.56, 1.57 (via patch)

Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:

218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr

Found by Mootja.

Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.

Reinforce and clarify.

Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.

As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.

Make sure the Authentication Header fits the mbuf chain, otherwise panic.
 1.31.10.1 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1567):
sys/netipsec/xform_ipip.c: revision 1.44
PR/52161: Ryota Ozaki: Fix AH tunnel ipsec for ipv6. Compute plen right,
don't forget to subtract the ipv6 header length.
 1.31.6.2 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1569):
sys/netipsec/xform_ah.c: revision 1.77, 1.81 (via patch)
sys/netipsec/xform_esp.c: revision 1.73 (via patch)
sys/netipsec/xform_ipip.c: revision 1.56, 1.57 (via patch)

Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:

218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr

Found by Mootja.

Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.

Reinforce and clarify.

Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.

As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.

Make sure the Authentication Header fits the mbuf chain, otherwise panic.
 1.31.6.1 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1567):
sys/netipsec/xform_ipip.c: revision 1.44
PR/52161: Ryota Ozaki: Fix AH tunnel ipsec for ipv6. Compute plen right,
don't forget to subtract the ipv6 header length.
 1.31.4.5 28-Aug-2017  skrll Sync with HEAD
 1.31.4.4 09-Jul-2016  skrll Sync with HEAD
 1.31.4.3 29-May-2016  skrll Sync with HEAD
 1.31.4.2 19-Mar-2016  skrll Sync with HEAD
 1.31.4.1 06-Apr-2015  skrll Sync with HEAD
 1.31.2.2 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1569):
sys/netipsec/xform_ah.c: revision 1.77, 1.81 (via patch)
sys/netipsec/xform_esp.c: revision 1.73 (via patch)
sys/netipsec/xform_ipip.c: revision 1.56, 1.57 (via patch)

Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:

218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr

Found by Mootja.

Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.

Reinforce and clarify.

Add missing NULL check. Normally that's not triggerable remotely, since we
are guaranteed that 8 bytes are valid at mbuf+skip.

As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.

Make sure the Authentication Header fits the mbuf chain, otherwise panic.
 1.31.2.1 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1567):
sys/netipsec/xform_ipip.c: revision 1.44
PR/52161: Ryota Ozaki: Fix AH tunnel ipsec for ipv6. Compute plen right,
don't forget to subtract the ipv6 header length.
 1.42.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.42.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.48.2.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keywork expansion)
 1.49.2.3 15-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #551):
sys/netipsec/xform_ipip.c: revision 1.56-1.63

Fix use-after-free. There is a path where the mbuf gets pulled up without
a proper mtod afterwards:

218 ipo = mtod(m, struct ip *);
281 m = m_pullup(m, hlen);
232 ipo->ip_src.s_addr

Found by Mootja.

Meanwhile it seems to me that 'ipo' should be set to NULL if the inner
packet is IPv6, but I'll revisit that later.
As I said in my last commit in this file, ipo should be set to NULL;
otherwise the 'local address spoofing' check below is always wrong on
IPv6.

Style and remove dead code.

dedup

Fix the IPIP_STAT_IBYTES stats; we did m_adj(m, iphlen) which substracted
iphlen, so no need to re-substract it again.

Remove broken MROUTING code, rename ipo->ip4, and simplify.
 1.49.2.2 10-Dec-2017  snj Pull up following revision(s) (requested by knakahara in ticket #419):
sys/net/if_stf.c: revision 1.103
sys/net/if_stf.h: revision 1.8
sys/netinet/in_gif.c: revision 1.89
sys/netinet/in_gif.h: revision 1.17
sys/netinet/in_l2tp.c: revision 1.4
sys/netinet/ip_encap.c: revision 1.66
sys/netinet/ip_encap.h: revision 1.23
sys/netinet/ip_mroute.c: revision 1.148
sys/netinet6/in6_gif.c: revision 1.87
sys/netinet6/in6_gif.h: revision 1.16
sys/netinet6/in6_l2tp.c: revision 1.7
sys/netipsec/xform.h: revision 1.13
sys/netipsec/xform_ipip.c: revision 1.55
Add argument to encapsw->pr_input() instead of m_tag.
 1.49.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.63.2.3 21-May-2018  pgoyette Sync with HEAD
 1.63.2.2 02-May-2018  pgoyette Synch with HEAD
 1.63.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.74.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.74.2.1 10-Jun-2019  christos Sync with HEAD
 1.78.10.1 02-Aug-2025  perseant Sync with HEAD
 1.78.4.1 29-Jul-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1140):

sys/netinet/ip_output.c: revision 1.330
sys/netinet/sctp_output.c: revision 1.39
sys/netinet/ip_mroute.c: revision 1.166
sys/netipsec/ipsecif.c: revision 1.24
sys/netipsec/xform_ipip.c: revision 1.80
sys/netinet/ip_output.c: revision 1.327
sys/netinet/ip_output.c: revision 1.328
sys/netinet/ip_input.c: revision 1.406
sys/netinet/ip_output.c: revision 1.329
sys/netinet/in_var.h: revision 1.105

in: get rid of unused argument from ip_newid() and ip_newid_range()

in: take a reference of ifp on IP_ROUTETOIF
The ifp could be released after ia4_release(ia).

in: narrow the scope of ifa in ip_output (NFC)

sctp: follow the recent change of ip_newid()

in: avoid racy ifa_acquire(rt->rt_ifa) in ip_output()
If a rtentry is being destroyed asynchronously, ifa referenced by rt_ifa
can be destructed and taking ifa_acquire(rt->rt_ifa) aborts with a
KASSERT failure. Fortunately, the ifa is not actually freed because of
a reference by rt_ifa, it can be available (except some functions like
psref) so as long the rtentry is held.
PR kern/59527

in: avoid racy ia4_acquire(ifatoia(rt->rt_ifa) in ip_rtaddr()
Same as the case of ip_output(), it's racy and should be avoided.
PR kern/59527
 1.25 22-May-2022  riastradh netipsec: Nothing uses xf_zeroize return value. Nix it.
 1.24 01-Nov-2019  knakahara Fix ipsecif(4) IPV6_MINMTU does not work correctly.
 1.23 12-Jun-2019  christos make DPRINTF use varyadic cpp macros, and merge with IPSECLOG.
 1.22 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.21 14-May-2018  ozaki-r branches: 1.21.2;
Restore TCP header inclusions for TCP_SIGNATURE
 1.20 11-May-2018  maxv Clean up, and panic if we call functions that are not supposed to be
called.
 1.19 07-May-2018  maxv Remove unused 'mp' argument from all the xf_output functions. Also clean
up xform.h a bit.
 1.18 19-Apr-2018  maxv Remove extra long file paths from the headers.
 1.17 26-Feb-2018  maxv branches: 1.17.2;
Add XXX, it seems to me we need to free the mbuf here.
 1.16 03-Oct-2017  ozaki-r Constify isr at many places (NFC)
 1.15 14-Jul-2017  ozaki-r Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
 1.14 14-Jul-2017  ozaki-r Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
 1.13 10-Jul-2017  ozaki-r Use explicit_memset to surely zero-clear key_auth and key_enc
 1.12 29-Jun-2017  ozaki-r Apply C99-style struct initialization to xformsw
 1.11 19-Apr-2017  ozaki-r branches: 1.11.4;
Retire ipsec_osdep.h

We don't need to care other OSes (FreeBSD) anymore.

Some macros are alive in ipsec_private.h.
 1.10 18-Apr-2017  ozaki-r Remove __FreeBSD__ and __NetBSD__ switches

No functional changes (except for a debug printf).

Note that there remain some __FreeBSD__ for sysctl knobs which counerparts
to NetBSD don't exist. And ipsec_osdep.h isn't touched yet; tidying it up
requires actual code changes.
 1.9 06-Apr-2017  ozaki-r Prepare netipsec for rump-ification

- Include "opt_*.h" only if _KERNEL_OPT is defined
- Allow encapinit to be called twice (by ifinit and ipe4_attach)
- ifinit didn't call encapinit if IPSEC is enabled (ipe4_attach called
it instead), however, on a rump kernel ipe4_attach may not be called
even if IPSEC is enabled. So we need to allow ifinit to call it anyway
- Setup sysctls in ipsec_attach explicitly instead of using SYSCTL_SETUP
- Call ip6flow_invalidate_all in key_spdadd only if in6_present
- It's possible that a rump kernel loads the ipsec library but not
the inet6 library
 1.8 11-Jan-2012  drochner branches: 1.8.6; 1.8.24; 1.8.28; 1.8.32;
fix build in the (FAST_)IPSEC & TCP_SIGNATURE case
 1.7 18-Feb-2011  drochner branches: 1.7.4; 1.7.8;
more "const"
 1.6 18-Feb-2011  drochner sprinkle some "const", documenting that the SA is not supposed to
change during an xform operation
 1.5 18-Mar-2009  cegger branches: 1.5.4; 1.5.6; 1.5.8;
bzero -> memset
 1.4 11-Dec-2007  lukem branches: 1.4.12; 1.4.20; 1.4.26;
use __KERNEL_RCSID()
 1.3 11-Dec-2005  christos branches: 1.3.46; 1.3.56; 1.3.58; 1.3.60;
merge ktrace-lwp.
 1.2 26-Feb-2005  perry branches: 1.2.4;
nuke trailing whitespace
 1.1 25-Apr-2004  jonathan branches: 1.1.2; 1.1.6; 1.1.8;
Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do see any code
that verifies recieved TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c
,and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
 1.1.8.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.6.1 29-Apr-2005  kent sync with -current
 1.1.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.1.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.2 03-Aug-2004  skrll Sync with HEAD
 1.1.2.1 25-Apr-2004  skrll file xform_tcp.c was added on branch ktrace-lwp on 2004-08-03 10:55:29 +0000
 1.2.4.1 21-Jan-2008  yamt sync with head
 1.3.60.1 13-Dec-2007  bouyer Sync with HEAD
 1.3.58.1 11-Dec-2007  yamt sync with head.
 1.3.56.1 26-Dec-2007  ad Sync with head.
 1.3.46.1 09-Jan-2008  matt sync with HEAD
 1.4.26.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.4.20.1 28-Apr-2009  skrll Sync with HEAD.
 1.4.12.1 04-May-2009  yamt sync with head.
 1.5.8.1 05-Mar-2011  bouyer Sync with HEAD
 1.5.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.5.4.1 05-Mar-2011  rmind sync with head
 1.7.8.1 18-Feb-2012  mrg merge to -current.
 1.7.4.1 17-Apr-2012  yamt sync with head
 1.8.32.1 21-Apr-2017  bouyer Sync with HEAD
 1.8.28.1 26-Apr-2017  pgoyette Sync with HEAD
 1.8.24.1 28-Aug-2017  skrll Sync with HEAD
 1.8.6.1 03-Dec-2017  jdolecek update from HEAD
 1.11.4.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.17.2.2 21-May-2018  pgoyette Sync with HEAD
 1.17.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.21.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.21.2.1 10-Jun-2019  christos Sync with HEAD

RSS XML Feed