History log of /src/sys/netinet6
Revision  Date  Author  Comments
 1.15 24-Jun-2001  itojun the documents are out of sync with the latest situation. remove them.
 1.14 12-Jun-2000  itojun branches: 1.14.4;
sync with latest kame tree (tiny update in IPv4 mapped issue)
 1.13 10-Jun-2000  itojun sync with latest kame document.
- update 6to4 i-d #.
- update descr on source address selection.
 1.12 28-May-2000  itojun sync with reality in netbsd-current.
- pcb layer changes
- officially supported net interfaces
- minor typo
- draft # updates
 1.11 22-Mar-2000  itojun branches: 1.11.2;
correct references. update ipsec description (sync with kame).
 1.10 28-Feb-2000  itojun support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield changes between the 04 draft and the 05 draft, which make
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.
 1.9 26-Feb-2000  itojun sync description on proxy NDP with latest KAME doc.
 1.8 25-Feb-2000  itojun sync with latest KAME document.
- updates in I-D/RFC #
- scoped address syntax change
- remove ALTQ and other portion to avoid confusion
 1.7 09-Feb-2000  itojun sync with extended scoped address syntax change.
 1.6 03-Feb-2000  itojun add notice on site-locals. typo fix. (sync with kame)
 1.5 01-Feb-2000  itojun sync with current code. now IMPLEMENTATION doc is almost identical
to the latest KAME one.
 1.4 06-Jan-2000  itojun update tcp/udp v4 mapped addr issues.
 1.3 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.2 03-Jul-1999  thorpej branches: 1.2.2; 1.2.8;
RCS ID police.
 1.1 01-Jul-1999  itojun branches: 1.1.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there, so we patch them up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1.2.3 02-Aug-1999  thorpej Update from trunk.
 1.1.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.1.2.1 01-Jul-1999  thorpej file IMPLEMENTATION was added on branch chs-ubc2 on 1999-07-01 23:48:25 +0000
 1.2.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.2.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.11.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.14.4.1 28-Feb-2002  nathanw Catch up to -current.
 1.10 06-Sep-2018  maxv Remove netinet6/ipsec.h.
 1.9 16-Feb-2017  knakahara branches: 1.9.12; 1.9.14;
add l2tp(4) L2TPv3 interface.

originally implemented by IIJ SEIL team.
 1.8 06-Jan-2012  drochner branches: 1.8.6; 1.8.24; 1.8.28; 1.8.32;
more IPSEC header cleanup: don't install unneeded headers to userland,
and remove some differences between KAME and FAST_IPSEC
 1.7 04-Jan-2012  drochner -consistently use "char *" for the compiled policy buffer in the
ipsec_*_policy() functions, as it was documented and used by clients
-remove "ipsec_policy_t" which was undocumented and only present
in the KAME version of the ipsec.h header
-misc cleanup of historical artefacts, and to remove unnecessary
differences between KAME and FAST_IPSEC
 1.6 26-Nov-2002  lukem branches: 1.6.36; 1.6.100; 1.6.144; 1.6.148;
Remove KDIR=, since SYS_INCLUDE=symlinks and KDIR are not supported any more.
 1.5 18-Oct-2001  itojun gather stats on raw ip6 socket. sync with kame
 1.4 04-Jun-2000  itojun branches: 1.4.4; 1.4.6;
remove include files in nonstandard path
(has been #error for couple of months).
 1.3 30-Jul-1999  itojun branches: 1.3.2; 1.3.10;
remove reference to in6_systm.h (file itself will be removed afterwards)
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there, so we patch them up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file Makefile was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file Makefile was added on branch chs-ubc2 on 1999-07-01 23:48:26 +0000
 1.3.10.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.3.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.4.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.4.4.2 11-Dec-2002  thorpej Sync with HEAD.
 1.4.4.1 22-Oct-2001  nathanw Catch up to -current.
 1.6.148.1 18-Feb-2012  mrg merge to -current.
 1.6.144.1 17-Apr-2012  yamt sync with head
 1.6.100.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.6.36.1 02-Feb-2006  rpaulo in6_pcb.h is gone.
 1.8.32.1 21-Apr-2017  bouyer Sync with HEAD
 1.8.28.1 20-Mar-2017  pgoyette Sync with HEAD
 1.8.24.1 28-Aug-2017  skrll Sync with HEAD
 1.8.6.1 03-Dec-2017  jdolecek update from HEAD
 1.9.14.1 10-Jun-2019  christos Sync with HEAD
 1.9.12.1 30-Sep-2018  pgoyette Sync with HEAD
 1.13 24-Jun-2001  itojun the documents are out of sync with the latest situation. remove them.
 1.12 05-Feb-2000  itojun branches: 1.12.6;
need PRC_IF{UP,CHANGE}.
 1.11 05-Feb-2000  itojun sync with reality.
- getipnodeby{name,addr} is now non-issue as RFC2553bis will be dropping it
- if_detach is mostly done
- add some items
 1.10 03-Feb-2000  itojun - if_detach
- xx_control calls from interrupt thread should be removed
- LP64
 1.9 01-Feb-2000  itojun sync with current code. now IMPLEMENTATION doc is almost identical
to the latest KAME one.
 1.8 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.7 05-Jan-2000  itojun better sync with reality.
 1.6 13-Dec-1999  itojun synchronize list of IPv6 TODOs with reality.
 1.5 14-Aug-1999  itojun branches: 1.5.2; 1.5.8;
typo fix (from koji@dti.ad.jp).
remove things that are already done.
 1.4 02-Jul-1999  itojun remove TIME_WAIT issue, it was false.
 1.3 02-Jul-1999  itojun add tcp6 port # oddity.
add splnet/splsoftnet issue.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
document issues in libc extensions.
 1.1 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there, so we patch them up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file TODO was added on branch chs-ubc2 on 1999-07-01 23:48:26 +0000
 1.5.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.5.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.6.1 28-Feb-2002  nathanw Catch up to -current.
 1.27 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.26 14-Mar-2009  dsl branches: 1.26.12; 1.26.16;
Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.25 24-Apr-2008  ad branches: 1.25.2; 1.25.10; 1.25.16;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.24 23-Apr-2008  thorpej Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.23 17-Feb-2007  dyoung branches: 1.23.38; 1.23.40;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
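
Item 4 above can be illustrated in user space. This is only a sketch: the macros `my_satocsin` and `my_satocsin6` and the helper `sockaddr_port` are hypothetical names chosen for illustration, modelled on the names in the log entry and not copied from the NetBSD headers. It assumes nothing beyond the standard socket headers.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical const-preserving conversions from a generic sockaddr to a
 * family-specific one. The result stays const, so callers cannot modify
 * the address through it. */
#define my_satocsin(sa)  ((const struct sockaddr_in *)(const void *)(sa))
#define my_satocsin6(sa) ((const struct sockaddr_in6 *)(const void *)(sa))

/* Read the port from a generic sockaddr without casting away const.
 * Returns the port in host byte order, or -1 for other address families. */
static int
sockaddr_port(const struct sockaddr *sa)
{
	assert(sa != NULL);
	switch (sa->sa_family) {
	case AF_INET:
		return ntohs(my_satocsin(sa)->sin_port);
	case AF_INET6:
		return ntohs(my_satocsin6(sa)->sin6_port);
	default:
		return -1;
	}
}
```

Because the macros yield pointers to const, a routine that takes a `const struct sockaddr *` (as the constified output and ctlinput paths do) can inspect the address without any cast that drops the qualifier.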
 1.22 10-Dec-2005  elad branches: 1.22.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.21 05-Aug-2003  itojun branches: 1.21.16;
increase AH_MAXSUMSIZE to 512/8, for hmac-sha2-512
 1.20 22-Jul-2003  itojun add hmac-sha2 support. various cleanups (like avoid hardcoding '16').
from kame
 1.19 20-Jul-2003  itojun avoid assuming result buffer size in AH logic. sync w/kame
 1.18 11-Sep-2002  itojun branches: 1.18.6;
correct pointer signedness mixups. sync w/kame
 1.17 15-Oct-2001  itojun reduce diff with kame. whitespace changes only.
 1.16 30-May-2001  mrg branches: 1.16.2;
use _KERNEL_OPT
 1.15 19-Oct-2000  itojun branches: 1.15.2;
memcpy -> bcopy, for sync with kame tree
 1.14 18-Oct-2000  itojun verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync
 1.13 18-Oct-2000  thorpej Restructure the Path MTU Discovery code somewhat to avoid
entering rtentry's for hosts we're not actually communicating
with.

Do this by invoking the ctlinput for the protocol, which is
responsible for validating the ICMP message:
* TCP -- Lookup the connection based on the address/port
pairs in the ICMP message.
* AH/ESP -- Lookup the SA based on the SPI in the ICMP message.

If validation succeeds, ctlinput is responsible for calling
icmp_mtudisc(). icmp_mtudisc() then invokes callbacks registered
by protocols (such as TCP) which want to take some sort of special
action when a path's MTU changes. For TCP, this is where we now
refresh cached routes and re-enter slow-start.

As a side-effect, this fixes the problem where TCP would not be
notified when a path's MTU changed if AH/ESP were being used.

XXX Note, this is only a fix for the IPv4 case. For the IPv6
XXX case, we need to wait for the KAME folks.

Reviewed by sommerfeld@netbsd.org and itojun@netbsd.org.
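
The flow described in the entry above (validate the ICMP message against a known connection first, then notify registered protocols) can be sketched as follows. This is a toy model under stated assumptions, not the kernel code: every name here (`mtudisc_callback_register`, `mtudisc_notify`, `toy_ctlinput`, `toy_tcp_mtudisc`, the `conns` table) is hypothetical, and addresses are plain 32-bit integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MTUDISC_CALLBACKS 8

/* Callback invoked with the destination whose path MTU changed. */
typedef void (*mtudisc_cb)(uint32_t dst);

static mtudisc_cb callbacks[MAX_MTUDISC_CALLBACKS];
static size_t ncallbacks;

/* Register a protocol callback; returns 0 on success, -1 if the table is full. */
static int
mtudisc_callback_register(mtudisc_cb cb)
{
	assert(cb != NULL);
	if (ncallbacks == MAX_MTUDISC_CALLBACKS)
		return -1;
	callbacks[ncallbacks++] = cb;
	return 0;
}

/* Stand-in for the MTU-discovery hook: notify every registered protocol. */
static void
mtudisc_notify(uint32_t dst)
{
	for (size_t i = 0; i < ncallbacks; i++)
		callbacks[i](dst);
}

/* Toy connection table standing in for the TCP connection lookup. */
struct conn { uint32_t laddr, faddr; uint16_t lport, fport; };
static const struct conn conns[] = {
	{ 0x0a000001, 0x0a000002, 49152, 80 },
};

/* Counts how often the toy TCP callback ran (where a real stack would
 * refresh cached routes and re-enter slow-start). */
static unsigned notified;
static void
toy_tcp_mtudisc(uint32_t dst)
{
	(void)dst;
	notified++;
}

/* Stand-in for a protocol's ctlinput: act on a "too big" message only if
 * its address/port pairs match a connection we actually have.
 * Returns 1 if the message was accepted, 0 if it was ignored. */
static int
toy_ctlinput(uint32_t laddr, uint16_t lport, uint32_t faddr, uint16_t fport)
{
	for (size_t i = 0; i < sizeof(conns) / sizeof(conns[0]); i++) {
		const struct conn *c = &conns[i];
		if (c->laddr == laddr && c->lport == lport &&
		    c->faddr == faddr && c->fport == fport) {
			mtudisc_notify(faddr);
			return 1;
		}
	}
	return 0;
}
```

A message naming a host with no matching connection is dropped before any state is created for it, which is the point of doing the validation in ctlinput.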
 1.12 23-Jul-2000  itojun pre-compute and cache intermediate crypto key. suggestion from sommerfeld,
sync with kame.

loopback, blowfish-cbc transport mode, 128bit key
before: 86588496 bytes received in 00:42 (1.94 MB/s)
after: 86588496 bytes received in 00:31 (2.58 MB/s)
 1.11 18-Jul-2000  itojun correct RFC2367 PF_KEY conformance (SADB_[AE]ALG_xx values and namespaces).
sync from kame.

WARNING: need recompilation of setkey(8) and pkgsrc/security/racoon.
(no ipsec-ready netbsd was released as official release)
 1.10 14-Jun-2000  itojun branches: 1.10.2;
add algorithm name into algorithm table. (commit to crypto-intl will follow)
 1.9 02-Jun-2000  itojun sync with more recent kame. cope with malloc failure more gracefully
some cosmetics.
 1.8 31-Jan-2000  itojun branches: 1.8.2;
bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.7 06-Jan-2000  itojun remove too much portability code in KAME, to improve readability.
 1.6 02-Dec-1999  itojun avoid namespace pollution ("#ifdef KERNEL" was mistakenly used)
 1.5 31-Jul-1999  itojun branches: 1.5.2; 1.5.8;
sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.4 09-Jul-1999  thorpej defopt INET6, and put it in opt_inet.h (most places already include this
file, which is why the file list is so short).
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there, so we patch them up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ah.h was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ah.h was added on branch chs-ubc2 on 1999-07-01 23:48:26 +0000
 1.5.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.5.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.8.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.10.2.2 30-Jul-2000  itojun pullup (approved by releng-1-5)

esp encryption performance improvement, specifically for algorithms
with long key setup time (blowfish). KAME PR 229.

> pre-compute and cache intermediate crypto key. suggestion from sommerfeld,
> sync with kame.

1.11 -> 1.12 syssrc/sys/netinet6/ah.h
1.9 -> 1.10 syssrc/sys/netinet6/esp.h
1.2 -> 1.3 syssrc/sys/netinet6/esp_core.c
1.2 -> 1.3 syssrc/sys/netinet6/esp_input.c
1.3 -> 1.4 syssrc/sys/netinet6/esp_output.c
1.27 -> 1.28 syssrc/sys/netkey/key.c
1.6 -> 1.7 syssrc/sys/netkey/keydb.h

> clarify comment. from jhawk. sync with kame.

1.3 -> 1.4 syssrc/sys/netinet6/esp_input.c
1.4 -> 1.5 syssrc/sys/netinet6/esp_output.c
 1.10.2.1 25-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)

correct RFC2367 PF_KEY conformance (SADB_[AE]ALG_xx values and namespaces).
sync from kame.

WARNING: need recompilation of setkey(8) and pkgsrc/security/racoon.
(no ipsec-ready netbsd was released as official release, so binary backward
compatibility is less of an issue)

(sys/netinet6/esp.h only, 1.10 -> 1.11)
wrap kernel function prototype by #ifdef _KERNEL.

--- revisions pulled up:
1.6 -> 1.7 syssrc/sys/net/pfkeyv2.h
1.10 -> 1.11 syssrc/sys/netinet6/ah.h
1.10 -> 1.11 syssrc/sys/netinet6/ah_output.c
1.19 -> 1.20 syssrc/sys/netinet6/ah_core.c
1.15 -> 1.16 syssrc/sys/netinet6/ah_input.c
1.8 -> 1.9 syssrc/sys/netinet6/esp.h
1.10 -> 1.11 syssrc/sys/netinet6/esp.h
1.1 -> 1.2 syssrc/sys/netinet6/esp_core.c
1.1 -> 1.2 syssrc/sys/netinet6/esp_input.c
1.2 -> 1.3 syssrc/sys/netinet6/esp_output.c
1.26 -> 1.27 syssrc/sys/netkey/key.c
 1.15.2.3 17-Sep-2002  nathanw Catch up to -current.
 1.15.2.2 22-Oct-2001  nathanw Catch up to -current.
 1.15.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.16.2.2 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.16.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.18.6.4 11-Dec-2005  christos Sync with head.
 1.18.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.18.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.18.6.1 03-Aug-2004  skrll Sync with HEAD
 1.21.16.2 26-Feb-2007  yamt sync with head.
 1.21.16.1 21-Jun-2006  yamt sync with head.
 1.22.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.23.40.1 18-May-2008  yamt sync with head.
 1.23.38.1 02-Jun-2008  mjf Sync with HEAD.
 1.25.16.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.25.10.1 28-Apr-2009  skrll Sync with HEAD.
 1.25.2.1 04-May-2009  yamt sync with head.
 1.26.16.1 05-Apr-2012  mrg sync to latest -current.
 1.26.12.1 17-Apr-2012  yamt sync with head
 1.8 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.7 18-Apr-2009  tsutsui branches: 1.7.12; 1.7.16;
Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.6 18-Mar-2009  cegger bcopy -> memcpy
 1.5 19-Dec-2008  cegger branches: 1.5.2;
use M_ZERO on malloc() and remove subsequent bzero().
 1.4 23-May-2007  christos branches: 1.4.28; 1.4.32; 1.4.42;
Ansify + add a few comments, from Karl Sjödahl
 1.3 11-Dec-2005  christos branches: 1.3.30; 1.3.32;
merge ktrace-lwp.
 1.2 28-Jul-2005  christos PR/30821: SUZUKI, Shinsuike: IPsec-AH is always calculated using the same
key in AES-XCBC-MAC
 1.1 25-Jul-2003  itojun branches: 1.1.2; 1.1.4; 1.1.8; 1.1.16; 1.1.18;
AES XCBC MAC (for AH)
AES counter mode (for ESP)
 1.1.18.2 03-Sep-2007  yamt sync with head.
 1.1.18.1 21-Jun-2006  yamt sync with head.
 1.1.16.1 28-Jul-2005  jdc Pull up revision 1.2 (requested by elad in ticket #630).

PR/30821: SUZUKI, Shinsuike: IPsec-AH is always calculated using the
same key in AES-XCBC-MAC
 1.1.8.1 28-Jul-2005  jdc Pull up revision 1.2 (requested by elad in ticket #5538).

PR/30821: SUZUKI, Shinsuike: IPsec-AH is always calculated using the
same key in AES-XCBC-MAC
 1.1.4.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.4.1 25-Jul-2003  skrll file ah_aesxcbcmac.c was added on branch ktrace-lwp on 2004-08-03 10:55:11 +0000
 1.1.2.1 28-Jul-2005  jdc Pull up revision 1.2 (requested by elad in ticket #5538).

PR/30821: SUZUKI, Shinsuike: IPsec-AH is always calculated using the
same key in AES-XCBC-MAC
 1.3.32.1 11-Jul-2007  mjf Sync with head.
 1.3.30.1 08-Jun-2007  ad Sync with head.
 1.4.42.2 28-Apr-2009  skrll Sync with HEAD.
 1.4.42.1 19-Jan-2009  skrll Sync with HEAD.
 1.4.32.1 04-May-2009  yamt sync with head.
 1.4.28.1 17-Jan-2009  mjf Sync with HEAD.
 1.5.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.7.16.1 05-Apr-2012  mrg sync to latest -current.
 1.7.12.1 17-Apr-2012  yamt sync with head
 1.4 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.3 14-Mar-2009  dsl branches: 1.3.12; 1.3.16;
Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.2 10-Dec-2005  elad branches: 1.2.74; 1.2.84; 1.2.90;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.1 25-Jul-2003  itojun branches: 1.1.4; 1.1.18;
AES XCBC MAC (for AH)
AES counter mode (for ESP)
 1.1.18.1 21-Jun-2006  yamt sync with head.
 1.1.4.5 11-Dec-2005  christos Sync with head.
 1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.4.1 25-Jul-2003  skrll file ah_aesxcbcmac.h was added on branch ktrace-lwp on 2004-08-03 10:55:11 +0000
 1.2.90.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.2.84.1 28-Apr-2009  skrll Sync with HEAD.
 1.2.74.1 04-May-2009  yamt sync with head.
 1.3.16.1 05-Apr-2012  mrg sync to latest -current.
 1.3.12.1 17-Apr-2012  yamt sync with head
 1.49 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.48 18-Apr-2009  tsutsui branches: 1.48.12; 1.48.16;
Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.47 18-Mar-2009  cegger bcopy -> memcpy
 1.46 18-Mar-2009  cegger bzero -> memset
 1.45 14-Mar-2009  dsl Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.44 23-May-2007  christos branches: 1.44.32; 1.44.42; 1.44.48;
Ansify + add a few comments, from Karl Sjödahl
 1.43 04-Mar-2007  christos branches: 1.43.2; 1.43.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.42 16-Nov-2006  christos branches: 1.42.4;
__unused removal on arguments; approved by core.
 1.41 27-Oct-2006  mrg what was <crypto/sha2/sha2.h> and <crypto/ripemd160/rmd160.h> is now
<sys/sha2.h> and <sys/rmd160.h>.
 1.40 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.39 21-Jan-2006  rpaulo branches: 1.39.18; 1.39.20;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
    s = socket(AF_INET6);
    bind(s, "::1");
    sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.38 11-Dec-2005  christos branches: 1.38.2;
merge ktrace-lwp.
 1.37 21-Jul-2005  tron Remove unnecessary bzero() calls before calling the algorithm specific
init function.
 1.36 10-Mar-2004  itojun branches: 1.36.14; 1.36.16;
constify AH algorithm function table. suggested by robert watson
 1.35 25-Jul-2003  itojun minor KNF
 1.34 25-Jul-2003  itojun typo
 1.33 25-Jul-2003  itojun add AH/ESP algorithms: hmac-ripemd160 (AH), AES XCBC MAC (AH),
AES counter mode (ESP)
 1.32 22-Jul-2003  itojun unifdef -U_IP_VHL
 1.31 22-Jul-2003  itojun add hmac-sha2 support. various cleanups (like avoid hardcoding '16').
from kame
 1.30 20-Jul-2003  itojun avoid assuming result buffer size in AH logic. sync w/kame
 1.29 22-Apr-2003  itojun branches: 1.29.2;
style
 1.28 11-Sep-2002  itojun correct pointer signedness mixups. sync w/kame
 1.27 07-Jun-2002  itojun panic() if NULL is passed to ah_sumsiz_xx. suggested by sam leffler, sync w/kame
 1.26 13-Nov-2001  lukem branches: 1.26.8;
add RCSIDs
 1.25 29-Oct-2001  itojun always check extension header length.
 1.24 15-Oct-2001  itojun branches: 1.24.2;
reduce diff with kame. whitespace changes only.
 1.23 21-Feb-2001  itojun branches: 1.23.2; 1.23.4;
tighten AH IPv4 option chasing more. drop too short (< 2) option.
sync with kame.
 1.22 19-Feb-2001  itojun correct IPv4 option header chasing. the old code may overrun the buffer
if the option header is truncated. sync with kame
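
The two entries above (1.23 and 1.22) concern walking IPv4 options without running past the buffer. A bounds-checked walk can be sketched like this. It is an illustration under stated assumptions, not the ah_core.c code: the function name `ipv4_options_valid` and the `OPT_*` constants are hypothetical, and the sketch only validates the list rather than computing anything over it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL 0	/* end of option list */
#define OPT_NOP 1	/* single-byte padding */

/* Return 1 if the option list in buf[0..len) is well formed, 0 otherwise. */
static int
ipv4_options_valid(const uint8_t *buf, size_t len)
{
	size_t off = 0;

	assert(buf != NULL || len == 0);
	while (off < len) {
		uint8_t type = buf[off];

		if (type == OPT_EOL)
			break;		/* nothing after EOL is examined */
		if (type == OPT_NOP) {
			off++;		/* NOP has no length byte */
			continue;
		}
		if (len - off < 2)
			return 0;	/* truncated: no room for the length byte */
		uint8_t optlen = buf[off + 1];
		if (optlen < 2 || optlen > len - off)
			return 0;	/* too short, or runs past the buffer */
		off += optlen;
	}
	return 1;
}
```

The `optlen < 2` test corresponds to dropping a too-short option, and the two comparisons against `len - off` are what prevent a read past the end when the header is truncated.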
 1.21 02-Oct-2000  itojun fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.
 1.20 18-Jul-2000  itojun correct RFC2367 PF_KEY conformance (SADB_[AE]ALG_xx values and namespaces).
sync from kame.

WARNING: need recompilation of setkey(8) and pkgsrc/security/racoon.
(no ipsec-ready netbsd was released as official release)
 1.19 14-Jun-2000  itojun branches: 1.19.2;
add algorithm name into algorithm table. (commit to crypto-intl will follow)
 1.18 02-Jun-2000  itojun sync with more recent kame. cope with malloc failure more gracefully
some cosmetics.
 1.17 21-Mar-2000  itojun branches: 1.17.2;
cleanup AH/policy processing.
- parse IPv6 header by using common function, ip6_{last,next}hdr.
- fix behavior in multiple AH cases.
make strict boundary checks on mbuf chasing.
(sync with latest kame)
 1.16 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.15 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.14 16-Jan-2000  itojun add missing ipcomp cases.
 1.13 06-Jan-2000  itojun remove too much portability code in KAME, to improve readability.
 1.12 15-Dec-1999  itojun do not overwrite traffic class field when we write IPv6 version field.
 1.11 17-Sep-1999  itojun branches: 1.11.2; 1.11.8;
eliminate unnecessary splnet().
 1.10 26-Aug-1999  itojun sync with kame; typo in comment.
 1.9 25-Aug-1999  itojun fix AH computation for HbH options.
 1.8 31-Jul-1999  itojun sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.7 30-Jul-1999  itojun remove reference to in6_systm.h (file itself will be removed afterwards)
 1.6 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.5 06-Jul-1999  itojun fix IPSEC (but not INET6) build.

PR: 7921, 7922, 7924
From: rafal@mediaone.net
 1.4 04-Jul-1999  itojun s/splnet/splsoftnet/ in IPv6/IPsec part.
hope I made no mistake (the kernel works fine but I need a regress test)

Suggested by: thorpej
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there, so we patch them up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ah_core.c was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ah_core.c was added on branch chs-ubc2 on 1999-07-01 23:48:26 +0000
 1.11.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.11.2.2 12-Mar-2001  bouyer Sync with HEAD.
 1.11.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.17.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.19.2.3 26-Feb-2001  he Pull up revision 1.23 (via patch, requested by itojun):
Correct option parsing during AH checksum computation.
 1.19.2.2 26-Feb-2001  he Pull up revision 1.22 (via diff, requested by itojun):
Correct IPv4 option header chasing. The old code may overrun
the buffer if the option header is truncated.
 1.19.2.1 25-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)

correct RFC2367 PF_KEY conformance (SADB_[AE]ALG_xx values and namespaces).
sync from kame.

WARNING: need recompilation of setkey(8) and pkgsrc/security/racoon.
(no ipsec-ready netbsd was released as official release, so binary backward
compatibility is less of an issue)

(sys/netinet6/esp.h only, 1.10 -> 1.11)
wrap kernel function prototype by #ifdef _KERNEL.

--- revisions pulled up:
1.6 -> 1.7 syssrc/sys/net/pfkeyv2.h
1.10 -> 1.11 syssrc/sys/netinet6/ah.h
1.10 -> 1.11 syssrc/sys/netinet6/ah_output.c
1.19 -> 1.20 syssrc/sys/netinet6/ah_core.c
1.15 -> 1.16 syssrc/sys/netinet6/ah_input.c
1.8 -> 1.9 syssrc/sys/netinet6/esp.h
1.10 -> 1.11 syssrc/sys/netinet6/esp.h
1.1 -> 1.2 syssrc/sys/netinet6/esp_core.c
1.1 -> 1.2 syssrc/sys/netinet6/esp_input.c
1.2 -> 1.3 syssrc/sys/netinet6/esp_output.c
1.26 -> 1.27 syssrc/sys/netkey/key.c
 1.23.4.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.23.4.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.23.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.23.2.4 17-Sep-2002  nathanw Catch up to -current.
 1.23.2.3 20-Jun-2002  nathanw Catch up to -current.
 1.23.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.23.2.1 22-Oct-2001  nathanw Catch up to -current.
 1.24.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.26.8.1 20-Jun-2002  gehenna catch up with -current.
 1.29.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.29.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.29.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.29.2.1 03-Aug-2004  skrll Sync with HEAD
 1.36.16.3 03-Sep-2007  yamt sync with head.
 1.36.16.2 30-Dec-2006  yamt sync with head.
 1.36.16.1 21-Jun-2006  yamt sync with head.
 1.36.14.1 23-Jul-2005  riz Pull up revision 1.37 (requested by tron in ticket #611):
Remove unnecessary bzero() calls before calling the algorithm specific
init function.
 1.38.2.1 01-Feb-2006  yamt sync with head.
 1.39.20.2 10-Dec-2006  yamt sync with head.
 1.39.20.1 22-Oct-2006  yamt sync with head
 1.39.18.1 18-Nov-2006  ad Sync with head.
 1.42.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.43.4.1 11-Jul-2007  mjf Sync with head.
 1.43.2.1 08-Jun-2007  ad Sync with head.
 1.44.48.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.44.42.1 28-Apr-2009  skrll Sync with HEAD.
 1.44.32.1 04-May-2009  yamt sync with head.
 1.48.16.1 05-Apr-2012  mrg sync to latest -current.
 1.48.12.1 17-Apr-2012  yamt sync with head
 1.60 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.59 17-Jul-2011  joerg branches: 1.59.2; 1.59.6; 1.59.8; 1.59.12; 1.59.14;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.58 18-Mar-2009  cegger bcmp -> memcmp
 1.57 24-Apr-2008  ad branches: 1.57.2; 1.57.10; 1.57.16;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.56 23-Apr-2008  thorpej Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.55 19-Oct-2007  ad branches: 1.55.16; 1.55.18;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.54 23-May-2007  christos branches: 1.54.6; 1.54.8; 1.54.12;
Ansify + add a few comments, from Karl Sjödahl
 1.53 04-Mar-2007  christos branches: 1.53.2; 1.53.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.52 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.51 16-Nov-2006  christos branches: 1.51.4;
__unused removal on arguments; approved by core.
 1.50 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.49 11-Dec-2005  christos branches: 1.49.20; 1.49.22;
merge ktrace-lwp.
 1.48 07-Jul-2005  tron Defopt IPSEC_NAT_T.
 1.47 20-May-2005  manu branches: 1.47.2;
Use NAT-T ports for AH and IPcomp too.
 1.46 29-Apr-2005  yamt move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.
 1.45 23-Apr-2005  manu Enhance IPSEC_NAT_T so that it can work with multiple machines behind the
same NAT.
 1.44 11-Feb-2004  itojun branches: 1.44.8; 1.44.14;
KNF
 1.43 25-Oct-2003  christos fix uninitialized variables
 1.42 28-Sep-2003  mycroft Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."
 1.41 06-Aug-2003  itojun m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.
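
The ordering constraint in 1.41 can be illustrated with a small userland sketch. It is not the kernel code: struct buf, buf_new(), buf_cat() and buf_append_pkt() are hypothetical stand-ins, and buf_cat() is only loosely modelled on m_cat() in that it consumes its second argument. The point is that any field of the second buffer must be read before the concatenation call, since the buffer may no longer exist afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUF_CAP 64

/* Hypothetical stand-in for an mbuf packet (not the kernel structure). */
struct buf {
	size_t len;			/* bytes currently stored in data[] */
	size_t pkthdr_len;		/* total length recorded in the "packet header" */
	unsigned char data[BUF_CAP];
};

/* Allocate a buffer holding a copy of src; returns NULL on failure. */
struct buf *
buf_new(const void *src, size_t len)
{
	struct buf *b;

	if (len > BUF_CAP)
		return NULL;
	b = calloc(1, sizeof(*b));
	if (b == NULL)
		return NULL;
	memcpy(b->data, src, len);
	b->len = b->pkthdr_len = len;
	return b;
}

/* Append src's bytes to dst and free src. src is invalid after this call. */
void
buf_cat(struct buf *dst, struct buf *src)
{
	assert(dst->len + src->len <= BUF_CAP);
	memcpy(dst->data + dst->len, src->data, src->len);
	dst->len += src->len;
	free(src);
}

/* Join two packets, keeping the header length of the result consistent. */
void
buf_append_pkt(struct buf *head, struct buf *tail)
{
	/* Read tail's header while tail is still valid ... */
	head->pkthdr_len += tail->pkthdr_len;
	/* ... because buf_cat() frees it. Do not touch tail below this line. */
	buf_cat(head, tail);
}
```

Swapping the two statements in buf_append_pkt() would read freed memory, which is the class of bug the commit message describes.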
 1.40 22-Jul-2003  itojun unifdef -U_IP_VHL
 1.39 09-Jul-2003  itojun remove obsolete comment on the use of m_pullup
 1.38 14-May-2003  itojun branches: 1.38.2;
always use PULLDOWN_TEST codepath.
 1.37 11-Sep-2002  itojun correct pointer signedness mixups. sync w/kame
 1.36 11-Sep-2002  itojun correct signedness mixup in pointer passing. sync w/kame
 1.35 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.34 09-Jun-2002  itojun whitespace cleanup
 1.33 29-May-2002  itojun avoid unneeded malloc/free. sync w/kame
 1.32 18-Mar-2002  itojun branches: 1.32.4; 1.32.6;
esp/ah_ctlinput: pass useful address to key_alloc.
 1.31 21-Dec-2001  itojun whitespace/cosmetic sync w/kame
 1.30 21-Dec-2001  itojun remove obsolete #if 0'ed section. sync w/kame
 1.29 13-Nov-2001  lukem add RCSIDs
 1.28 15-Oct-2001  itojun reduce diff with kame. whitespace changes only.
 1.27 13-Apr-2001  thorpej branches: 1.27.2;
Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.26 01-Mar-2001  itojun branches: 1.26.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code
 1.25 11-Feb-2001  itojun pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).
 1.24 24-Jan-2001  itojun - record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.23 09-Dec-2000  itojun update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
 1.22 19-Oct-2000  itojun memcpy -> bcopy, for sync with kame tree
 1.21 18-Oct-2000  itojun verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync
 1.20 18-Oct-2000  thorpej Restructure the Path MTU Discovery code somewhat to avoid
entering rtentry's for hosts we're not actually communicating
with.

Do this by invoking the ctlinput for the protocol, which is
responsible for validating the ICMP message:
* TCP -- Lookup the connection based on the address/port
pairs in the ICMP message.
* AH/ESP -- Lookup the SA based on the SPI in the ICMP message.

If validation succeeds, ctlinput is responsible for calling
icmp_mtudisc(). icmp_mtudisc() then invokes callbacks registered
by protocols (such as TCP) which want to take some sort of special
action when a path's MTU changes. For TCP, this is where we now
refresh cached routes and re-enter slow-start.

As a side-effect, this fixes the problem where TCP would not be
notified when a path's MTU changed if AH/ESP were being used.

XXX Note, this is only a fix for the IPv4 case. For the IPv6
XXX case, we need to wait for the KAME folks.

Reviewed by sommerfeld@netbsd.org and itojun@netbsd.org.
 1.19 02-Oct-2000  itojun fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.
 1.18 16-Aug-2000  itojun add missing splx, when outgoing interface queue is full on tunnelled
ESP packet output. KAME PR 280.
 1.17 15-Aug-2000  thorpej Make this compile without INET6 again.
 1.16 18-Jul-2000  itojun correct RFC2367 PF_KEY conformance (SADB_[AE]ALG_xx values and namespaces).
sync from kame.

WARNING: need recompilation of setkey(8) and pkgsrc/security/racoon.
(no ipsec-ready netbsd was released as official release)
 1.15 02-Jun-2000  itojun branches: 1.15.2;
sync with more recent kame. cope with malloc failure more gracefully.
some cosmetics.
 1.14 26-Mar-2000  mycroft branches: 1.14.2;
Oops; fix thinko.
 1.13 26-Mar-2000  mycroft Update byte count and time stamps for received packets (as in ESP).
May help fix stalls.
 1.12 21-Mar-2000  itojun cleanup AH/policy processing.
- parse IPv6 header by using common function, ip6_{last,next}hdr.
- fix behavior in multiple AH cases.
make strict boundary checks on mbuf chasing.
(sync with latest kame)
 1.11 26-Feb-2000  itojun with IPv4 AH, strip off AH from the packet. this is to make some
of IPv4 transport layer code work correctly (specifically, ICMPv4
will transmit wrong packet if we don't strip AH here)

this is just for m_pulldown case. normal installations are not affected.
 1.10 25-Feb-2000  itojun make variable initialization safer.
(IP6_EXTHDR_CHECK can call m_pullup under rare condition)
 1.9 17-Feb-2000  darrenr Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.
 1.8 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.7 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.6 06-Jan-2000  itojun remove too much portability code in KAME, to improve readability.
 1.5 30-Jul-1999  itojun branches: 1.5.2;
remove reference to in6_systm.h (file itself will be removed afterwards)
 1.4 06-Jul-1999  itojun checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ah_input.c was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ah_input.c was added on branch chs-ubc2 on 1999-07-01 23:48:26 +0000
 1.5.2.5 21-Apr-2001  bouyer Sync with HEAD
 1.5.2.4 12-Mar-2001  bouyer Sync with HEAD.
 1.5.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.5.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.5.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.14.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.15.2.7 09-Sep-2003  msaitoh Pull up revision 1.41 (requested by itojun in ticket #63):
m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.
 1.15.2.6 06-Apr-2001  he Pull up revision 1.24 (requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.15.2.5 11-Mar-2001  he Pull up revision 1.26 (requested by itojun):
Ensure that we enforce inbound IPsec policy on all IP protocols,
not just TCP, UDP and ICMP.
 1.15.2.4 02-Oct-2000  itojun pullup (approved by releng-1-5)
correct ipsecstat/ipsec6stat mixup.

netinet6/ah_input.c 1.18 -> 1.19
netinet6/ah_output.c 1.11 -> 1.12 (part of)
netinet6/esp_input.c 1.8 -> 1.9 (part of)
netinet6/esp_output.c 1.8 -> 1.9
netinet6/icmp6.c 1.43 -> 1.44
netinet6/ipcomp_input.c 1.13 -> 1.14
netinet6/ipcomp_output.c 1.13 -> 1.14
 1.15.2.3 16-Aug-2000  itojun pullup (approved by releng-1-5)
> add missing splx, when outgoing interface queue is full on tunnelled
> IPsec packet output. KAME PR 280.
> cvs rdiff -r1.17 -r1.18 syssrc/sys/netinet6/ah_input.c
> cvs rdiff -r1.4 -r1.5 syssrc/sys/netinet6/esp_input.c
 1.15.2.2 15-Aug-2000  thorpej Pull up rev. 1.17:
Make this compile without INET6 again.
 1.15.2.1 25-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)

correct RFC2367 PF_KEY conformance (SADB_[AE]ALG_xx values and namespaces).
sync from kame.

WARNING: need recompilation of setkey(8) and pkgsrc/security/racoon.
(no ipsec-ready netbsd was released as official release, so binary backward
compatibility is less of an issue)

(sys/netinet6/esp.h only, 1.10 -> 1.11)
wrap kernel function prototype by #ifdef _KERNEL.

--- revisions pulled up:
1.6 -> 1.7 syssrc/sys/net/pfkeyv2.h
1.10 -> 1.11 syssrc/sys/netinet6/ah.h
1.10 -> 1.11 syssrc/sys/netinet6/ah_output.c
1.19 -> 1.20 syssrc/sys/netinet6/ah_core.c
1.15 -> 1.16 syssrc/sys/netinet6/ah_input.c
1.8 -> 1.9 syssrc/sys/netinet6/esp.h
1.10 -> 1.11 syssrc/sys/netinet6/esp.h
1.1 -> 1.2 syssrc/sys/netinet6/esp_core.c
1.1 -> 1.2 syssrc/sys/netinet6/esp_input.c
1.2 -> 1.3 syssrc/sys/netinet6/esp_output.c
1.26 -> 1.27 syssrc/sys/netkey/key.c
 1.26.2.8 17-Sep-2002  nathanw Catch up to -current.
 1.26.2.7 27-Aug-2002  nathanw Catch up to -current.
 1.26.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.26.2.5 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.26.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.26.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.26.2.2 22-Oct-2001  nathanw Catch up to -current.
 1.26.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.27.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.27.2.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.27.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.27.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.32.6.1 05-Sep-2003  tron Pull up revision 1.41 (requested by itojun in ticket #1401):
m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.
 1.32.4.3 29-Aug-2002  gehenna catch up with -current.
 1.32.4.2 20-Jun-2002  gehenna catch up with -current.
 1.32.4.1 30-May-2002  gehenna Catch up with -current.
 1.38.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.38.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.38.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.38.2.1 03-Aug-2004  skrll Sync with HEAD
 1.44.14.2 18-Jul-2005  riz Pull up revision 1.48 (requested by tron in ticket #565):
Defopt IPSEC_NAT_T.
 1.44.14.1 28-Apr-2005  tron Pull up revision 1.45 (requested by man in ticket #201):
Enhance IPSEC_NAT_T so that it can work with multiple machines behind
the same NAT.
 1.44.8.1 29-Apr-2005  kent sync with -current
 1.47.2.5 27-Oct-2007  yamt sync with head.
 1.47.2.4 03-Sep-2007  yamt sync with head.
 1.47.2.3 26-Feb-2007  yamt sync with head.
 1.47.2.2 30-Dec-2006  yamt sync with head.
 1.47.2.1 21-Jun-2006  yamt sync with head.
 1.49.22.2 10-Dec-2006  yamt sync with head.
 1.49.22.1 22-Oct-2006  yamt sync with head
 1.49.20.1 18-Nov-2006  ad Sync with head.
 1.51.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.51.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.53.4.1 11-Jul-2007  mjf Sync with head.
 1.53.2.2 23-Oct-2007  ad Sync with head.
 1.53.2.1 08-Jun-2007  ad Sync with head.
 1.54.12.1 25-Oct-2007  bouyer Sync with HEAD.
 1.54.8.1 06-Nov-2007  matt sync with HEAD
 1.54.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.55.18.1 18-May-2008  yamt sync with head.
 1.55.16.1 02-Jun-2008  mjf Sync with HEAD.
 1.57.16.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.57.10.1 28-Apr-2009  skrll Sync with HEAD.
 1.57.2.1 04-May-2009  yamt sync with head.
 1.59.14.1 30-Jan-2018  martin Ooops, remainder of Ticket #1523, accidentally not committed previously
 1.59.12.1 30-Jan-2018  martin Ooops, remainder of Ticket #1523, accidentally not committed previously
 1.59.8.1 30-Jan-2018  martin Ooops, remainder of Ticket #1523, accidentally not committed previously
 1.59.6.1 05-Apr-2012  mrg sync to latest -current.
 1.59.2.1 17-Apr-2012  yamt sync with head
 1.34 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.33 18-Mar-2009  cegger branches: 1.33.12; 1.33.16;
bzero -> memset
 1.32 14-Mar-2009  dsl Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.31 23-Apr-2008  thorpej branches: 1.31.2; 1.31.10; 1.31.16;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.30 22-Sep-2007  degroote branches: 1.30.20; 1.30.22;
{ah,esp,ipcomp}_output must return 0 on success. On failure, it returns the
error and m is freed. Previously, it was not the case in ipcomp and esp case
(i.e. in some cases it returned 0 with m freed, or returned an error without freeing m).

In ipcomp_output, fix some leak of mcopy too.

Use the same error path in {ah,esp,ipcomp}_output.

Problem was reported by Wolfgang Stukenbrock in pr/36768.
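
The contract stated in 1.30 (return 0 on success; on failure return the error and free the packet, through one shared error path) can be sketched in userland as follows. This is an illustration only: struct pkt, pkt_alloc(), pkt_free(), pkt_live and toy_output() are hypothetical stand-ins for the mbuf routines and the real output functions, and "prepend hdrlen zero bytes" stands in for adding a protocol header.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a packet buffer (not struct mbuf). */
struct pkt {
	size_t len;
	unsigned char *data;
};

int pkt_live;	/* number of packets allocated and not yet freed */

struct pkt *
pkt_alloc(size_t len)
{
	struct pkt *p = malloc(sizeof(*p));

	if (p == NULL)
		return NULL;
	p->data = calloc(len ? len : 1, 1);
	if (p->data == NULL) {
		free(p);
		return NULL;
	}
	p->len = len;
	pkt_live++;
	return p;
}

void
pkt_free(struct pkt *p)
{
	free(p->data);
	free(p);
	pkt_live--;
}

/*
 * Prepend hdrlen zero bytes to the packet.
 * Returns 0 on success: the caller still owns p.
 * Returns an errno value on failure: p has been freed and must not be used.
 */
int
toy_output(struct pkt *p, size_t hdrlen, size_t maxlen)
{
	unsigned char *nbuf;
	size_t total;
	int error;

	assert(p != NULL);
	if (hdrlen > maxlen || p->len > maxlen - hdrlen) {
		error = EMSGSIZE;
		goto fail;
	}
	total = p->len + hdrlen;
	nbuf = malloc(total ? total : 1);
	if (nbuf == NULL) {
		error = ENOBUFS;
		goto fail;
	}
	memset(nbuf, 0, hdrlen);
	memcpy(nbuf + hdrlen, p->data, p->len);
	free(p->data);
	p->data = nbuf;
	p->len = total;
	return 0;

fail:
	/* Single error path: every failure frees the packet exactly once. */
	pkt_free(p);
	return error;
}
```

With this shape a caller can rely on one rule: after a nonzero return the packet is gone, after a zero return it is still owned by the caller.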
 1.29 23-May-2007  christos branches: 1.29.6; 1.29.8;
Ansify + add a few comments, from Karl Sjödahl
 1.28 24-Nov-2006  christos branches: 1.28.2; 1.28.8; 1.28.10; 1.28.16;
fix spelling of accommodate; from Zapher.
 1.27 14-May-2006  christos branches: 1.27.8; 1.27.10;
XXX: GCC uninitialized.
 1.26 11-Dec-2005  christos branches: 1.26.4; 1.26.6; 1.26.8; 1.26.12;
merge ktrace-lwp.
 1.25 29-May-2005  christos branches: 1.25.2;
- avoid shadowed variables
- sprinkle const.
 1.24 07-Sep-2003  itojun branches: 1.24.14;
- prepare for RFC2401bis 64bit sequence number (no behavior change yet)
- use hash for SPI-based SAD entry lookup (should be faster, i hope)
- cleanup keydb.c and key.c. key.c is responsible for refcounting secasvar,
keydb.c is responsible for alloc/free.
 1.23 22-Aug-2003  itojun typo in log message
 1.22 22-Jul-2003  itojun unifdef -U_IP_VHL
 1.21 27-Sep-2002  provos branches: 1.21.6;
remove trailing \n in panic(). approved perry.
 1.20 11-Sep-2002  itojun correct pointer signedness mixups. sync w/kame
 1.19 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.18 09-Aug-2002  itojun avoid hardcoded "16" for max AH sum size. use AH_MAXSUMSIZE.
 1.17 09-Jun-2002  itojun whitespace cleanup
 1.16 13-Nov-2001  lukem branches: 1.16.8;
add RCSIDs
 1.15 15-Oct-2001  itojun reduce diff with kame. whitespace changes only.
 1.14 21-Feb-2001  itojun branches: 1.14.2; 1.14.4;
tighten AH IPv4 option chasing more. drop too short (< 2) option.
sync with kame.
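
A minimal sketch of the kind of bounds checking 1.13 and 1.14 describe for IPv4 option chasing. It is written for userland and is not the kernel routine: ipv4_opts_walk() is a hypothetical name, and it only counts options instead of feeding them to the AH checksum. It uses the standard option layout (one-byte EOL and NOP, type/length/value otherwise) and rejects options whose length byte is missing, shorter than 2, or runs past the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPOPT_EOL	0	/* end of option list, single byte */
#define IPOPT_NOP	1	/* no operation, single byte */

/*
 * Walk the option bytes that follow the fixed 20-byte IPv4 header.
 * Returns the number of options seen before EOL or end of buffer
 * (NOPs included), or -1 if an option is truncated or malformed.
 */
int
ipv4_opts_walk(const uint8_t *opts, size_t len)
{
	size_t i = 0;
	int count = 0;

	assert(opts != NULL || len == 0);
	while (i < len) {
		uint8_t type = opts[i];
		uint8_t optlen;

		if (type == IPOPT_EOL)
			break;
		if (type == IPOPT_NOP) {
			i++;
			count++;
			continue;
		}
		/* The length byte itself must be inside the buffer. */
		if (len - i < 2)
			return -1;
		optlen = opts[i + 1];
		/* Drop too-short (< 2) options and ones that overrun. */
		if (optlen < 2 || optlen > len - i)
			return -1;
		i += optlen;
		count++;
	}
	return count;
}
```

Checking the remaining length before reading opts[i + 1], and again before advancing by optlen, is what prevents an overrun when the option header is truncated.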
 1.13 19-Feb-2001  itojun correct IPv4 option handling.
 1.12 02-Oct-2000  itojun fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.
 1.11 18-Jul-2000  itojun correct RFC2367 PF_KEY conformance (SADB_[AE]ALG_xx values and namespaces).
sync from kame.

WARNING: need recompilation of setkey(8) and pkgsrc/security/racoon.
(no ipsec-ready netbsd was released as official release)
 1.10 06-Jul-2000  itojun remove unnecessary #include <netkey/key_debug.h>. from kame.
 1.9 02-Jun-2000  itojun branches: 1.9.2;
sync with more recent kame. cope with malloc failure more gracefully.
some cosmetics.
 1.8 21-Mar-2000  itojun branches: 1.8.2;
cleanup AH/policy processing.
- parse IPv6 header by using common function, ip6_{last,next}hdr.
- fix behavior in multiple AH cases.
make strict boundary checks on mbuf chasing.
(sync with latest kame)
 1.7 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.6 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.5 06-Jan-2000  itojun remove too much portability code in KAME, to improve readability.
 1.4 30-Jul-1999  itojun branches: 1.4.2;
remove reference to in6_systm.h (file itself will be removed afterwards)
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ah_output.c was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ah_output.c was added on branch chs-ubc2 on 1999-07-01 23:48:26 +0000
 1.4.2.2 12-Mar-2001  bouyer Sync with HEAD.
 1.4.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.8.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.9.2.4 26-Feb-2001  he Pull up revision 1.14 (via patch, requested by itojun):
Correct option parsing during AH checksum computation.
 1.9.2.3 26-Feb-2001  he Pull up revision 1.13 (requested by itojun):
Correct IPv4 option header chasing. The old code may overrun
the buffer if the option header is truncated.
 1.9.2.2 02-Oct-2000  itojun pullup (approved by releng-1-5)
correct ipsecstat/ipsec6stat mixup.

netinet6/ah_input.c 1.18 -> 1.19
netinet6/ah_output.c 1.11 -> 1.12 (part of)
netinet6/esp_input.c 1.8 -> 1.9 (part of)
netinet6/esp_output.c 1.8 -> 1.9
netinet6/icmp6.c 1.43 -> 1.44
netinet6/ipcomp_input.c 1.13 -> 1.14
netinet6/ipcomp_output.c 1.13 -> 1.14
 1.9.2.1 25-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)

correct RFC2367 PF_KEY conformance (SADB_[AE]ALG_xx values and namespaces).
sync from kame.

WARNING: need recompilation of setkey(8) and pkgsrc/security/racoon.
(no ipsec-ready netbsd was released as official release, so binary backward
compatibility is less of an issue)

(sys/netinet6/esp.h only, 1.10 -> 1.11)
wrap kernel function prototype by #ifdef _KERNEL.

--- revisions pulled up:
1.6 -> 1.7 syssrc/sys/net/pfkeyv2.h
1.10 -> 1.11 syssrc/sys/netinet6/ah.h
1.10 -> 1.11 syssrc/sys/netinet6/ah_output.c
1.19 -> 1.20 syssrc/sys/netinet6/ah_core.c
1.15 -> 1.16 syssrc/sys/netinet6/ah_input.c
1.8 -> 1.9 syssrc/sys/netinet6/esp.h
1.10 -> 1.11 syssrc/sys/netinet6/esp.h
1.1 -> 1.2 syssrc/sys/netinet6/esp_core.c
1.1 -> 1.2 syssrc/sys/netinet6/esp_input.c
1.2 -> 1.3 syssrc/sys/netinet6/esp_output.c
1.26 -> 1.27 syssrc/sys/netkey/key.c
 1.14.4.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.14.4.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.14.4.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.14.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.14.2.6 18-Oct-2002  nathanw Catch up to -current.
 1.14.2.5 17-Sep-2002  nathanw Catch up to -current.
 1.14.2.4 13-Aug-2002  nathanw Catch up to -current.
 1.14.2.3 20-Jun-2002  nathanw Catch up to -current.
 1.14.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.14.2.1 22-Oct-2001  nathanw Catch up to -current.
 1.16.8.2 29-Aug-2002  gehenna catch up with -current.
 1.16.8.1 20-Jun-2002  gehenna catch up with -current.
 1.21.6.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.21.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.21.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.21.6.1 03-Aug-2004  skrll Sync with HEAD
 1.24.14.1 23-Sep-2007  bouyer Pull up following revision(s) (requested by degroote in ticket #1846):
sys/netinet6/ipcomp_output.c: revision 1.22
sys/netinet6/ah_output.c: revision 1.30
sys/netinet6/esp_output.c: revision 1.30
Fix some possible mbuf leak in kame ipsec code.
Problem was reported by Wolfgang Stukenbrock in pr/36768.
 1.25.2.4 27-Oct-2007  yamt sync with head.
 1.25.2.3 03-Sep-2007  yamt sync with head.
 1.25.2.2 30-Dec-2006  yamt sync with head.
 1.25.2.1 21-Jun-2006  yamt sync with head.
 1.26.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.26.8.1 24-May-2006  yamt sync with head.
 1.26.6.1 01-Jun-2006  kardel Sync with head.
 1.26.4.1 09-Sep-2006  rpaulo sync with head
 1.27.10.1 10-Dec-2006  yamt sync with head.
 1.27.8.1 12-Jan-2007  ad Sync with head.
 1.28.16.1 30-Sep-2007  wrstuden Catch up on netbsd-4 as of a few days ago.
 1.28.10.1 11-Jul-2007  mjf Sync with head.
 1.28.8.2 09-Oct-2007  ad Sync with head.
 1.28.8.1 08-Jun-2007  ad Sync with head.
 1.28.2.1 25-Sep-2007  xtraeme Pull up following revision(s) (requested by degroote in ticket #896):
sys/netinet6/ipcomp_output.c: revision 1.22
sys/netinet6/ah_output.c: revision 1.30
sys/netinet6/esp_output.c: revision 1.30

{ah,esp,ipcomp}_output must return 0 on success. On failure, it returns the
error and m is freed. Previously, this was not the case for ipcomp and esp
(in some cases they returned 0 with m freed, or an error without freeing m).

In ipcomp_output, fix some leak of mcopy too.

Use the same error path in {ah,esp,ipcomp}_output.

Problem was reported by Wolfgang Stukenbrock in pr/36768.
 1.29.8.1 06-Nov-2007  matt sync with HEAD
 1.29.6.1 02-Oct-2007  joerg Sync with HEAD.
 1.30.22.1 18-May-2008  yamt sync with head.
 1.30.20.1 02-Jun-2008  mjf Sync with HEAD.
 1.31.16.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.31.10.1 28-Apr-2009  skrll Sync with HEAD.
 1.31.2.1 04-May-2009  yamt sync with head.
 1.33.16.1 05-Apr-2012  mrg sync to latest -current.
 1.33.12.1 17-Apr-2012  yamt sync with head
 1.15 04-Nov-2022  ozaki-r inpcb: rename functions to in6pcb_*
 1.14 28-Oct-2022  ozaki-r Adjust dccp and sctp for struct inpcb separation
 1.13 28-Oct-2022  ozaki-r Adjust pf, wg, dccp and sctp for struct inpcb integration
 1.12 15-Sep-2018  rjs Make it compile after change to non-variadic pr_input.
 1.11 24-Jan-2017  ozaki-r branches: 1.11.12; 1.11.14; 1.11.16;
Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work
 1.10 13-Dec-2016  ozaki-r branches: 1.10.2;
Remove unnecessary inclusions of nd6.h
 1.9 18-Nov-2016  knakahara fix: "ifconfig destroy" can stall when "ifconfig" is run in parallel.
This problem occurs only if NET_MPSAFE on.

ifconfig destroy side:
kernel entry point is ifioctl => if_clone_destroy.
pr_purgeif() acquires softnet_lock, and then ifa_remove() calls
pserialize_perform() holding softnet_lock.
ifconfig side:
kernel entry point is socreate.
pr_attach()(udp_attach_wrapper()) calls sosetlock(). In this call path,
sosetlock() tries to acquire softnet_lock.
These can cause a deadlock.
 1.8 26-Apr-2016  ozaki-r branches: 1.8.2;
Sweep unnecessary route.h inclusions
 1.7 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.6 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.5 26-Apr-2015  rtr remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@
 1.4 25-Apr-2015  rtr fix missed parameter type change in dccp6_accept() to sockaddr * from mbuf *
 1.3 24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.2 04-Apr-2015  rtr branches: 1.2.2;
* update dccp_bind for struct mbuf * to struct sockaddr * parameter change
* pass NULL instead of casting 0 to a pointer when calling in_pcbbind()
 1.1 10-Feb-2015  rjs Add DCCP protocol support from KAME.
 1.2.2.7 05-Feb-2017  skrll Sync with HEAD
 1.2.2.6 05-Dec-2016  skrll Sync with HEAD
 1.2.2.5 29-May-2016  skrll Sync with HEAD
 1.2.2.4 22-Sep-2015  skrll Sync with HEAD
 1.2.2.3 06-Jun-2015  skrll Sync with HEAD
 1.2.2.2 06-Apr-2015  skrll Sync with HEAD
 1.2.2.1 04-Apr-2015  skrll file dccp6_usrreq.c was added on branch nick-nhusb on 2015-04-06 15:18:23 +0000
 1.8.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.8.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.10.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.11.16.1 10-Jun-2019  christos Sync with HEAD
 1.11.14.1 30-Sep-2018  pgoyette Sync with HEAD
 1.11.12.2 03-Dec-2017  jdolecek update from HEAD
 1.11.12.1 24-Jan-2017  jdolecek file dccp6_usrreq.c was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.4 02-May-2015  rtr branches: 1.4.16;
make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.3 25-Apr-2015  rtr fix missed parameter type change in dccp6_accept() to sockaddr * from mbuf *
 1.2 04-Apr-2015  rtr branches: 1.2.2;
* update dccp_bind for struct mbuf * to struct sockaddr * parameter change
* pass NULL instead of casting 0 to a pointer when calling in_pcbbind()
 1.1 10-Feb-2015  rjs Add DCCP protocol support from KAME.
 1.2.2.3 06-Jun-2015  skrll Sync with HEAD
 1.2.2.2 06-Apr-2015  skrll Sync with HEAD
 1.2.2.1 04-Apr-2015  skrll file dccp6_var.h was added on branch nick-nhusb on 2015-04-06 15:18:23 +0000
 1.4.16.2 03-Dec-2017  jdolecek update from HEAD
 1.4.16.1 02-May-2015  jdolecek file dccp6_var.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.22 13-Apr-2018  maxv style
 1.21 23-Jan-2018  maxv branches: 1.21.2;
Fix the calculation of the ICMP6 error pointer. It is not correct to use

pointer = opt - mtod(m, u_int8_t *)

because m may have gone through m_pulldown, and it is possible that
m->m_data is no longer the beginning of the packet.
 1.20 11-Jan-2017  ozaki-r branches: 1.20.8;
Get rid of unnecessary header inclusions
 1.19 26-Apr-2016  ozaki-r branches: 1.19.2;
Sweep unnecessary route.h inclusions
 1.18 14-Nov-2014  maxv branches: 1.18.2;
Do not uselessly include <sys/malloc.h>.
 1.17 15-Apr-2008  thorpej branches: 1.17.48; 1.17.66;
Make ip6 and icmp6 stats per-cpu.
 1.16 08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
 1.15 16-Nov-2006  christos branches: 1.15.44; 1.15.48;
__unused removal on arguments; approved by core.
 1.14 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.13 26-Jan-2006  rpaulo branches: 1.13.18; 1.13.20;
<netinet6/in6_pcb.h> is not needed.
 1.12 14-May-2003  itojun branches: 1.12.18; 1.12.30;
always use PULLDOWN_TEST codepath.
 1.11 13-Nov-2001  lukem add RCSIDs
 1.10 22-Feb-2001  itojun branches: 1.10.2; 1.10.4;
be more picky about option length parsing. sync with kame
 1.9 21-Feb-2001  itojun make validation code more strict for ND6/dest6 variable length headers.
check duplicated nd6_ifinfo table initialization in a better way.
sync with kame
 1.8 23-Jan-2001  itojun minimize diff with the latest kame tree.
 1.7 06-Feb-2000  itojun branches: 1.7.4;
fix include pathname for better rfc2292 compliance.
 1.6 06-Jan-2000  itojun remove too much portability code in KAME, to improve readability.
 1.5 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.4 30-Jul-1999  itojun branches: 1.4.2; 1.4.8;
remove reference to in6_systm.h (file itself will be removed afterwards)
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file dest6.c was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file dest6.c was added on branch chs-ubc2 on 1999-07-01 23:48:26 +0000
 1.4.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.4.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.4.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.4.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.7.4.1 26-Feb-2001  he Pull up revisions 1.9-1.10 (via patch, requested by itojun):
Tighten IPv6 ND6/dest6 option chasing bounds check.
 1.10.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.10.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.12.30.1 01-Feb-2006  yamt sync with head.
 1.12.18.2 30-Dec-2006  yamt sync with head.
 1.12.18.1 21-Jun-2006  yamt sync with head.
 1.13.20.2 10-Dec-2006  yamt sync with head.
 1.13.20.1 22-Oct-2006  yamt sync with head
 1.13.18.1 18-Nov-2006  ad Sync with head.
 1.15.48.1 02-Jun-2008  mjf Sync with HEAD.
 1.15.44.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.17.66.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.17.48.1 03-Dec-2017  jdolecek update from HEAD
 1.18.2.2 05-Feb-2017  skrll Sync with HEAD
 1.18.2.1 29-May-2016  skrll Sync with HEAD
 1.19.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.20.8.1 30-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #664):

sys/netinet6/dest6.c: revision 1.21

Fix the calculation of the ICMP6 error pointer. It is not correct to use

pointer = opt - mtod(m, u_int8_t *)

because m may have gone through m_pulldown, and it is possible that
m->m_data is no longer the beginning of the packet.
 1.21.2.1 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.27 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.26 14-Mar-2009  dsl branches: 1.26.12; 1.26.16;
Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.25 24-Apr-2008  ad branches: 1.25.2; 1.25.10; 1.25.16;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.24 23-Apr-2008  thorpej Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.23 17-Feb-2007  dyoung branches: 1.23.38; 1.23.40;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.22 10-Dec-2005  elad branches: 1.22.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.21 20-Jul-2003  itojun branches: 1.21.16;
change ESP xx_schedlen() return type to size_t. sync w/kame
 1.20 09-Aug-2002  itojun branches: 1.20.6;
use correct padding boundary, to correctly estimate ESP header size.
problem found by Arto Selonen <arto@selonen.org>
 1.19 09-Aug-2002  itojun cut and paste error in comment. From: Arto Selonen <arto@selonen.org>
 1.18 15-Oct-2001  itojun branches: 1.18.10; 1.18.12;
reduce diff with kame. whitespace changes only.
 1.17 30-May-2001  mrg branches: 1.17.2;
use _KERNEL_OPT
 1.16 19-Oct-2000  itojun branches: 1.16.2;
memcpy -> bcopy, for sync with kame tree
 1.15 18-Oct-2000  itojun verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync
 1.14 18-Oct-2000  thorpej Restructure the Path MTU Discovery code somewhat to avoid
entering rtentry's for hosts we're not actually communicating
with.

Do this by invoking the ctlinput for the protocol, which is
responsible for validating the ICMP message:
* TCP -- Lookup the connection based on the address/port
pairs in the ICMP message.
* AH/ESP -- Lookup the SA based on the SPI in the ICMP message.

If validation succeeds, ctlinput is responsible for calling
icmp_mtudisc(). icmp_mtudisc() then invokes callbacks registered
by protocols (such as TCP) which want to take some sort of special
action when a path's MTU changes. For TCP, this is where we now
refresh cached routes and re-enter slow-start.

As a side-effect, this fixes the problem where TCP would not be
notified when a path's MTU changed if AH/ESP were being used.

XXX Note, this is only a fix for the IPv4 case. For the IPv6
XXX case, we need to wait for the KAME folks.

Reviewed by sommerfeld@netbsd.org and itojun@netbsd.org.
 1.13 26-Sep-2000  itojun do not hardcode maximum IV length.
 1.12 29-Aug-2000  itojun use per-block cipher function + esp_cbc_{de,en}crypt. do not use
cbc-over-mbuf functions in sys/crypto.

the change should make it much easier to switch crypto function to
machine-dependent ones (like assembly code under sys/arch/i386/crypto?).
also it should be much easier to import AES algorithms.

XXX: it looks like the past blowfish-cbc code was buggy. i ran some test pattern,
and new blowfish-cbc code looks more correct. there's no interoperability
between the old code (before the commit) and the new code (after the commit).

XXX: need serious interop tests before move it into 1.5 branch
 1.11 23-Jul-2000  itojun wrap kernel function prototype by #ifdef _KERNEL.
 1.10 23-Jul-2000  itojun pre-compute and cache intermediate crypto key. suggestion from sommerfeld,
sync with kame.

loopback, blowfish-cbc transport mode, 128bit key
before: 86588496 bytes received in 00:42 (1.94 MB/s)
after: 86588496 bytes received in 00:31 (2.58 MB/s)
 1.9 18-Jul-2000  itojun correct RFC2367 PF_KEY conformance (SADB_[AE]ALG_xx values and namespaces).
sync from kame.

WARNING: need recompilation of setkey(8) and pkgsrc/security/racoon.
(no ipsec-ready netbsd was released as official release)
 1.8 14-Jun-2000  itojun branches: 1.8.2;
add algorithm name into algorithm table. (commit to crypto-intl will follow)
 1.7 31-Jan-2000  itojun branches: 1.7.2;
bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.6 06-Jan-2000  itojun remove too much portability code in KAME, to improve readability.
 1.5 02-Dec-1999  itojun avoid namespace pollution ("#ifdef KERNEL" was mistakenly used)
 1.4 31-Jul-1999  itojun branches: 1.4.2; 1.4.8;
sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.3 09-Jul-1999  thorpej defopt INET6, and put it in opt_inet.h (most places already include this
file, which is why the file list is so short).
 1.2 03-Jul-1999  thorpej branches: 1.2.2;
RCS ID police.
 1.1 01-Jul-1999  itojun branches: 1.1.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1.2.3 02-Aug-1999  thorpej Update from trunk.
 1.1.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.1.2.1 01-Jul-1999  thorpej file esp.h was added on branch chs-ubc2 on 1999-07-01 23:48:26 +0000
 1.2.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.2.2.1 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 19990705 (forgot to add)
 1.4.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.4.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.7.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.8.2.4 04-Sep-2002  itojun sys/netinet6/esp.h 1.20
sys/netinet6/esp_core.c 1.24
sys/netinet6/esp_output.c 1.14
use correct padding boundary, to correctly estimate ESP header size.
problem found by Arto Selonen <arto@selonen.org>

(itojun)
 1.8.2.3 29-Sep-2000  itojun pullup (approved by releng-1-5)

correct lifetime handling of IPsec keys, so that it won't wrongly
survive across suspend/resume session.
sys/netinet6/ipsec.h 1.15 -> 1.16
sys/netkey/keydb.h 1.7 -> 1.9
sys/netkey/key.c 1.35 -> 1.36

stabilize ipcomp packet handling (if we don't update this SEGV can happen).
sys/netinet6/ipcomp_output.c 1.10 -> 1.13
sys/netinet6/ipcomp_input.c 1.10 -> 1.13
sys/netinet6/ipcomp_core.c 1.9 -> 1.16
sys/netinet6/ipcomp.h 1.7 -> 1.8
sys/netkey/key.c 1.28 -> 1.29, 1.31 -> 1.35, 1.36 -> 1.37

avoid hardcoding IV length. new ESP engine (uses block cipher only,
easier to put per-arch *.S)
sys/netinet6/esp_output.c 1.5 -> 1.8
sys/netinet6/esp_input.c 1.5 -> 1.8
sys/netinet6/esp_core.c 1.7 -> 1.9
sys/netinet6/esp.h 1.11 -> 1.13
sys/netkey/key.c 1.30 -> 1.31
 1.8.2.2 30-Jul-2000  itojun pullup (approved by releng-1-5)

esp encryption performance improvement, specifically for algorithms
with long key setup time (blowfish). KAME PR 229.

> pre-compute and cache intermediate crypto key. suggestion from sommerfeld,
> sync with kame.

1.11 -> 1.12 syssrc/sys/netinet6/ah.h
1.9 -> 1.10 syssrc/sys/netinet6/esp.h
1.2 -> 1.3 syssrc/sys/netinet6/esp_core.c
1.2 -> 1.3 syssrc/sys/netinet6/esp_input.c
1.3 -> 1.4 syssrc/sys/netinet6/esp_output.c
1.27 -> 1.28 syssrc/sys/netkey/key.c
1.6 -> 1.7 syssrc/sys/netkey/keydb.h

> clarify comment. from jhawk. sync with kame.

1.3 -> 1.4 syssrc/sys/netinet6/esp_input.c
1.4 -> 1.5 syssrc/sys/netinet6/esp_output.c
 1.8.2.1 25-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)

correct RFC2367 PF_KEY conformance (SADB_[AE]ALG_xx values and namespaces).
sync from kame.

WARNING: need recompilation of setkey(8) and pkgsrc/security/racoon.
(no ipsec-ready netbsd was released as official release, so binary backward
compatibility is less of an issue)

(sys/netinet6/esp.h only, 1.10 -> 1.11)
wrap kernel function prototype by #ifdef _KERNEL.

--- revisions pulled up:
1.6 -> 1.7 syssrc/sys/net/pfkeyv2.h
1.10 -> 1.11 syssrc/sys/netinet6/ah.h
1.10 -> 1.11 syssrc/sys/netinet6/ah_output.c
1.19 -> 1.20 syssrc/sys/netinet6/ah_core.c
1.15 -> 1.16 syssrc/sys/netinet6/ah_input.c
1.8 -> 1.9 syssrc/sys/netinet6/esp.h
1.10 -> 1.11 syssrc/sys/netinet6/esp.h
1.1 -> 1.2 syssrc/sys/netinet6/esp_core.c
1.1 -> 1.2 syssrc/sys/netinet6/esp_input.c
1.2 -> 1.3 syssrc/sys/netinet6/esp_output.c
1.26 -> 1.27 syssrc/sys/netkey/key.c
 1.16.2.3 13-Aug-2002  nathanw Catch up to -current.
 1.16.2.2 22-Oct-2001  nathanw Catch up to -current.
 1.16.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.17.2.2 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.17.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.18.12.1 09-Aug-2002  lukem Pull up revision 1.20 (requested by itojun in ticket #659):
use correct padding boundary, to correctly estimate ESP header size.
problem found by Arto Selonen <arto@selonen.org>
 1.18.10.1 29-Aug-2002  gehenna catch up with -current.
 1.20.6.4 11-Dec-2005  christos Sync with head.
 1.20.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.20.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.20.6.1 03-Aug-2004  skrll Sync with HEAD
 1.21.16.2 26-Feb-2007  yamt sync with head.
 1.21.16.1 21-Jun-2006  yamt sync with head.
 1.22.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.23.40.1 18-May-2008  yamt sync with head.
 1.23.38.1 02-Jun-2008  mjf Sync with HEAD.
 1.25.16.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.25.10.1 28-Apr-2009  skrll Sync with HEAD.
 1.25.2.1 04-May-2009  yamt sync with head.
 1.26.16.1 05-Apr-2012  mrg sync to latest -current.
 1.26.12.1 17-Apr-2012  yamt sync with head
 1.14 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.13 14-Aug-2010  jym branches: 1.13.8; 1.13.12;
Fix some code paths where pointers are dereferenced after checking that
they are NULL (oops?)

XXX pull-ups for NetBSD-4 and NetBSD-5.
 1.12 18-Apr-2009  tsutsui branches: 1.12.2; 1.12.4;
Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.11 19-Mar-2009  he Correct two more bungled bcopy() -> memcpy() conversions.
 1.10 18-Mar-2009  cegger bcopy -> memcpy
 1.9 18-Mar-2009  cegger bzero -> memset
 1.8 25-Dec-2007  perry branches: 1.8.10; 1.8.18; 1.8.20; 1.8.24;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.7 23-May-2007  christos branches: 1.7.8; 1.7.14; 1.7.16; 1.7.20;
Ansify + add a few comments, from Karl Sjödahl
 1.6 04-Mar-2007  christos branches: 1.6.2; 1.6.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.5 16-Nov-2006  christos branches: 1.5.2; 1.5.4;
__unused removal on arguments; approved by core.
 1.4 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.3 11-Dec-2005  christos branches: 1.3.20; 1.3.22;
merge ktrace-lwp.
 1.2 22-Apr-2005  itojun branches: 1.2.2;
AES counter mode uses 8byte IV, not 16 bytes.
msa@burp.tkv.asdf.org, Juha.Leppilahti@iki.fi
 1.1 25-Jul-2003  itojun branches: 1.1.2; 1.1.4; 1.1.8; 1.1.10; 1.1.16;
AES XCBC MAC (for AH)
AES counter mode (for ESP)
 1.1.16.1 01-Oct-2005  tron Pull up following revision(s) (requested by kleink in ticket #837):
sys/netinet6/esp_aesctr.c: revision 1.2
AES counter mode uses 8byte IV, not 16 bytes.
msa@burp.tkv.asdf.org, Juha.Leppilahti@iki.fi
 1.1.10.1 29-Apr-2005  kent sync with -current
 1.1.8.1 11-Oct-2005  riz Pull up following revision(s) (requested by kleink in ticket #5899):
sys/netinet6/esp_aesctr.c: revision 1.2
AES counter mode uses 8byte IV, not 16 bytes.
msa@burp.tkv.asdf.org, Juha.Leppilahti@iki.fi
 1.1.4.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.4.1 25-Jul-2003  skrll file esp_aesctr.c was added on branch ktrace-lwp on 2004-08-03 10:55:11 +0000
 1.1.2.1 11-Oct-2005  riz Pull up following revision(s) (requested by kleink in ticket #5899):
sys/netinet6/esp_aesctr.c: revision 1.2
AES counter mode uses 8byte IV, not 16 bytes.
msa@burp.tkv.asdf.org, Juha.Leppilahti@iki.fi
 1.2.2.3 21-Jan-2008  yamt sync with head
 1.2.2.2 03-Sep-2007  yamt sync with head.
 1.2.2.1 30-Dec-2006  yamt sync with head.
 1.3.22.2 10-Dec-2006  yamt sync with head.
 1.3.22.1 22-Oct-2006  yamt sync with head
 1.3.20.1 18-Nov-2006  ad Sync with head.
 1.5.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.5.2.1 12-Sep-2010  bouyer Pull up following revision(s) (requested by jym in ticket #1403):
sys/netinet6/esp_core.c: revision 1.46
sys/netinet6/esp_aesctr.c: revision 1.13
Fix some code paths where pointers are dereferenced after checking that
they are NULL (oops?)
 1.6.4.1 11-Jul-2007  mjf Sync with head.
 1.6.2.1 08-Jun-2007  ad Sync with head.
 1.7.20.1 02-Jan-2008  bouyer Sync with HEAD
 1.7.16.1 26-Dec-2007  ad Sync with head.
 1.7.14.1 18-Feb-2008  mjf Sync with HEAD.
 1.7.8.1 09-Jan-2008  matt sync with HEAD
 1.8.24.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.8.20.1 21-Nov-2010  riz Pull up following revision(s) (requested by jym in ticket #1440):
sys/netinet6/esp_core.c: revision 1.46
sys/netinet6/esp_aesctr.c: revision 1.13
Fix some code paths where pointers are dereferenced after checking that
they are NULL (oops?)
XXX pull-ups for NetBSD-4 and NetBSD-5.
 1.8.18.1 28-Apr-2009  skrll Sync with HEAD.
 1.8.10.2 09-Oct-2010  yamt sync with head
 1.8.10.1 04-May-2009  yamt sync with head.
 1.12.4.1 05-Mar-2011  rmind sync with head
 1.12.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.13.12.1 05-Apr-2012  mrg sync to latest -current.
 1.13.8.1 17-Apr-2012  yamt sync with head
 1.4 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.3 14-Mar-2009  dsl branches: 1.3.12; 1.3.16;
Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.2 10-Dec-2005  elad branches: 1.2.74; 1.2.84; 1.2.90;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.1 25-Jul-2003  itojun branches: 1.1.4; 1.1.18;
AES XCBC MAC (for AH)
AES counter mode (for ESP)
 1.1.18.1 21-Jun-2006  yamt sync with head.
 1.1.4.5 11-Dec-2005  christos Sync with head.
 1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.4.1 25-Jul-2003  skrll file esp_aesctr.h was added on branch ktrace-lwp on 2004-08-03 10:55:11 +0000
 1.2.90.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.2.84.1 28-Apr-2009  skrll Sync with HEAD.
 1.2.74.1 04-May-2009  yamt sync with head.
 1.3.16.1 05-Apr-2012  mrg sync to latest -current.
 1.3.12.1 17-Apr-2012  yamt sync with head
 1.47 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.46 14-Aug-2010  jym branches: 1.46.8; 1.46.12;
Fix some code paths where pointers are dereferenced after checking that
they are NULL (oops?)

XXX pull-ups for NetBSD-4 and NetBSD-5.
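The class of bug described in revision 1.46 can be sketched as follows. This is an illustration of the pattern only, not the code that was changed; struct ex_sav and ex_get_schedlen() are hypothetical names.

```c
#include <stddef.h>

struct ex_sav {
	size_t schedlen;
};

/*
 * Buggy shape (do not use): the error path dereferences the pointer
 * it has just found to be NULL, for example
 *	if (sav == NULL) { log(sav->schedlen); return -1; }
 * The version below keeps the error path free of any dereference.
 */
static int
ex_get_schedlen(const struct ex_sav *sav, size_t *lenp)
{
	if (sav == NULL || lenp == NULL)
		return -1;	/* error path touches neither pointer */
	*lenp = sav->schedlen;
	return 0;
}
```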
 1.45 18-Apr-2009  tsutsui branches: 1.45.2; 1.45.4;
Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.44 18-Mar-2009  cegger bcopy -> memcpy
 1.43 18-Mar-2009  cegger bzero -> memset
 1.42 18-Mar-2009  cegger Ansify function definitions w/o arguments. Generated with sed.
 1.41 14-Mar-2009  dsl Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.40 23-May-2007  christos branches: 1.40.32; 1.40.42; 1.40.44; 1.40.48;
Ansify + add a few comments, from Karl Sjödahl
 1.39 04-Mar-2007  christos branches: 1.39.2; 1.39.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.38 16-Nov-2006  christos branches: 1.38.2; 1.38.4;
__unused removal on arguments; approved by core.
 1.37 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.36 30-Aug-2006  christos branches: 1.36.2; 1.36.4;
fix incomplete initializer
 1.35 11-Dec-2005  christos branches: 1.35.4; 1.35.8;
merge ktrace-lwp.
 1.34 18-Aug-2005  tron Remove write-only variable "derived" in esp_cbc_encrypt().
 1.33 27-Aug-2003  thorpej branches: 1.33.14; 1.33.16;
Use BF_ecb_encrypt() instead of using BF_encrypt()/BF_decrypt()
directly. Reviewed by itojun.
 1.32 26-Aug-2003  thorpej Move the opencrypto CAST-128 implementation to crypto/cast128, removing
the old one. Rename the functions/structures from cast_* to cast128_*.
Adapt the KAME IPsec to use the new CAST-128 code, which has a simpler
API and smaller footprint.
 1.31 25-Jul-2003  itojun add AH/ESP algorithms: hmac-ripemd160 (AH), AES XCBC MAC (AH),
AES counter mode (ESP)
 1.30 22-Jul-2003  itojun clear scheduled key before freeing, for safety
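A minimal userland sketch of the idea in revision 1.30, assuming the scheduled key lives in a heap buffer of known length. The names are hypothetical; the wipe goes through a volatile pointer so that a compiler is less likely to drop it as a dead store before free().

```c
#include <stddef.h>
#include <stdlib.h>

/* Overwrite len bytes at p with zeros through a volatile pointer. */
static void
ex_wipe(void *p, size_t len)
{
	volatile unsigned char *vp = p;

	while (len-- > 0)
		*vp++ = 0;
}

/* Wipe the key schedule, then free it and clear the caller's pointer. */
static void
ex_sched_free(void **schedp, size_t len)
{
	if (schedp == NULL || *schedp == NULL)
		return;
	ex_wipe(*schedp, len);
	free(*schedp);
	*schedp = NULL;
}
```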
 1.29 22-Jul-2003  itojun cosmetic
 1.28 20-Jul-2003  itojun avoid assuming result buffer size in AH logic. sync w/kame
 1.27 20-Jul-2003  itojun due to previous type change, sav->schedlen never goes negative. sync w/kame
 1.26 20-Jul-2003  itojun change ESP xx_schedlen() return type to size_t. sync w/kame
 1.25 11-Sep-2002  itojun branches: 1.25.6;
correct pointer signedness mixups. sync w/kame
 1.24 09-Aug-2002  itojun use correct padding boundary, to correctly estimate ESP header size.
problem found by Arto Selonen <arto@selonen.org>
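As background for the padding boundary in revision 1.24: ESP pads the payload so that payload, padding, and the two trailer bytes (pad length and next header) together end on the alignment boundary. The sketch below shows that arithmetic under the assumption that the boundary is already known; it is not the changed code, and ex_esp_padlen() is a hypothetical name.

```c
#include <stddef.h>

/*
 * Number of padding bytes needed so that plen + pad + 2 is a multiple
 * of bound. bound must be nonzero (typically the cipher block size,
 * and at least 4).
 */
static size_t
ex_esp_padlen(size_t plen, size_t bound)
{
	size_t rem = (plen + 2) % bound;

	return rem == 0 ? 0 : bound - rem;
}
```

Using the wrong boundary here changes the padding length and therefore the estimated ESP overhead per packet.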
 1.23 09-Jun-2002  itojun whitespace cleanup
 1.22 08-Jun-2002  itojun whitespace cleanup
 1.21 27-Feb-2002  itojun branches: 1.21.8; 1.21.10;
sync blowfish function prototype between i386 assembly and C.
From: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
 1.20 21-Dec-2001  itojun whitespace/cosmetic sync w/kame
 1.19 27-Nov-2001  itojun fix cast128 with shorter key length. sync with kame
 1.18 13-Nov-2001  lukem add RCSIDs
 1.17 15-Oct-2001  itojun reduce diff with kame. whitespace changes only.
 1.16 10-Sep-2001  itojun minor style
 1.15 09-Sep-2001  tls Add asm versions of blowfish and des transforms for i386.

This also involved updating the in-kernel DES functions to correspond
to the versions in our in-tree OpenSSL, because the des_SPtrans table
has changed; the asm code will not work with the old permutation table!

C and i386 asm code for the DES, 3DES, and Blowfish CBC modes is also
included; it is not currently built as the ESP processing in esp_core.c
splits the CBC operation and the cipher transform apart. Hopefully that
will be fixed as there is a substantial performance improvement to be had
from doing so. It will remain necessary to use the C version of the
Blowfish CBC function on some i386 machines, however, as the asm version
uses bswapl, which only 486 and later processors have. The DES CBC code
doesn't have this problem.

Finally, change esp_core.c to use the ecb3_encrypt function instead of
calling ecb_encrypt three times; this improves performance a bit, in
particular in the asm case.
 1.14 02-Nov-2000  itojun branches: 1.14.2; 1.14.4; 1.14.6; 1.14.8;
avoid possible align issue
 1.13 02-Nov-2000  itojun [13]des fix for big endian machines. from: shigeru@iij.ad.jp
 1.12 05-Oct-2000  itojun always use rnd(4) for IPsec random number source. avoid random(9).
if there's no rnd(4), random(9) will be used with one-time warning printf(9).

XXX not sure how good rnd_extract_data(RND_EXTRACT_ANY) is, under entropy-
starvation situation
 1.11 02-Oct-2000  itojun correct merge failure in key size validation.
 1.10 02-Oct-2000  itojun add ESP rijndael logic. yet to be usable (until algorithm # is assigned)
 1.9 26-Sep-2000  itojun do not hardcode maximum IV length.
 1.8 18-Sep-2000  itojun repair blowfish-cbc. BF_encrypt() takes value in host byteorder, yuck!
(no effect to 1.5 branch)
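The byte-order issue in revision 1.8 can be illustrated with a wrapper that converts a wire-format block into two host-order 32-bit words before calling a word-oriented cipher routine, and converts back afterwards. This is a sketch under the assumption that the wire format packs each word big-endian; ex_toy_swap() is a stand-in and not a cipher, and all ex_ names are hypothetical.

```c
#include <stdint.h>

/* Read a 32-bit big-endian value into host byte order. */
static uint32_t
ex_load_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	    (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Write a host-order 32-bit value as big-endian bytes. */
static void
ex_store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* A routine that, like the one named in the log, works on host-order words. */
typedef void (*ex_cipher64)(uint32_t data[2]);

/* Toy stand-in: swaps the two words. NOT a cipher. */
static void
ex_toy_swap(uint32_t data[2])
{
	uint32_t t = data[0];

	data[0] = data[1];
	data[1] = t;
}

/* Convert wire bytes to host words, run fn, convert back. */
static void
ex_crypt_block(ex_cipher64 fn, uint8_t blk[8])
{
	uint32_t w[2];

	w[0] = ex_load_be32(blk);
	w[1] = ex_load_be32(blk + 4);
	fn(w);
	ex_store_be32(blk, w[0]);
	ex_store_be32(blk + 4, w[1]);
}
```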
 1.7 31-Aug-2000  itojun repair DES on LP64. past code did not interoperate with non-LP64, due to
incorrect computed results.
remove unnecessary #ifdef/#define. sync with kame.
 1.6 30-Aug-2000  itojun LP64 fix (cast to u_long when printing size_t)
 1.5 29-Aug-2000  itojun improve code sharing for esp_schedule(). add some diagnostics cases
for esp_cbc_{en,de}crypt(). sync with kame.
 1.4 29-Aug-2000  itojun use per-block cipher function + esp_cbc_{de,en}crypt. do not use
cbc-over-mbuf functions in sys/crypto.

the change should make it much easier to switch crypto function to
machine-dependent ones (like assembly code under sys/arch/i386/crypto?).
also it should be much easier to import AES algorithms.

XXX: it looks that past blowfish-cbc code was buggy. i ran some test pattern,
and new blowfish-cbc code looks more correct. there's no interoperability
between the old code (before the commit) and the new code (after the commit).

XXX: need serious interop tests before move it into 1.5 branch
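The split described in revision 1.4 (a generic CBC loop driving a per-block cipher function) can be sketched as below. This is a flat-buffer illustration, not the mbuf-based esp_cbc_{en,de}crypt code; the toy cipher is not secure and every ex_ name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_BLK 8	/* block size in bytes of the toy cipher */

typedef void (*ex_blockfn)(const uint8_t *key, const uint8_t *in,
    uint8_t *out);

/* Toy stand-ins for a real block cipher; NOT secure. */
static void
ex_toy_enc(const uint8_t *key, const uint8_t *in, uint8_t *out)
{
	for (size_t i = 0; i < EX_BLK; i++)
		out[i] = (uint8_t)(in[i] + key[i]);
}

static void
ex_toy_dec(const uint8_t *key, const uint8_t *in, uint8_t *out)
{
	for (size_t i = 0; i < EX_BLK; i++)
		out[i] = (uint8_t)(in[i] - key[i]);
}

/* CBC encrypt in place: C[i] = E(P[i] xor C[i-1]), with C[-1] = iv. */
static int
ex_cbc_encrypt(ex_blockfn enc, const uint8_t *key, const uint8_t *iv,
    uint8_t *buf, size_t len)
{
	uint8_t prev[EX_BLK], tmp[EX_BLK];

	if (len % EX_BLK != 0)
		return -1;
	memcpy(prev, iv, EX_BLK);
	for (size_t off = 0; off < len; off += EX_BLK) {
		for (size_t i = 0; i < EX_BLK; i++)
			tmp[i] = buf[off + i] ^ prev[i];
		enc(key, tmp, buf + off);
		memcpy(prev, buf + off, EX_BLK);
	}
	return 0;
}

/* CBC decrypt in place: P[i] = D(C[i]) xor C[i-1], with C[-1] = iv. */
static int
ex_cbc_decrypt(ex_blockfn dec, const uint8_t *key, const uint8_t *iv,
    uint8_t *buf, size_t len)
{
	uint8_t prev[EX_BLK], ct[EX_BLK], tmp[EX_BLK];

	if (len % EX_BLK != 0)
		return -1;
	memcpy(prev, iv, EX_BLK);
	for (size_t off = 0; off < len; off += EX_BLK) {
		memcpy(ct, buf + off, EX_BLK);	/* keep ciphertext for chaining */
		dec(key, ct, tmp);
		for (size_t i = 0; i < EX_BLK; i++)
			buf[off + i] = tmp[i] ^ prev[i];
		memcpy(prev, ct, EX_BLK);
	}
	return 0;
}
```

With this shape, swapping in a different cipher (or a machine-dependent implementation) only means passing a different block function.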
 1.3 23-Jul-2000  itojun pre-compute and cache intermediate crypto key. suggestion from sommerfeld,
sync with kame.

loopback, blowfish-cbc transport mode, 128bit key
before: 86588496 bytes received in 00:42 (1.94 MB/s)
after: 86588496 bytes received in 00:31 (2.58 MB/s)
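The optimization in revision 1.3 amounts to running the expensive key setup once when the security association is created and reusing the result for every packet. A minimal sketch, with a toy key expansion and hypothetical names (struct ex_sa, ex_sa_init(), ex_process_packet()); the counter exists only to make the saving visible.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct ex_sa {
	uint8_t key[16];
	uint32_t sched[4];	/* cached, expanded form of key */
};

static unsigned ex_setup_calls;	/* counts expensive key setups */

/* Toy key expansion; a real cipher's key setup would go here. */
static void
ex_key_setup(const uint8_t key[16], uint32_t sched[4])
{
	ex_setup_calls++;
	for (size_t i = 0; i < 4; i++)
		sched[i] = (uint32_t)key[4 * i] << 24 |
		    (uint32_t)key[4 * i + 1] << 16 |
		    (uint32_t)key[4 * i + 2] << 8 |
		    (uint32_t)key[4 * i + 3];
}

/* Done once per SA: store the key and pre-compute its schedule. */
static void
ex_sa_init(struct ex_sa *sa, const uint8_t key[16])
{
	memcpy(sa->key, key, sizeof(sa->key));
	ex_key_setup(sa->key, sa->sched);
}

/* Done per packet: uses the cached schedule, no key setup. */
static uint32_t
ex_process_packet(const struct ex_sa *sa, uint32_t word)
{
	return word ^ sa->sched[0];
}
```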
 1.2 18-Jul-2000  itojun correct RFC2367 PF_KEY conformance (SADB_[AE]ALG_xx values and namespaces).
sync from kame.

WARNING: need recompilation of setkey(8) and pkgsrc/security/racoon.
(no ipsec-ready netbsd was released as official release)
 1.1 14-Jun-2000  thorpej branches: 1.1.1;
Initial revision
 1.1.1.1 14-Jun-2000  thorpej branches: 1.1.1.1.2; 1.1.1.1.4;
Import IPsec ESP from netbsd-cryptosrc-intl.
 1.1.1.1.4.2 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.1.1.1.4.1 14-Jun-2000  minoura file esp_core.c was added on branch minoura-xpg4dl on 2000-06-22 17:09:54 +0000
 1.1.1.1.2.10 05-Aug-2003  msaitoh pull up revision 1.30 (requested by itojun in ticket #58):
clear scheduled key before freeing, for safety
 1.1.1.1.2.9 04-Sep-2002  itojun sys/netinet6/esp.h 1.20
sys/netinet6/esp_core.c 1.24
sys/netinet6/esp_output.c 1.14
use correct padding boundary, to correctly estimate ESP header size.
problem found by Arto Selonen <arto@selonen.org>

(itojun)
 1.1.1.1.2.8 09-Dec-2001  he Pull up revision 1.19 (requested by itojun):
Fix cast128 with short keys.
 1.1.1.1.2.7 03-Nov-2000  tv Pullup 1.13 and 1.14 [itojun]:
[13]des fix for big endian machines. from: shigeru@iij.ad.jp
avoid possible align issue
 1.1.1.1.2.6 05-Oct-2000  itojun pullup (approved by releng-1-5)

always use rnd(4) for IPsec random number source. avoid random(9).
if there's no rnd(4), random(9) will be used with one-time warning printf(9).

XXX not sure how good rnd_extract_data(RND_EXTRACT_ANY) is, under entropy-
starvation situation

cvs rdiff -r1.11 -r1.12 syssrc/sys/netinet6/esp_core.c
cvs rdiff -r1.9 -r1.10 syssrc/sys/netinet6/esp_output.c
cvs rdiff -r1.38 -r1.39 syssrc/sys/netkey/key.c
cvs rdiff -r1.6 -r1.7 syssrc/sys/netkey/key.h
 1.1.1.1.2.5 04-Oct-2000  itojun pullup (approved by releng-1-5)
rijndael-cbc kernel support.

sys/crypto/rijndael/* add tag for latest
sys/netinet6/esp_rijndael.[ch] add tag for latest
sys/netinet6/esp_core.c 1.9 -> 1.11
sys/conf/files 1.389 -> 1.390, 1.395 -> 1.396
sys/net/pfkeyv2.h 1.7 -> 1.11
 1.1.1.1.2.4 29-Sep-2000  itojun pullup (approved by releng-1-5)

correct lifetime handling of IPsec keys, so that it won't wrongly
survive across suspend/resume session.
sys/netinet6/ipsec.h 1.15 -> 1.16
sys/netkey/keydb.h 1.7 -> 1.9
sys/netkey/key.c 1.35 -> 1.36

stabilize ipcomp packet handling (if we don't update this SEGV can happen).
sys/netinet6/ipcomp_output.c 1.10 -> 1.13
sys/netinet6/ipcomp_input.c 1.10 -> 1.13
sys/netinet6/ipcomp_core.c 1.9 -> 1.16
sys/netinet6/ipcomp.h 1.7 -> 1.8
sys/netkey/key.c 1.28 -> 1.29, 1.31 -> 1.35, 1.36 -> 1.37

avoid hardcoding IV length. new ESP engine (uses block cipher only,
easier to put per-arch *.S)
sys/netinet6/esp_output.c 1.5 -> 1.8
sys/netinet6/esp_input.c 1.5 -> 1.8
sys/netinet6/esp_core.c 1.7 -> 1.9
sys/netinet6/esp.h 1.11 -> 1.13
sys/netkey/key.c 1.30 -> 1.31
 1.1.1.1.2.3 31-Aug-2000  itojun pullup (approved by releng-1-5)

> repair DES on LP64. past code did not interoperate with non-LP64, due to
> incorrect computed results.
> remove unnecessary #ifdef/#define. sync with kame.

> cvs rdiff -r1.1 -r1.2 syssrc/sys/crypto/des/des.h \
> syssrc/sys/crypto/des/des_3cbc.c syssrc/sys/crypto/des/des_cbc.c \
> syssrc/sys/crypto/des/des_ecb.c syssrc/sys/crypto/des/des_locl.h \
> syssrc/sys/crypto/des/des_setkey.c
> cvs rdiff -r1.6 -r1.7 syssrc/sys/netinet6/esp_core.c (equivalent change)
 1.1.1.1.2.2 30-Jul-2000  itojun pullup (approved by releng-1-5)

esp encryption performance improvement, specifically for algorithms
with long key setup time (blowfish). KAME PR 229.

> pre-compute and cache intermediate crypto key. suggestion from sommerfeld,
> sync with kame.

1.11 -> 1.12 syssrc/sys/netinet6/ah.h
1.9 -> 1.10 syssrc/sys/netinet6/esp.h
1.2 -> 1.3 syssrc/sys/netinet6/esp_core.c \
1.2 -> 1.3 syssrc/sys/netinet6/esp_input.c
1.3 -> 1.4 syssrc/sys/netinet6/esp_output.c
1.27 -> 1.28 syssrc/sys/netkey/key.c
1.6 -> 1.7 syssrc/sys/netkey/keydb.h

> clarify comment. from jhawk. sync with kame.

1.3 -> 1.4 syssrc/sys/netinet6/esp_input.c
1.4 -> 1.5 syssrc/sys/netinet6/esp_output.c
 1.1.1.1.2.1 25-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)

correct RFC2367 PF_KEY conformance (SADB_[AE]ALG_xx values and namespaces).
sync from kame.

WARNING: need recompilation of setkey(8) and pkgsrc/security/racoon.
(no ipsec-ready netbsd was released as official release, so binary backward
compatibility is less of an issue)

(sys/netinet6/esp.h only, 1.10 -> 1.11)
wrap kernel function prototype by #ifdef _KERNEL.

--- revisions pulled up:
1.6 -> 1.7 syssrc/sys/net/pfkeyv2.h
1.10 -> 1.11 syssrc/sys/netinet6/ah.h
1.10 -> 1.11 syssrc/sys/netinet6/ah_output.c
1.19 -> 1.20 syssrc/sys/netinet6/ah_core.c
1.15 -> 1.16 syssrc/sys/netinet6/ah_input.c
1.8 -> 1.9 syssrc/sys/netinet6/esp.h
1.10 -> 1.11 syssrc/sys/netinet6/esp.h
1.1 -> 1.2 syssrc/sys/netinet6/esp_core.c
1.1 -> 1.2 syssrc/sys/netinet6/esp_input.c
1.2 -> 1.3 syssrc/sys/netinet6/esp_output.c
1.26 -> 1.27 syssrc/sys/netkey/key.c
 1.14.8.1 01-Oct-2001  fvdl Catch up with -current.
 1.14.6.6 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.14.6.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.14.6.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.14.6.3 16-Mar-2002  jdolecek Catch up with -current.
 1.14.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.14.6.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.14.4.8 17-Sep-2002  nathanw Catch up to -current.
 1.14.4.7 13-Aug-2002  nathanw Catch up to -current.
 1.14.4.6 20-Jun-2002  nathanw Catch up to -current.
 1.14.4.5 28-Feb-2002  nathanw Catch up to -current.
 1.14.4.4 08-Jan-2002  nathanw Catch up to -current.
 1.14.4.3 14-Nov-2001  nathanw Catch up to -current.
 1.14.4.2 22-Oct-2001  nathanw Catch up to -current.
 1.14.4.1 21-Sep-2001  nathanw Catch up to -current.
 1.14.2.3 22-Nov-2000  bouyer Sync with HEAD.
 1.14.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.14.2.1 02-Nov-2000  bouyer file esp_core.c was added on branch thorpej_scsipi on 2000-11-20 18:10:43 +0000
 1.21.10.2 11-Aug-2003  msaitoh Pull up rev. 1.30 (requested by itojun in ticket #1383):
Clear scheduled key before freeing, for safety
 1.21.10.1 09-Aug-2002  lukem Pull up revision 1.24 (requested by itojun in ticket #659):
use correct padding boundary, to correctly estimate ESP header size.
problem found by Arto Selonen <arto@selonen.org>
 1.21.8.2 29-Aug-2002  gehenna catch up with -current.
 1.21.8.1 20-Jun-2002  gehenna catch up with -current.
 1.25.6.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.25.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.25.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.25.6.1 03-Aug-2004  skrll Sync with HEAD
 1.33.16.3 03-Sep-2007  yamt sync with head.
 1.33.16.2 30-Dec-2006  yamt sync with head.
 1.33.16.1 21-Jun-2006  yamt sync with head.
 1.33.14.1 24-Aug-2005  riz Pull up following revision(s) (requested by tron in ticket #687):
sys/netinet6/esp_core.c: revision 1.34
Remove write-only variable "derived" in esp_cbc_encrypt().
 1.35.8.1 03-Sep-2006  yamt sync with head.
 1.35.4.1 09-Sep-2006  rpaulo sync with head
 1.36.4.2 10-Dec-2006  yamt sync with head.
 1.36.4.1 22-Oct-2006  yamt sync with head
 1.36.2.1 18-Nov-2006  ad Sync with head.
 1.38.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.38.2.1 12-Sep-2010  bouyer Pull up following revision(s) (requested by jym in ticket #1403):
sys/netinet6/esp_core.c: revision 1.46
sys/netinet6/esp_aesctr.c: revision 1.13
Fix some code paths where pointers are dereferenced after checking that
they are NULL (oops?)
 1.39.4.1 11-Jul-2007  mjf Sync with head.
 1.39.2.1 08-Jun-2007  ad Sync with head.
 1.40.48.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.40.44.1 21-Nov-2010  riz Pull up following revision(s) (requested by jym in ticket #1440):
sys/netinet6/esp_core.c: revision 1.46
sys/netinet6/esp_aesctr.c: revision 1.13
Fix some code paths where pointers are dereferenced after checking that
they are NULL (oops?)
XXX pull-ups for NetBSD-4 and NetBSD-5.
 1.40.42.1 28-Apr-2009  skrll Sync with HEAD.
 1.40.32.2 09-Oct-2010  yamt sync with head
 1.40.32.1 04-May-2009  yamt sync with head.
 1.45.4.1 05-Mar-2011  rmind sync with head
 1.45.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.46.12.1 05-Apr-2012  mrg sync to latest -current.
 1.46.8.1 17-Apr-2012  yamt sync with head
 1.51 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.50 17-Jul-2011  joerg branches: 1.50.2; 1.50.6; 1.50.8; 1.50.12; 1.50.14;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.49 18-Mar-2009  cegger bzero -> memset
 1.48 18-Mar-2009  cegger bcmp -> memcmp
 1.47 24-Apr-2008  ad branches: 1.47.2; 1.47.10; 1.47.16;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.46 23-Apr-2008  thorpej Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.45 19-Oct-2007  ad branches: 1.45.16; 1.45.18;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.44 23-May-2007  christos branches: 1.44.6; 1.44.8; 1.44.12;
Ansify + add a few comments, from Karl Sjödahl
 1.43 04-Mar-2007  christos branches: 1.43.2; 1.43.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.42 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.41 16-Nov-2006  christos branches: 1.41.4;
__unused removal on arguments; approved by core.
 1.40 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.39 11-Dec-2005  christos branches: 1.39.20; 1.39.22;
merge ktrace-lwp.
 1.38 07-Jul-2005  tron Defopt IPSEC_NAT_T.
 1.37 29-Apr-2005  yamt branches: 1.37.2;
move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.
 1.36 23-Apr-2005  manu Enhance IPSEC_NAT_T so that it can work with multiple machines behind the
same NAT.
 1.35 11-Feb-2004  itojun branches: 1.35.8; 1.35.14;
KNF
 1.34 25-Oct-2003  christos fix uninitialized variables
 1.33 06-Aug-2003  itojun m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.
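The ordering constraint in revision 1.33 can be shown with a toy buffer type whose append routine consumes its second argument. This is not the mbuf API: struct ex_chain, ex_cat(), and ex_join() are hypothetical, and unlike m_cat (which the log says may free the second mbuf) this toy always frees it.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct ex_chain {
	size_t pktlen;		/* plays the role of m_pkthdr.len */
	size_t len;		/* bytes stored in data */
	unsigned char *data;
};

/* Append n to m and free n; n must not be touched afterwards. */
static int
ex_cat(struct ex_chain *m, struct ex_chain *n)
{
	unsigned char *p = realloc(m->data, m->len + n->len);

	if (p == NULL)
		return -1;
	memcpy(p + m->len, n->data, n->len);
	m->data = p;
	m->len += n->len;
	free(n->data);
	free(n);
	return 0;
}

/* Read everything needed from n's header before handing n to ex_cat(). */
static int
ex_join(struct ex_chain *m, struct ex_chain *n)
{
	size_t add = n->pktlen;	/* read while n is still valid */

	if (ex_cat(m, n) != 0)
		return -1;
	m->pktlen += add;
	return 0;
}
```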
 1.32 22-Jul-2003  itojun unifdef -U_IP_VHL
 1.31 09-Jul-2003  itojun remove obsolete comment on the use of m_pullup
 1.30 04-Jul-2003  itojun fix missing check for taillen against pkthdr.len. markus@openbsd
 1.29 14-May-2003  itojun branches: 1.29.2;
always use PULLDOWN_TEST codepath.
 1.28 20-Jan-2003  simonb Remove variable that is only assigned to but not referenced.
 1.27 28-Oct-2002  itojun increase correct stat. KAME pr 445
 1.26 11-Sep-2002  itojun correct pointer signedness mixups. sync w/kame
 1.25 11-Sep-2002  itojun correct signedness mixup in pointer passing. sync w/kame
 1.24 21-Aug-2002  itojun check packet length before fetching ESP crypto checksum. sync w/kame
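The kind of check described in revision 1.24 (and the related taillen check in 1.30) can be sketched on a flat buffer: confirm the packet is long enough to hold the header plus the trailing checksum before copying the checksum out. The function name and parameters are hypothetical, and the real code operates on mbufs rather than a contiguous array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the trailing icvlen bytes of pkt into icv, but only if the
 * packet has room for hdrlen bytes of header plus the checksum.
 * Returns 0 on success, -1 if the packet is too short.
 */
static int
ex_fetch_icv(const uint8_t *pkt, size_t pktlen, size_t hdrlen,
    size_t icvlen, uint8_t *icv)
{
	if (pktlen < hdrlen || pktlen - hdrlen < icvlen)
		return -1;
	memcpy(icv, pkt + pktlen - icvlen, icvlen);
	return 0;
}
```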
 1.23 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.22 08-Jun-2002  itojun whitespace cleanup
 1.21 18-Mar-2002  itojun branches: 1.21.4; 1.21.6;
esp/ah_ctlinput: pass useful address to key_alloc.
 1.20 18-Dec-2001  itojun remove obsolete #if 0'ed portion.
 1.19 13-Nov-2001  lukem add RCSIDs
 1.18 15-Oct-2001  itojun reduce diff with kame. whitespace changes only.
 1.17 13-Apr-2001  thorpej branches: 1.17.2;
Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.16 01-Mar-2001  itojun branches: 1.16.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code
 1.15 11-Feb-2001  itojun pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).
 1.14 24-Jan-2001  itojun - record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.13 09-Dec-2000  itojun update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
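The hiwat/lowat rule in revision 1.13 reduces to choosing a cap based on whether the ICMPv6 message was validated. A sketch of that decision only; the limits are passed in as parameters because the log does not give their values, and ex_pmtu_may_add() is a hypothetical name.

```c
#include <stddef.h>

/*
 * Return nonzero if one more path MTU route entry may be created.
 * Validated traffic is capped at hiwat, unvalidated traffic at lowat.
 */
static int
ex_pmtu_may_add(size_t nentries, int validated, size_t hiwat, size_t lowat)
{
	size_t limit = validated ? hiwat : lowat;

	return nentries < limit;
}
```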
 1.12 19-Oct-2000  itojun branches: 1.12.2;
memcpy -> bcopy, for sync with kame tree
 1.11 18-Oct-2000  itojun verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync
 1.10 18-Oct-2000  thorpej Restructure the Path MTU Discovery code somewhat to avoid
entering rtentry's for hosts we're not actually communicating
with.

Do this by invoking the ctlinput for the protocol, which is
responsible for validating the ICMP message:
* TCP -- Lookup the connection based on the address/port
pairs in the ICMP message.
* AH/ESP -- Lookup the SA based on the SPI in the ICMP message.

If validation succeeds, ctlinput is responsible for calling
icmp_mtudisc(). icmp_mtudisc() then invokes callbacks registered
by protocols (such as TCP) which want to take some sort of special
action when a path's MTU changes. For TCP, this is where we now
refresh cached routes and re-enter slow-start.

As a side-effect, this fixes the problem where TCP would not be
notified when a path's MTU changed if AH/ESP were being used.

XXX Note, this is only a fix for the IPv4 case. For the IPv6
XXX case, we need to wait for the KAME folks.

Reviewed by sommerfeld@netbsd.org and itojun@netbsd.org.
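The AH/ESP half of the validation described in revision 1.10 can be sketched as: look up the SA by the SPI found in the ICMP message, and act on the MTU only if such an SA exists. The table, types, and function names below are hypothetical, and the MTU assignment merely stands in for the call to icmp_mtudisc().

```c
#include <stddef.h>
#include <stdint.h>

struct ex_sa_entry {
	uint32_t spi;
	unsigned mtu;
};

/* Linear lookup of an SA by SPI; returns NULL if none matches. */
static struct ex_sa_entry *
ex_sa_lookup(struct ex_sa_entry *tab, size_t n, uint32_t spi)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].spi == spi)
			return &tab[i];
	return NULL;
}

/* Returns 1 if the message was validated, 0 if it was ignored. */
static int
ex_esp_ctlinput(struct ex_sa_entry *tab, size_t n, uint32_t spi,
    unsigned newmtu)
{
	struct ex_sa_entry *sa = ex_sa_lookup(tab, n, spi);

	if (sa == NULL)
		return 0;	/* not one of ours: create no state */
	if (newmtu < sa->mtu)
		sa->mtu = newmtu;	/* stands in for icmp_mtudisc() */
	return 1;
}
```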
 1.9 02-Oct-2000  itojun fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.
 1.8 18-Sep-2000  itojun pullup IPv6 and subsequent headers, on IPv6 IPsec transport mode input.
(not normally visited - we have switched to m_pulldown. just for completeness)
 1.7 29-Aug-2000  itojun improve code sharing for esp_schedule(). add some diagnostics cases
for esp_cbc_{en,de}crypt(). sync with kame.
 1.6 29-Aug-2000  itojun use per-block cipher function + esp_cbc_{de,en}crypt. do not use
cbc-over-mbuf functions in sys/crypto.

the change should make it much easier to switch crypto function to
machine-dependent ones (like assembly code under sys/arch/i386/crypto?).
also it should be much easier to import AES algorithms.

XXX: it looks that past blowfish-cbc code was buggy. i ran some test pattern,
and new blowfish-cbc code looks more correct. there's no interoperability
between the old code (before the commit) and the new code (after the commit).

XXX: need serious interop tests before move it into 1.5 branch
 1.5 16-Aug-2000  itojun add missing splx, when outgoing interface queue is full on tunnelled
ESP packet output. KAME PR 280.
 1.4 30-Jul-2000  itojun clarify comment. from jhawk. sync with kame.
 1.3 23-Jul-2000  itojun pre-compute and cache intermediate crypto key. suggestion from sommerfeld,
sync with kame.

loopback, blowfish-cbc transport mode, 128bit key
before: 86588496 bytes received in 00:42 (1.94 MB/s)
after: 86588496 bytes received in 00:31 (2.58 MB/s)
 1.2 18-Jul-2000  itojun correct RFC2367 PF_KEY conformance (SADB_[AE]ALG_xx values and namespaces).
sync from kame.

WARNING: need recompilation of setkey(8) and pkgsrc/security/racoon.
(no ipsec-ready netbsd was released as official release)
 1.1 14-Jun-2000  thorpej branches: 1.1.1;
Initial revision
 1.1.1.1 14-Jun-2000  thorpej branches: 1.1.1.1.2; 1.1.1.1.4;
Import IPsec ESP from netbsd-cryptosrc-intl.
 1.1.1.1.4.2 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.1.1.1.4.1 14-Jun-2000  minoura file esp_input.c was added on branch minoura-xpg4dl on 2000-06-22 17:09:54 +0000
 1.1.1.1.2.11 09-Sep-2003  msaitoh Pull up revision 1.33 (requested by itojun in ticket #63):
m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.
 1.1.1.1.2.10 05-Aug-2003  msaitoh Pull up revision 1.30 via patch (requested by itojun in ticket #55):
Fix missing check for taillen against pkthdr.len.
 1.1.1.1.2.9 28-Oct-2002  itojun sys/netinet6/esp_input.c 1.27

Increase correct statistics on ESP packet length check. Fixes KAME PR#445.

(itojun)
 1.1.1.1.2.8 04-Sep-2002  itojun pullup sys/netinet6/esp_input.c 1.24 (itojun)

check packet length before fetching ESP crypto checksum. sync w/kame
 1.1.1.1.2.7 06-Apr-2001  he Pull up revision 1.14 (requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.1.1.1.2.6 11-Mar-2001  he Pull up revision 1.16 (requested by itojun):
Ensure that we enforce inbound IPsec policy on all IP protocols,
not just TCP, UDP and ICMP.
 1.1.1.1.2.5 02-Oct-2000  itojun pullup (approved by releng-1-5)
correct ipsecstat/ipsec6stat mixup.

netinet6/ah_input.c 1.18 -> 1.19
netinet6/ah_output.c 1.11 -> 1.12 (part of)
netinet6/esp_input.c 1.8 -> 1.9 (part of)
netinet6/esp_output.c 1.8 -> 1.9
netinet6/icmp6.c 1.43 -> 1.44
netinet6/ipcomp_input.c 1.13 -> 1.14
netinet6/ipcomp_output.c 1.13 -> 1.14
 1.1.1.1.2.4 29-Sep-2000  itojun pullup (approved by releng-1-5)

correct lifetime handling of IPsec keys, so that it won't wrongly
survive across suspend/resume session.
sys/netinet6/ipsec.h 1.15 -> 1.16
sys/netkey/keydb.h 1.7 -> 1.9
sys/netkey/key.c 1.35 -> 1.36

stabilize ipcomp packet handling (if we don't update this SEGV can happen).
sys/netinet6/ipcomp_output.c 1.10 -> 1.13
sys/netinet6/ipcomp_input.c 1.10 -> 1.13
sys/netinet6/ipcomp_core.c 1.9 -> 1.16
sys/netinet6/ipcomp.h 1.7 -> 1.8
sys/netkey/key.c 1.28 -> 1.29, 1.31 -> 1.35, 1.36 -> 1.37

avoid hardcoding IV length. new ESP engine (uses block cipher only,
easier to put per-arch *.S)
sys/netinet6/esp_output.c 1.5 -> 1.8
sys/netinet6/esp_input.c 1.5 -> 1.8
sys/netinet6/esp_core.c 1.7 -> 1.9
sys/netinet6/esp.h 1.11 -> 1.13
sys/netkey/key.c 1.30 -> 1.31
 1.1.1.1.2.3 16-Aug-2000  itojun pullup (approved by releng-1-5)
> add missing splx, when outgoing interface queue is full on tunnelled
> IPsec packet output. KAME PR 280.
> cvs rdiff -r1.17 -r1.18 syssrc/sys/netinet6/ah_input.c
> cvs rdiff -r1.4 -r1.5 syssrc/sys/netinet6/esp_input.c
 1.1.1.1.2.2 30-Jul-2000  itojun pullup (approved by releng-1-5)

esp encryption performance improvement, specifically for algorithms
with long key setup time (blowfish). KAME PR 229.

> pre-compute and cache intermediate crypto key. suggestion from sommerfeld,
> sync with kame.

1.11 -> 1.12 syssrc/sys/netinet6/ah.h
1.9 -> 1.10 syssrc/sys/netinet6/esp.h
1.2 -> 1.3 syssrc/sys/netinet6/esp_core.c \
1.2 -> 1.3 syssrc/sys/netinet6/esp_input.c
1.3 -> 1.4 syssrc/sys/netinet6/esp_output.c
1.27 -> 1.28 syssrc/sys/netkey/key.c
1.6 -> 1.7 syssrc/sys/netkey/keydb.h

> clarify comment. from jhawk. sync with kame.

1.3 -> 1.4 syssrc/sys/netinet6/esp_input.c
1.4 -> 1.5 syssrc/sys/netinet6/esp_output.c
 1.1.1.1.2.1 25-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)

correct RFC2367 PF_KEY conformance (SADB_[AE]ALG_xx values and namespaces).
sync from kame.

WARNING: need recompilation of setkey(8) and pkgsrc/security/racoon.
(no ipsec-ready netbsd was released as official release, so binary backward
compatibility is less of an issue)

(sys/netinet6/esp.h only, 1.10 -> 1.11)
wrap kernel function prototype by #ifdef _KERNEL.

--- revisions pulled up:
1.6 -> 1.7 syssrc/sys/net/pfkeyv2.h
1.10 -> 1.11 syssrc/sys/netinet6/ah.h
1.10 -> 1.11 syssrc/sys/netinet6/ah_output.c
1.19 -> 1.20 syssrc/sys/netinet6/ah_core.c
1.15 -> 1.16 syssrc/sys/netinet6/ah_input.c
1.8 -> 1.9 syssrc/sys/netinet6/esp.h
1.10 -> 1.11 syssrc/sys/netinet6/esp.h
1.1 -> 1.2 syssrc/sys/netinet6/esp_core.c
1.1 -> 1.2 syssrc/sys/netinet6/esp_input.c
1.2 -> 1.3 syssrc/sys/netinet6/esp_output.c
1.26 -> 1.27 syssrc/sys/netkey/key.c
 1.12.2.6 21-Apr-2001  bouyer Sync with HEAD
 1.12.2.5 12-Mar-2001  bouyer Sync with HEAD.
 1.12.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.12.2.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.12.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.2.1 19-Oct-2000  bouyer file esp_input.c was added on branch thorpej_scsipi on 2000-11-20 18:10:43 +0000
 1.16.2.9 11-Nov-2002  nathanw Catch up to -current
 1.16.2.8 17-Sep-2002  nathanw Catch up to -current.
 1.16.2.7 27-Aug-2002  nathanw Catch up to -current.
 1.16.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.16.2.5 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.16.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.16.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.16.2.2 22-Oct-2001  nathanw Catch up to -current.
 1.16.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.17.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.17.2.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.17.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.17.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.21.6.4 05-Sep-2003  tron Pull up revision 1.33 (requested by itojun in ticket #1401):
m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.
 1.21.6.3 10-Jul-2003  tron Pull up revision 1.30 (requested by itojun in ticket #1364):
fix missing check for taillen against pkthdr.len. markus@openbsd
 1.21.6.2 10-Dec-2002  jmc Pull up revisions 1.26-1.27 (requested by itojun in ticket #951)
increase correct stat. KAME pr 445
 1.21.6.1 22-Aug-2002  lukem Pull up revision 1.24 (requested by itojun in ticket #713):
check packet length before fetching ESP crypto checksum. sync w/kame
 1.21.4.2 29-Aug-2002  gehenna catch up with -current.
 1.21.4.1 20-Jun-2002  gehenna catch up with -current.
 1.29.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.29.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.29.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.29.2.1 03-Aug-2004  skrll Sync with HEAD
 1.35.14.2 18-Jul-2005  riz Pull up revision 1.38 (requested by tron in ticket #565):
Defopt IPSEC_NAT_T.
 1.35.14.1 28-Apr-2005  tron Pull up revision 1.36 (requested by man in ticket #201):
Enhance IPSEC_NAT_T so that it can work with multiple machines behind
the same NAT.
 1.35.8.1 29-Apr-2005  kent sync with -current
 1.37.2.5 27-Oct-2007  yamt sync with head.
 1.37.2.4 03-Sep-2007  yamt sync with head.
 1.37.2.3 26-Feb-2007  yamt sync with head.
 1.37.2.2 30-Dec-2006  yamt sync with head.
 1.37.2.1 21-Jun-2006  yamt sync with head.
 1.39.22.2 10-Dec-2006  yamt sync with head.
 1.39.22.1 22-Oct-2006  yamt sync with head
 1.39.20.1 18-Nov-2006  ad Sync with head.
 1.41.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.41.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.43.4.1 11-Jul-2007  mjf Sync with head.
 1.43.2.2 23-Oct-2007  ad Sync with head.
 1.43.2.1 08-Jun-2007  ad Sync with head.
 1.44.12.1 25-Oct-2007  bouyer Sync with HEAD.
 1.44.8.1 06-Nov-2007  matt sync with HEAD
 1.44.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.45.18.1 18-May-2008  yamt sync with head.
 1.45.16.1 02-Jun-2008  mjf Sync with HEAD.
 1.47.16.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.47.10.1 28-Apr-2009  skrll Sync with HEAD.
 1.47.2.1 04-May-2009  yamt sync with head.
 1.50.14.1 30-Jan-2018  martin Oops, remainder of Ticket #1523, accidentally not committed previously
 1.50.12.1 30-Jan-2018  martin Oops, remainder of Ticket #1523, accidentally not committed previously
 1.50.8.1 30-Jan-2018  martin Oops, remainder of Ticket #1523, accidentally not committed previously
 1.50.6.1 05-Apr-2012  mrg sync to latest -current.
 1.50.2.1 17-Apr-2012  yamt sync with head
 1.37 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.36 18-Apr-2009  tsutsui branches: 1.36.12; 1.36.16;
Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.35 18-Mar-2009  cegger bcopy -> memcpy
 1.34 18-Mar-2009  cegger bzero -> memset
 1.33 14-Mar-2009  dsl Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.32 23-Apr-2008  thorpej branches: 1.32.2; 1.32.10; 1.32.16;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.31 09-Dec-2007  degroote branches: 1.31.10; 1.31.12;
Kill _IP_VHL ifdef (from netinet/ip.h history, it has never been used in NetBSD so ...)
 1.30 22-Sep-2007  degroote branches: 1.30.8; 1.30.10;
{ah,esp,ipcomp}_output must return 0 on success. On failure, it returns the
error and m is freed. Previously, it was not the case in ipcomp and esp case
(aka in some case, it returns 0 with m freed, or an error and m was not freed).

In ipcomp_output, fix some leak of mcopy too.

Use the same error path in {ah,esp,ipcomp}_output.

Problem was reported by Wolfgang Stukenbrock in pr/36768.
 1.29 21-Sep-2007  degroote In the IPSEC_NAT_T case, we must set the udp length even if the ESP entry
doesn't have an integrity algorithm.

Reported by Wolfgang Stukenbrock in pr/36781. Thanks a lot.
 1.28 23-May-2007  christos branches: 1.28.6; 1.28.8;
fix typos in previous
 1.27 23-May-2007  christos Ansify + add a few comments, from Karl Sjödahl
 1.26 24-Nov-2006  christos branches: 1.26.2; 1.26.8; 1.26.10; 1.26.16;
fix spelling of accommodate; from Zapher.
 1.25 30-Aug-2006  christos branches: 1.25.2; 1.25.4;
comment out comparison always false
 1.24 11-Dec-2005  christos branches: 1.24.4; 1.24.8;
merge ktrace-lwp.
 1.23 07-Jul-2005  tron Defopt IPSEC_NAT_T.
 1.22 29-May-2005  christos branches: 1.22.2;
- avoid shadowed variables
- sprinkle const.
 1.21 23-Apr-2005  manu Enhance IPSEC_NAT_T so that it can work with multiple machines behind the
same NAT.
 1.20 26-Feb-2005  perry branches: 1.20.2;
nuke trailing whitespace
 1.19 12-Feb-2005  manu Add support for IPsec Network Address Translator traversal (NAT-T), as
described by RFC 3947 and 3948.
 1.18 07-Sep-2003  itojun branches: 1.18.8; 1.18.10;
- prepare for RFC2401bis 64bit sequence number (no behavior change yet)
- use hash for SPI-based SAD entry lookup (should be faster, i hope)
- cleanup keydb.c and key.c. key.c is responsible for refcounting secasvar,
keydb.c is responsible for alloc/free.
 1.17 22-Jul-2003  itojun unifdef -U_IP_VHL
 1.16 27-Sep-2002  provos branches: 1.16.6;
remove trailing \n in panic(). approved perry.
 1.15 09-Aug-2002  itojun avoid hardcoded "16" for max AH sum size. use AH_MAXSUMSIZE.
 1.14 09-Aug-2002  itojun use correct padding boundary, to correctly estimate ESP header size.
problem found by Arto Selonen <arto@selonen.org>
 1.13 09-Jun-2002  itojun whitespace cleanup
 1.12 13-Nov-2001  lukem branches: 1.12.8; 1.12.10;
add RCSIDs
 1.11 15-Oct-2001  itojun reduce diff with kame. whitespace changes only.
 1.10 05-Oct-2000  itojun branches: 1.10.2; 1.10.4; 1.10.6;
always use rnd(4) for IPsec random number source. avoid random(9).
if there's no rnd(4), random(9) will be used with one-time warning printf(9).

XXX not sure how good rnd_extract_data(RND_EXTRACT_ANY) is, under entropy-
starvation situation
 1.9 02-Oct-2000  itojun fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.
 1.8 26-Sep-2000  itojun do not hardcode maximum IV length.
 1.7 29-Aug-2000  itojun improve code sharing for esp_schedule(). add some diagnostics cases
for esp_cbc_{en,de}crypt(). sync with kame.
 1.6 29-Aug-2000  itojun use per-block cipher function + esp_cbc_{de,en}crypt. do not use
cbc-over-mbuf functions in sys/crypto.

the change should make it much easier to switch crypto function to
machine-dependent ones (like assembly code under sys/arch/i386/crypto?).
also it should be much easier to import AES algorithms.

XXX: it looks that past blowfish-cbc code was buggy. i ran some test pattern,
and new blowfish-cbc code looks more correct. there's no interoperability
between the old code (before the commit) and the new code (after the commit).

XXX: need serious interop tests before move it into 1.5 branch
 1.5 30-Jul-2000  itojun clarify comment. from jhawk. sync with kame.
 1.4 23-Jul-2000  itojun pre-compute and cache intermediate crypto key. suggestion from sommerfeld,
sync with kame.

loopback, blowfish-cbc transport mode, 128bit key
before: 86588496 bytes received in 00:42 (1.94 MB/s)
after: 86588496 bytes received in 00:31 (2.58 MB/s)
 1.3 18-Jul-2000  itojun correct RFC2367 PF_KEY conformance (SADB_[AE]ALG_xx values and namespaces).
sync from kame.

WARNING: need recompilation of setkey(8) and pkgsrc/security/racoon.
(no ipsec-ready netbsd was released as official release)
 1.2 06-Jul-2000  itojun remove unnecessary #include <netkey/key_debug.h>. from kame.
 1.1 14-Jun-2000  thorpej branches: 1.1.1;
Initial revision
 1.1.1.1 14-Jun-2000  thorpej branches: 1.1.1.1.2; 1.1.1.1.4;
Import IPsec ESP from netbsd-cryptosrc-intl.
 1.1.1.1.4.2 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.1.1.1.4.1 14-Jun-2000  minoura file esp_output.c was added on branch minoura-xpg4dl on 2000-06-22 17:09:54 +0000
 1.1.1.1.2.7 04-Sep-2002  itojun sys/netinet6/esp.h 1.20
sys/netinet6/esp_core.c 1.24
sys/netinet6/esp_output.c 1.14
use correct padding boundary, to correctly estimate ESP header size.
problem found by Arto Selonen <arto@selonen.org>

(itojun)
 1.1.1.1.2.6 05-Oct-2000  itojun pullup (approved by releng-1-5)

always use rnd(4) for IPsec random number source. avoid random(9).
if there's no rnd(4), random(9) will be used with one-time warning printf(9).

XXX not sure how good rnd_extract_data(RND_EXTRACT_ANY) is, under entropy-
starvation situation

cvs rdiff -r1.11 -r1.12 syssrc/sys/netinet6/esp_core.c
cvs rdiff -r1.9 -r1.10 syssrc/sys/netinet6/esp_output.c
cvs rdiff -r1.38 -r1.39 syssrc/sys/netkey/key.c
cvs rdiff -r1.6 -r1.7 syssrc/sys/netkey/key.h
 1.1.1.1.2.5 02-Oct-2000  itojun pullup (approved by releng-1-5)
correct ipsecstat/ipsec6stat mixup.

netinet6/ah_input.c 1.18 -> 1.19
netinet6/ah_output.c 1.11 -> 1.12 (part of)
netinet6/esp_input.c 1.8 -> 1.9 (part of)
netinet6/esp_output.c 1.8 -> 1.9
netinet6/icmp6.c 1.43 -> 1.44
netinet6/ipcomp_input.c 1.13 -> 1.14
netinet6/ipcomp_output.c 1.13 -> 1.14
 1.1.1.1.2.4 29-Sep-2000  itojun pullup (approved by releng-1-5)

correct lifetime handling of IPsec keys, so that it won't wrongly
survive across suspend/resume session.
sys/netinet6/ipsec.h 1.15 -> 1.16
sys/netkey/keydb.h 1.7 -> 1.9
sys/netkey/key.c 1.35 -> 1.36

stabilize ipcomp packet handling (if we don't update this SEGV can happen).
sys/netinet6/ipcomp_output.c 1.10 -> 1.13
sys/netinet6/ipcomp_input.c 1.10 -> 1.13
sys/netinet6/ipcomp_core.c 1.9 -> 1.16
sys/netinet6/ipcomp.h 1.7 -> 1.8
sys/netkey/key.c 1.28 -> 1.29, 1.31 -> 1.35, 1.36 -> 1.37

avoid hardcoding IV length. new ESP engine (uses block cipher only,
easier to put per-arch *.S)
sys/netinet6/esp_output.c 1.5 -> 1.8
sys/netinet6/esp_input.c 1.5 -> 1.8
sys/netinet6/esp_core.c 1.7 -> 1.9
sys/netinet6/esp.h 1.11 -> 1.13
sys/netkey/key.c 1.30 -> 1.31
 1.1.1.1.2.3 30-Jul-2000  itojun pullup (approved by releng-1-5)

esp encryption performance improvement, specifically for algorithms
with long key setup time (blowfish). KAME PR 229.

> pre-compute and cache intermediate crypto key. suggestion from sommerfeld,
> sync with kame.

1.11 -> 1.12 syssrc/sys/netinet6/ah.h
1.9 -> 1.10 syssrc/sys/netinet6/esp.h
1.2 -> 1.3 syssrc/sys/netinet6/esp_core.c
1.2 -> 1.3 syssrc/sys/netinet6/esp_input.c
1.3 -> 1.4 syssrc/sys/netinet6/esp_output.c
1.27 -> 1.28 syssrc/sys/netkey/key.c
1.6 -> 1.7 syssrc/sys/netkey/keydb.h

> clarify comment. from jhawk. sync with kame.

1.3 -> 1.4 syssrc/sys/netinet6/esp_input.c
1.4 -> 1.5 syssrc/sys/netinet6/esp_output.c
 1.1.1.1.2.2 25-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)

correct RFC2367 PF_KEY conformance (SADB_[AE]ALG_xx values and namespaces).
sync from kame.

WARNING: need recompilation of setkey(8) and pkgsrc/security/racoon.
(no ipsec-ready netbsd was released as official release, so binary backward
compatibility is less of an issue)

(sys/netinet6/esp.h only, 1.10 -> 1.11)
wrap kernel function prototype by #ifdef _KERNEL.

--- revisions pulled up:
1.6 -> 1.7 syssrc/sys/net/pfkeyv2.h
1.10 -> 1.11 syssrc/sys/netinet6/ah.h
1.10 -> 1.11 syssrc/sys/netinet6/ah_output.c
1.19 -> 1.20 syssrc/sys/netinet6/ah_core.c
1.15 -> 1.16 syssrc/sys/netinet6/ah_input.c
1.8 -> 1.9 syssrc/sys/netinet6/esp.h
1.10 -> 1.11 syssrc/sys/netinet6/esp.h
1.1 -> 1.2 syssrc/sys/netinet6/esp_core.c
1.1 -> 1.2 syssrc/sys/netinet6/esp_input.c
1.2 -> 1.3 syssrc/sys/netinet6/esp_output.c
1.26 -> 1.27 syssrc/sys/netkey/key.c
 1.1.1.1.2.1 17-Jul-2000  itojun pullup 1.1 -> 1.2 (approved by releng-1-5)
remove unnecessary #include <netkey/key_debug.h>. from kame.
do not emit packet if esp auth fails.
 1.10.6.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.10.6.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.10.6.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.10.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.10.4.5 18-Oct-2002  nathanw Catch up to -current.
 1.10.4.4 13-Aug-2002  nathanw Catch up to -current.
 1.10.4.3 20-Jun-2002  nathanw Catch up to -current.
 1.10.4.2 14-Nov-2001  nathanw Catch up to -current.
 1.10.4.1 22-Oct-2001  nathanw Catch up to -current.
 1.10.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.10.2.1 05-Oct-2000  bouyer file esp_output.c was added on branch thorpej_scsipi on 2000-11-20 18:10:43 +0000
 1.12.10.1 09-Aug-2002  lukem Pull up revision 1.14 (requested by itojun in ticket #659):
use correct padding boundary, to correctly estimate ESP header size.
problem found by Arto Selonen <arto@selonen.org>
 1.12.8.2 29-Aug-2002  gehenna catch up with -current.
 1.12.8.1 20-Jun-2002  gehenna catch up with -current.
 1.16.6.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.16.6.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.16.6.4 15-Feb-2005  skrll Sync with HEAD.
 1.16.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.16.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.16.6.1 03-Aug-2004  skrll Sync with HEAD
 1.18.10.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.18.10.1 12-Feb-2005  yamt sync with head.
 1.18.8.1 29-Apr-2005  kent sync with -current
 1.20.2.4 23-Sep-2007  bouyer Pull up following revision(s) (requested by degroote in ticket #1846):
sys/netinet6/ipcomp_output.c: revision 1.22
sys/netinet6/ah_output.c: revision 1.30
sys/netinet6/esp_output.c: revision 1.30
Fix some possible mbuf leak in kame ipsec code.
Problem was reported by Wolfgang Stukenbrock in pr/36768.
 1.20.2.3 23-Sep-2007  bouyer Pull up following revision(s) (requested by degroote in ticket #1845):
sys/netinet6/esp_output.c: revision 1.29
In the IPSEC_NAT_T case, we must set the udp length even if the ESP entry
doesn't have an integrity algorithm.
Reported by Wolfgang Stukenbrock in pr/36781. Thanks a lot.
 1.20.2.2 18-Jul-2005  riz Pull up revision 1.23 (requested by tron in ticket #565):
Defopt IPSEC_NAT_T.
 1.20.2.1 28-Apr-2005  tron Pull up revision 1.21 (requested by man in ticket #201):
Enhance IPSEC_NAT_T so that it can work with multiple machines behind
the same NAT.
 1.22.2.5 21-Jan-2008  yamt sync with head
 1.22.2.4 27-Oct-2007  yamt sync with head.
 1.22.2.3 03-Sep-2007  yamt sync with head.
 1.22.2.2 30-Dec-2006  yamt sync with head.
 1.22.2.1 21-Jun-2006  yamt sync with head.
 1.24.8.1 03-Sep-2006  yamt sync with head.
 1.24.4.1 09-Sep-2006  rpaulo sync with head
 1.25.4.1 10-Dec-2006  yamt sync with head.
 1.25.2.1 12-Jan-2007  ad Sync with head.
 1.26.16.1 30-Sep-2007  wrstuden Catch up on netbsd-4 as of a few days ago.
 1.26.10.1 11-Jul-2007  mjf Sync with head.
 1.26.8.2 09-Oct-2007  ad Sync with head.
 1.26.8.1 08-Jun-2007  ad Sync with head.
 1.26.2.2 25-Sep-2007  xtraeme Pull up following revision(s) (requested by degroote in ticket #896):
sys/netinet6/ipcomp_output.c: revision 1.22
sys/netinet6/ah_output.c: revision 1.30
sys/netinet6/esp_output.c: revision 1.30

{ah,esp,ipcomp}_output must return 0 on success. On failure, it returns the
error and m is freed. Previously, it was not the case in ipcomp and esp case
(aka in some case, it returns 0 with m freed, or an error and m was not freed).

In ipcomp_output, fix some leak of mcopy too.

Use the same error path in {ah,esp,ipcomp}_output.

Problem was reported by Wolfgang Stukenbrock in pr/36768.
 1.26.2.1 25-Sep-2007  xtraeme Pull up following revision(s) (requested by degroote in ticket #893):
sys/netinet6/esp_output.c: revision 1.29

In the IPSEC_NAT_T case, we must set the udp length even if the ESP entry
doesn't have an integrity algorithm.

Reported by Wolfgang Stukenbrock in pr/36781. Thanks a lot.
 1.28.8.2 09-Jan-2008  matt sync with HEAD
 1.28.8.1 06-Nov-2007  matt sync with HEAD
 1.28.6.1 02-Oct-2007  joerg Sync with HEAD.
 1.30.10.1 11-Dec-2007  yamt sync with head.
 1.30.8.1 26-Dec-2007  ad Sync with head.
 1.31.12.1 18-May-2008  yamt sync with head.
 1.31.10.1 02-Jun-2008  mjf Sync with HEAD.
 1.32.16.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.32.10.1 28-Apr-2009  skrll Sync with HEAD.
 1.32.2.1 04-May-2009  yamt sync with head.
 1.36.16.1 05-Apr-2012  mrg sync to latest -current.
 1.36.12.1 17-Apr-2012  yamt sync with head
 1.21 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.20 16-Nov-2006  christos branches: 1.20.88; 1.20.92;
__unused removal on arguments; approved by core.
 1.19 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.18 11-Dec-2005  christos branches: 1.18.20; 1.18.22;
merge ktrace-lwp.
 1.17 27-Aug-2003  itojun branches: 1.17.16;
simplify rijndael.c API - always schedule encrypt/decrypt key.
reviewed by thorpej
 1.16 27-Aug-2003  itojun rijndael encryption context/scheduled key is asymmetric; need to setup two
(one for encryption, one for decryption)
 1.15 26-Aug-2003  thorpej Use the simplified rijndael API (which this was essentially a duplicate
of). XXX This file can now be merged into esp_core.c.
 1.14 25-Jul-2003  itojun add AH/ESP algorithms: hmac-ripemd160 (AH), AES XCBC MAC (AH),
AES counter mode (ESP)
 1.13 20-Jul-2003  itojun change ESP xx_schedlen() return type to size_t. sync w/kame
 1.12 15-Jul-2003  kleink assymetric -> asymmetric
 1.11 15-Jul-2003  itojun rijndael is asymmetric, correction from markus@openbsd
 1.10 15-Jul-2003  itojun simplify and update rijndael code. markus@openbsd
 1.9 08-Jan-2003  itojun branches: 1.9.2;
allocate route_in6 in struct secashead, to avoid mistakenly overrun
the end of secashead. Fixes PR18751.
 1.8 11-Sep-2002  itojun correct pointer signedness mixups. sync w/kame
 1.7 13-Nov-2001  lukem branches: 1.7.10;
add RCSIDs
 1.6 15-Oct-2001  itojun sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
 1.5 18-Jun-2001  wiz branches: 1.5.2;
Symmetric has one s and two m's.
 1.4 02-Mar-2001  itojun branches: 1.4.2;
pass key to rijndael logic as binary, not hexadecimal string.
sync with kame
 1.3 08-Nov-2000  itojun branches: 1.3.2;
save a little bit of CPU time (avoid computing CBC IV we do not use).
sync with kame.
 1.2 02-Oct-2000  itojun branches: 1.2.2;
remove #ifdef freebsd
 1.1 02-Oct-2000  itojun add ESP rijndael logic. yet to be usable (until algorithm # is assigned)
 1.2.2.2 23-Jan-2003  msaitoh Pull up revision 1.9 (requested by itojun):

allocate route_in6 in struct secashead, to avoid mistakenly overrun
the end of secashead. Fixes PR18751.
 1.2.2.1 02-Oct-2000  msaitoh file esp_rijndael.c was added on branch netbsd-1-5 on 2003-01-23 10:22:29 +0000
 1.3.2.4 12-Mar-2001  bouyer Sync with HEAD.
 1.3.2.3 22-Nov-2000  bouyer Sync with HEAD.
 1.3.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.3.2.1 08-Nov-2000  bouyer file esp_rijndael.c was added on branch thorpej_scsipi on 2000-11-20 18:10:44 +0000
 1.4.2.5 08-Jan-2003  thorpej Sync with HEAD.
 1.4.2.4 17-Sep-2002  nathanw Catch up to -current.
 1.4.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.4.2.2 22-Oct-2001  nathanw Catch up to -current.
 1.4.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.5.2.2 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.5.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.7.10.1 10-Jan-2003  jmc Pullup revisions 1.8-1.9 (requested by itojun in ticket #1060)
allocate route_in6 in struct secashead, to avoid mistakenly overrun
the end of secashead. Fixes PR18751.
 1.9.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.9.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.9.2.1 03-Aug-2004  skrll Sync with HEAD
 1.17.16.1 30-Dec-2006  yamt sync with head.
 1.18.22.2 10-Dec-2006  yamt sync with head.
 1.18.22.1 22-Oct-2006  yamt sync with head
 1.18.20.1 18-Nov-2006  ad Sync with head.
 1.20.92.1 05-Apr-2012  mrg sync to latest -current.
 1.20.88.1 17-Apr-2012  yamt sync with head
 1.5 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.4 14-Mar-2009  dsl branches: 1.4.12; 1.4.16;
Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.3 10-Dec-2005  elad branches: 1.3.74; 1.3.84; 1.3.90;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.2 20-Jul-2003  itojun branches: 1.2.16;
change ESP xx_schedlen() return type to size_t. sync w/kame
 1.1 02-Oct-2000  itojun branches: 1.1.2; 1.1.4; 1.1.28;
add ESP rijndael logic. yet to be usable (until algorithm # is assigned)
 1.1.28.4 11-Dec-2005  christos Sync with head.
 1.1.28.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.28.2 18-Sep-2004  skrll Sync with HEAD.
 1.1.28.1 03-Aug-2004  skrll Sync with HEAD
 1.1.4.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.1.4.1 02-Oct-2000  bouyer file esp_rijndael.h was added on branch thorpej_scsipi on 2000-11-20 18:10:44 +0000
 1.1.2.2 02-Oct-2000  itojun add ESP rijndael logic. yet to be usable (until algorithm # is assigned)
 1.1.2.1 02-Oct-2000  itojun file esp_rijndael.h was added on branch netbsd-1-5 on 2000-10-02 17:21:27 +0000
 1.2.16.1 21-Jun-2006  yamt sync with head.
 1.3.90.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.3.84.1 28-Apr-2009  skrll Sync with HEAD.
 1.3.74.1 04-May-2009  yamt sync with head.
 1.4.16.1 05-Apr-2012  mrg sync to latest -current.
 1.4.12.1 17-Apr-2012  yamt sync with head
 1.9 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.8 09-Jan-2012  drochner Make FAST_IPSEC the default IPSEC implementation which is built
into the kernel if the "IPSEC" kernel option is given.
The old implementation is still available as KAME_IPSEC.
Do some minimal manpage adjustment -- kame_ipsec(4) is a copy
of the old ipsec(4) and the latter is now a copy of fast_ipsec(4).
 1.7 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.6 19-Nov-2011  tls branches: 1.6.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
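
The statistical self-test mentioned in revision 1.6 above can be illustrated with the simplest of the FIPS 140-2 tests, the monobit test. The following is only a sketch and is not taken from libkern/rngtest.c: the function name `monobit_ok` is hypothetical, and the bounds are the published FIPS 140-2 monobit limits for a 20,000-bit sample as best recalled here, so check them against the standard before relying on them.

```python
def monobit_ok(sample: bytes) -> bool:
    """Monobit test sketch: count the one bits in a 20,000-bit sample.

    Hypothetical helper for illustration only; the accepted range
    9725 < ones < 10275 is assumed from FIPS 140-2.
    """
    if len(sample) != 2500:
        raise ValueError("expected a 20,000-bit (2500-byte) sample")
    ones = sum(bin(byte).count("1") for byte in sample)
    return 9725 < ones < 10275


# A stuck-at-zero source fails; a perfectly balanced pattern passes this
# one test (it would still fail the other tests in the suite).
print(monobit_ok(bytes(2500)))
print(monobit_ok(b"\xaa" * 2500))
```

A stuck or heavily biased hardware RNG fails this check almost immediately, which is the kind of fault the attach-time tests described above are meant to catch.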
 1.5 27-Oct-2006  christos branches: 1.5.88;
Merge kernel and userland rmd160 and sha2 implementation.
XXX: We still install rmd160.h and sha2.h in /usr/include/crypto, unlike
the other hash functions which get installed in /usr/include for compatibility.
 1.4 11-Dec-2005  christos branches: 1.4.20; 1.4.22;
merge ktrace-lwp.
 1.3 07-Jul-2005  tron Defopt IPSEC_NAT_T.
 1.2 20-Sep-2003  itojun branches: 1.2.4; 1.2.16; 1.2.18;
separate netkey/key* and netipsec/key*
 1.1 12-Sep-2003  itojun change confusing filename
 1.2.18.2 30-Dec-2006  yamt sync with head.
 1.2.18.1 21-Jun-2006  yamt sync with head.
 1.2.16.1 18-Jul-2005  riz Pull up revision 1.3 (requested by tron in ticket #565):
Defopt IPSEC_NAT_T.
 1.2.4.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.2.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.2.4.2 03-Aug-2004  skrll Sync with HEAD
 1.2.4.1 20-Sep-2003  skrll file files.ipsec was added on branch ktrace-lwp on 2004-08-03 10:55:11 +0000
 1.4.22.1 10-Dec-2006  yamt sync with head.
 1.4.20.1 18-Nov-2006  ad Sync with head.
 1.5.88.1 17-Apr-2012  yamt sync with head
 1.6.2.2 05-Apr-2012  mrg sync to latest -current.
 1.6.2.1 18-Feb-2012  mrg merge to -current.
 1.14 08-Mar-2021  christos no need for ip6_id.c...
 1.13 08-Feb-2018  maxv branches: 1.13.16;
Move udp6_output() into udp6_usrreq.c, and remove udp6_output.c. This is
more consistent with IPv4, and there is no good reason for keeping a
separate file only for one function. FreeBSD did the same.
 1.12 02-Aug-2016  knakahara ip6flow refactor like ipflow.

- move ip6flow sysctls into ip6_flow.c like ip_flow.c:r1.64
- build ip6_flow.c only if GATEWAY kernel option is enabled
 1.11 13-Oct-2015  rjs branches: 1.11.2;
Add core networking support for SCTP.
 1.10 10-Feb-2015  rjs Add DCCP protocol support from KAME.
 1.9 02-Dec-2014  christos add routines to print in6_addr and sockaddr_in6 (in6_print, sin6_print)
 1.8 25-Jan-2008  joerg branches: 1.8.2; 1.8.54; 1.8.74;
Refactor in_cksum/in4_cksum/in6_cksum implementations:
- All three functions are included in the kernel by default.
They call a backend function cpu_in_cksum after possibly
computing the checksum of the pseudo header.
- cpu_in_cksum is the core to implement the one's-complement sum.
The default implementation is moderately fast on most platforms
and provides a 32bit accumulator with 16bit addends for L32 platforms
and a 64bit accumulator with 32bit addends for L64 platforms.
It handles edge cases like very large mbuf chains (could happen with
native IPv6 in the future) and provides a good base for new native
implementations.
- Modify i386 and amd64 assembly to use the new interface.

This disables the MD implementations on !x86 until the conversion is
done. For Alpha, the portable version is faster.
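
The split described in revision 1.8 above (front-end functions that fold in a pseudo-header sum, then a shared backend that computes the one's-complement sum) can be sketched as follows. This is an illustration in Python, not the kernel code: `ones_complement_sum` and the `pseudo_sum` parameter are hypothetical names, and the real cpu_in_cksum operates on mbuf chains with wider accumulators.

```python
import struct


def ones_complement_sum(data: bytes, initial: int = 0) -> int:
    """Backend sketch: 16-bit one's-complement sum, not yet inverted."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = initial
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries back into the low 16 bits until none remain.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def in_cksum(data: bytes, pseudo_sum: int = 0) -> int:
    """Front-end sketch: invert the sum, seeded with a pseudo-header sum."""
    return ~ones_complement_sum(data, pseudo_sum) & 0xFFFF


# Worked example from RFC 1071: the sum is 0xddf2, so the checksum is 0x220d.
payload = bytes.fromhex("0001f203f4f5f6f7")
print(hex(in_cksum(payload)))  # 0x220d
```

Appending the computed checksum to the data and running `in_cksum` again yields 0, which is how a receiver verifies a packet.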
 1.7 07-Mar-2007  liamjfoy branches: 1.7.16; 1.7.22;
Add IPv6 Fast Forward - the IPv4 counterpart:

If ip6_forward successfully forwards a packet, a cache, in this case a
ip6flow struct entry, will be created. ether_input and friends will
then be able to call ip6flow_fastforward with the packet which will then
be passed to if_output (unless an issue is found - in that case the packet
is passed back to ip6_input).

ok matt@ christos@ dyoung@ and joerg@
 1.6 25-Nov-2006  yamt branches: 1.6.4;
move tso-by-software code to their own files. no functional changes.
 1.5 05-May-2006  rpaulo branches: 1.5.8; 1.5.10;
Add support for RFC 3542 Adv. Socket API for IPv6 (which obsoletes 2292).
* RFC 3542 isn't binary compatible with RFC 2292.
* RFC 2292 support is on by default but can be disabled.
* update ping6, telnet and traceroute6 to the new API.

From the KAME project (www.kame.net).
Reviewed by core.
 1.4 21-Jan-2006  rpaulo branches: 1.4.2; 1.4.4; 1.4.6; 1.4.8; 1.4.10;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
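
The stricter scope boundary check described in revision 1.4 above can be sketched as a single predicate. This is a simplified illustration, not the KAME implementation: the function name `source_scope_ok` is hypothetical, and it covers only the loopback case from the commit message (a real check also handles link-local zones).

```python
import ipaddress


def source_scope_ok(src: str, dst: str) -> bool:
    """Reject a loopback source address when the destination is off-node.

    Hypothetical helper for illustration; ::1 is only meaningful within
    a single node, so it must not appear as the source of a packet sent
    to any other destination.
    """
    s = ipaddress.IPv6Address(src)
    d = ipaddress.IPv6Address(dst)
    if s.is_loopback and not d.is_loopback:
        return False
    return True


# The case from the commit message: bind to ::1, send to a global address.
print(source_scope_ok("::1", "2001:db8::1"))          # False
print(source_scope_ok("2001:db8::2", "2001:db8::1"))  # True
```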
 1.3 11-Dec-2005  christos branches: 1.3.2;
merge ktrace-lwp.
 1.2 06-Sep-2003  itojun branches: 1.2.16;
randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.1 10-Oct-2002  thorpej branches: 1.1.2; 1.1.8;
Move netinet, netinet6, ipsec, and ipfilter config defns to
netinet/files.ipfilter, netinet/files.netinet, netinet6/files.netinet6,
and netinet6/files.netipsec.

XXX There are still a few stragglers in conf/files, which are entangled
with other network protocols.
 1.1.8.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.8.2 18-Sep-2004  skrll Sync with HEAD.
 1.1.8.1 03-Aug-2004  skrll Sync with HEAD
 1.1.2.2 18-Oct-2002  nathanw Catch up to -current.
 1.1.2.1 10-Oct-2002  nathanw file files.netinet6 was added on branch nathanw_sa on 2002-10-18 02:45:21 +0000
 1.2.16.4 04-Feb-2008  yamt sync with head.
 1.2.16.3 03-Sep-2007  yamt sync with head.
 1.2.16.2 30-Dec-2006  yamt sync with head.
 1.2.16.1 21-Jun-2006  yamt sync with head.
 1.3.2.1 01-Feb-2006  yamt sync with head.
 1.4.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.4.8.1 11-May-2006  elad sync with head
 1.4.6.1 24-May-2006  yamt sync with head.
 1.4.4.1 01-Jun-2006  kardel Sync with head.
 1.4.2.1 09-Sep-2006  rpaulo sync with head
 1.5.10.1 10-Dec-2006  yamt sync with head.
 1.5.8.1 12-Jan-2007  ad Sync with head.
 1.6.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.7.22.1 18-Feb-2008  mjf Sync with HEAD.
 1.7.16.1 23-Mar-2008  matt sync with HEAD
 1.8.74.3 05-Oct-2016  skrll Sync with HEAD
 1.8.74.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.8.74.1 06-Apr-2015  skrll Sync with HEAD
 1.8.54.1 03-Dec-2017  jdolecek update from HEAD
 1.8.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.11.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.13.16.1 03-Apr-2021  thorpej Sync with HEAD.
 1.6 12-Sep-2003  itojun change confusing filename
 1.5 25-Jul-2003  itojun add AH/ESP algorithms: hmac-ripemd160 (AH), AES XCBC MAC (AH),
AES counter mode (ESP)
 1.4 22-Jul-2003  itojun sha2 is needed for AH, not ESP
 1.3 22-Jul-2003  itojun add hmac-sha2 support. various cleanups (like avoid hardcoding '16').
from kame
 1.2 12-Oct-2002  thorpej branches: 1.2.2; 1.2.8;
IPSEC_ESP depends on the "des", "blowfish", "cast128", and "rijndael"
attributes.
 1.1 10-Oct-2002  thorpej Move netinet, netinet6, ipsec, and ipfilter config defns to
netinet/files.ipfilter, netinet/files.netinet, netinet6/files.netinet6,
and netinet6/files.netipsec.

XXX There are still a few stragglers in conf/files, which are entangled
with other network protocols.
 1.2.8.1 03-Aug-2004  skrll Sync with HEAD
 1.2.2.2 18-Oct-2002  nathanw Catch up to -current.
 1.2.2.1 12-Oct-2002  nathanw file files.netipsec was added on branch nathanw_sa on 2002-10-18 02:45:21 +0000
 1.78 19-Apr-2024  ozaki-r frag6: fix calculation of fragment length

Because of the miscalculation, 32 bytes fragmented IPv6 packets
have been wrongly dropped.

See https://mail-index.netbsd.org/tech-net/2024/04/14/msg008741.html
for more details.

Patch from Yasuyuki KOZAKAI (with minor tweaks)
 1.77 29-Aug-2023  christos Add a check for FreeBSD-SA-23:06.ipv6, although it is not reproducible for us.
factor out code copied 3 times (and now would have been a 4th)
 1.76 21-Oct-2022  ozaki-r branches: 1.76.2;
frag6: don't use spin mutex for frag6_lock

frag6_lock is held during sending a packet (icmp6_error), so we must
not use a spin mutex because we can acquire sleep locks on sending
a packet.

Also we don't need to use spin mutex for frag6_lock anymore because
frag6_lock is now not used from hardware interrupt context.
 1.75 13-Nov-2019  ozaki-r Get rid of unnecessary NULL checks for rt_ifa and ifa_ifp

They are always non-NULL nowadays.
 1.74 15-May-2018  maxv branches: 1.74.2; 1.74.6;
When reassembling IPv4/IPv6 packets, ensure each fragment has been subject
to the same IPsec processing. That is to say, that all fragments are ESP,
or AH, or AH+ESP, or none.

The reassembly mechanism can be used both on the wire and inside an IPsec
tunnel, so we need to make sure all fragments of a packet were received
on only one side.

Even though I haven't tried, I believe there are configurations where it
would be possible for an attacker to inject an unencrypted fragment into a
legitimate stream of already-decrypted-and-authenticated fragments.

Typically on IPsec gateways with ESP tunnels, where we can encapsulate
fragments (as opposed to the general case, where we fragment encapsulated
data).

Note, for the record: a funnier thing, under IPv4, would be to send a
zero-sized !MFF fragment at the head of the packet, and manage to trigger
an ICMP error; M_DECRYPTED gets lost by the reassembly, and ICMP will reply
with the packet in clear (not encrypted).
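
The consistency rule described in 1.74 can be illustrated with a small userland model. This is only a sketch: the flag names and the struct below are hypothetical stand-ins, not the kernel's mbuf flags or reassembly queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits meaning "this fragment went through ESP/AH". */
#define FRAG_ESP 0x1u
#define FRAG_AH  0x2u

struct frag {
	unsigned ipsec_flags;	/* FRAG_ESP, FRAG_AH, both, or 0 (none) */
};

/*
 * Return true only if every fragment carries exactly the same IPsec
 * flags as the first one; a mixed set must not be reassembled.
 */
static bool
frags_same_ipsec(const struct frag *frags, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (frags[i].ipsec_flags != frags[0].ipsec_flags)
			return false;
	}
	return true;
}
```

A cleartext fragment injected into a stream of ESP-processed fragments makes the check fail, which is the situation the commit guards against.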
 1.73 03-May-2018  maxv Rename m_pkthdr_remove -> m_remove_pkthdr, to match the existing naming
convention, eg m_copy_pkthdr and m_move_pkthdr.
 1.72 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.71 13-Apr-2018  maxv Localify global variables, style, and add two XXXs.
 1.70 13-Apr-2018  maxv Add XXX, using a pool would be better than kmem.
 1.69 13-Apr-2018  maxv Release the lock a little earlier.
 1.68 13-Apr-2018  maxv Add XXX. In fact, it would be better, if all the fragments were offloaded,
to quickly recompute the checksum on the fly, and keep it in the mbuf
header.
 1.67 09-Mar-2018  maxv Remove M_PKTHDR from secondary mbufs when reassembling packets.

This is a real problem, because I found at least one component that relies
on the fact that only the first mbuf has M_PKTHDR: far from here, in
m_splithdr, we don't update m->m_pkthdr.len if M_PKTHDR is found in a
secondary mbuf. (The initial intention there was to avoid updating
m_pkthdr.len twice, the assumption was that if M_PKTHDR is set then we're
dealing with the first mbuf.) Therefore, when handling fragmented IPsec
packets (in particular IPv6, IPv4 is a bit more complicated), we may end
up with an incorrect m_pkthdr.len after authentication or decryption. In
the case of ESP, this can lead to a remote crash on this instruction:

m_copydata(m, m->m_pkthdr.len - 3, 3, lastthree);

m_pkthdr.len is bigger than the actual mbuf chain.

It seems possible to me to trigger this bug even if you don't have the ESP
key, because the fragmentation part is outside of the encrypted ESP
payload.

So if you MITM the target, and intercept an incoming ESP packet (which you
can't decrypt), you should be able to forge a new specially-crafted,
fragmented packet and stuff the ESP payload (still encrypted, as you
intercepted it) into it. The decryption succeeds and the target crashes.
 1.66 07-Feb-2018  maxv branches: 1.66.2;
Rename back to ip6af_mff. It was actually clearer than ip6af_more.
 1.65 30-Jan-2018  maxv Fix a buffer overflow in ip6_get_prevhdr. Doing

mtod(m, char *) + len

is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.

The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.

But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.

However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.

As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.

Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.

Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.

This place is still fragile.
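
To illustrate why 1.65 replaces "mtod(m, char *) + len" with an offset into the chain, here is a minimal userland sketch. The struct is a simplified, hypothetical stand-in for an mbuf, and chain_byte_at() is an illustrative helper, not IP6_EXTHDR_GET or any kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for an mbuf: a link, a data pointer, a length. */
struct buf {
	struct buf *next;
	unsigned char *data;
	size_t len;
};

/*
 * Return a pointer to the byte at chain offset 'off', or NULL if the
 * offset lies beyond the end of the chain.  Unlike "first->data + off",
 * this never reads or writes past the end of the first buffer.
 */
static unsigned char *
chain_byte_at(struct buf *b, size_t off)
{
	for (; b != NULL; b = b->next) {
		if (off < b->len)
			return b->data + off;
		off -= b->len;	/* skip this buffer, continue in the next */
	}
	return NULL;
}
```

With a two-byte first buffer followed by a three-byte second buffer, offset 2 resolves to the first byte of the second buffer instead of one byte past the first.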
 1.64 25-Jan-2018  maxv Kick zero-sized fragments. We can't allow them to enter; two fragments
could be put at the same offset.
 1.63 25-Jan-2018  maxv Remove outdated comment and fix typo.
 1.62 25-Jan-2018  maxv Several changes:

* Move the structure definitions into frag6.c, they should not be used
elsewhere.

* Rename ip6af_mff -> ip6af_more, and switch it to bool, easier to
understand.

* Remove IP6_REASS_MBUF, no point in keeping this.

* Remove ip6q_arrive and ip6q_nxtp, unused.

* Style.
 1.61 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.60 24-Jan-2017  ozaki-r branches: 1.60.6;
Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work
 1.59 11-Jan-2017  ozaki-r branches: 1.59.2;
Get rid of unnecessary header inclusions
 1.58 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes it difficult to debug by bisecting.
 1.57 09-Nov-2016  ozaki-r Reduce the number of return points of frag6_input

No functional change.
 1.56 05-Sep-2014  matt branches: 1.56.2; 1.56.4;
Don't use new as a variable name.
 1.55 30-Aug-2013  christos branches: 1.55.4; 1.55.6; 1.55.10;
draft-gont-6man-ipv6-atomic-fragment-00 is now RFC 6946 (Loganaden Velvindron
logan at elandsys dot com)
 1.54 27-Sep-2012  christos branches: 1.54.2;
Loganaden Velvindron:

From "http://tools.ietf.org/html/draft-ietf-6man-ipv6-atomic-fragments-00":

A host that receives an IPv6 packet which includes a Fragment
Header with the "Fragment Offset" equal to 0 and the "M" bit equal
to 0 MUST process such packet in isolation from any other packets/
fragments, even if such packets/fragments contain the same set
{IPV6 Source Address, IPv6 Destination Address, Fragment
Identification}. That is, the Fragment Header of "atomic
fragments" should be removed by the receiving host, and the
resulting packet should be processed as a non-fragmented IPv6
datagram. Additionally, any fragments already queued with the
same set {IPV6 Source Address, IPv6 Destination Address, Fragment
Identification} should not be discarded upon receipt of the
"colliding" IPv6 atomic fragment, since IPv6 atomic fragments do
not really interfere with "normal" fragmented traffic.
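
The "atomic fragment" test quoted above (Fragment Offset equal to 0 and M bit equal to 0) can be sketched as a single mask check. The macro and function names are illustrative assumptions; the field is taken in host byte order, so a real caller would convert the on-wire value first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Layout of the 16-bit offset/flags field of the Fragment Header. */
#define FRAG_OFF_MASK	0xfff8u	/* upper 13 bits: fragment offset */
#define FRAG_MORE	0x0001u	/* lowest bit: "M" (more fragments) */

/*
 * An atomic fragment has offset 0 and M = 0.  The two reserved bits
 * between the offset and the M flag are ignored.
 */
static bool
is_atomic_fragment(uint16_t offlg_host)
{
	return (offlg_host & (FRAG_OFF_MASK | FRAG_MORE)) == 0;
}
```

A packet for which this returns true would have its Fragment Header stripped and be processed as an ordinary datagram, without touching any queued fragments.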
 1.53 01-Jul-2012  rmind branches: 1.53.2;
Remove the wrapper of frag6_input(), restore the behaviour changed in r1.50.
Fix ip6_reass_packet() wrapper used by NPF. Remove #if 0 code for handling
overlapping fragments - IPv6 desupported them anyway. Convert to kmem(9).
 1.52 31-Dec-2011  christos branches: 1.52.2;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0
 1.51 16-Dec-2011  jakllsch Take softnet_lock and kernel lock in frag6_slowtimo and frag6_fasttimo,
similar to how it's done with other protocols.

If we don't do this sending ICMPv6 messages in this path can cause races
in network interface drivers.
 1.50 04-Nov-2011  zoltan branches: 1.50.4;
Change the IPv6 reassembly mechanism to use mutex(9).
Also add ip6_reass_packet() to be used by NPF.
 1.49 03-May-2011  dyoung branches: 1.49.4;
*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
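
The deferred-drain pattern from 1.49 can be sketched as follows. This is a single-threaded userland model with hypothetical names; kernel code would also need to protect the flag and the queue with appropriate synchronization.

```c
#include <assert.h>
#include <stdbool.h>

static bool drain_needed;	/* set by the drain hook, consumed by the timer */
static int queued_frags = 3;	/* stand-in for the reassembly queue */

/* May be called with locks held: do no real work, only leave a note. */
static void
drain_hook(void)
{
	drain_needed = true;
}

/* Runs periodically, in a context where doing the work is safe. */
static void
fast_timer(void)
{
	if (drain_needed) {
		drain_needed = false;
		queued_frags = 0;	/* the actual draining happens here */
	}
}
```

Calling drain_hook() leaves the queue untouched; the fragments are only released on the next fast_timer() tick.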
 1.48 22-Jan-2011  mlelstv When deleting a fragment header use the simple copy operation only if it fits
completely into the mbuf.
 1.47 18-Mar-2009  cegger branches: 1.47.4; 1.47.6; 1.47.8;
bzero -> memset
 1.46 21-May-2008  drochner branches: 1.46.6; 1.46.12;
protocol "drain" functions can be called in interrupt context, so
don't acquire softnet_lock
approved by ad
 1.45 24-Apr-2008  ad branches: 1.45.2; 1.45.4;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.44 15-Apr-2008  thorpej branches: 1.44.2;
Make ip6 and icmp6 stats per-cpu.
 1.43 08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
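
A sketch of why a structure of counters and an array of uint64_t can be ABI-compatible, as 1.43 notes. The counter names and indices here are hypothetical; the kernel defines its own set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter indices. */
enum { STAT_TOTAL, STAT_FRAGMENTS, STAT_FRAGDROPPED, STAT_NSTATS };

/* Old style: one named uint64_t field per counter. */
struct old_stats {
	uint64_t total;
	uint64_t fragments;
	uint64_t fragdropped;
};

/* New style: the same counters, addressed by index. */
static uint64_t stats[STAT_NSTATS];

/* Both layouts occupy the same number of bytes. */
_Static_assert(sizeof(struct old_stats) == sizeof(stats),
    "layouts must match");

static void
stat_inc(int idx)
{
	stats[idx]++;
}
```

Because every field is a uint64_t and the indices follow the field order, a consumer that reads the data as the old structure sees the same values at the same offsets.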
 1.42 27-Feb-2008  matt Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.
 1.41 14-Jan-2008  dyoung branches: 1.41.2; 1.41.6;
Use rtcache_lookup() instead of rtcache_lookup() + rtcache_getrt().
 1.40 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.39 01-Nov-2007  dyoung branches: 1.39.2; 1.39.4; 1.39.8;
De-__P(). frag6.c has always defined IN6_IFSTAT_STRICT, so remove
the definition and trim to the defined(IN6_IFSTAT_STRICT) code.
No functional change intended.
 1.38 23-May-2007  christos branches: 1.38.6; 1.38.8; 1.38.12;
Ansify + add a few comments, from Karl Sjödahl
 1.37 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.36 04-Mar-2007  christos branches: 1.36.2; 1.36.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.35 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.34 26-Jan-2007  dyoung branches: 1.34.2;
bzero -> memset
 1.33 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.32 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.31 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.30 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.29 26-Jan-2006  rpaulo branches: 1.29.18; 1.29.20;
<netinet6/in6_pcb.h> is not needed.
 1.28 24-Dec-2005  perry branches: 1.28.2;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.27 11-Dec-2005  christos merge ktrace-lwp.
 1.26 06-Sep-2003  itojun branches: 1.26.16;
randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.25 05-Sep-2003  itojun u_short -> u_int16_t. sync w/ kame.
don't set ip6_plen where unneeded (i.e. before calling ip6_output)
 1.24 14-May-2003  itojun branches: 1.24.2;
always use PULLDOWN_TEST codepath.
 1.23 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.22 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.21 11-Sep-2002  itojun correct signedness mixup in pointer passing. sync w/kame
 1.20 09-Jun-2002  itojun whitespace cleanup
 1.19 28-May-2002  itojun use arc4random() where possible.
XXX is it necessary to do microtime() on tcp syn cache?
 1.18 28-May-2002  itojun limit number of IPv6 fragments (not the fragment queue size) to
fight against lots-of-frags DoS attacks. sync w/kame
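
The idea in 1.18 (cap the total number of fragments held, independently of the number of reassembly queues) can be sketched like this. The variable names, the limit value, and the "negative means unlimited" convention are assumptions made for the sketch, not the kernel's actual variables.

```c
#include <assert.h>
#include <stdbool.h>

static int frag_count;		/* fragments currently held, over all queues */
static int frag_max = 200;	/* hypothetical cap; negative = unlimited here */

/* Admit a new fragment only while the global cap is not yet reached. */
static bool
frag_admit(void)
{
	if (frag_max >= 0 && frag_count >= frag_max)
		return false;	/* drop: too many fragments in flight */
	frag_count++;
	return true;
}

/* Called when a fragment is freed or its packet is fully reassembled. */
static void
frag_release(void)
{
	frag_count--;
}
```

A sender that floods many fragments across few queues is stopped by this counter even when a per-queue limit alone would not trigger.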
 1.17 15-Mar-2002  itojun branches: 1.17.4;
have a real lock around IPv6 reassembly.
 1.16 13-Nov-2001  lukem add RCSIDs
 1.15 18-Oct-2001  itojun reduce diffs with kame (mostly cosmetic).
move IPV6_CHECKSUM processing to sys/netinet6/raw_ip6.c.
constify a couple of places.
 1.14 17-May-2001  itojun branches: 1.14.2;
plug memory leak on invalid fragment packet. suppress noisy log. from kame
 1.13 22-Feb-2001  itojun branches: 1.13.2;
correct handling of upper limitation to # of reass queue.
 1.12 11-Feb-2001  itojun set frag6_doing_reass properly (for frag6_drain). sync with kame.
 1.11 10-Feb-2001  itojun to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.10 06-Feb-2000  itojun branches: 1.10.4;
fix include pathname for better rfc2292 compliance.
 1.9 03-Feb-2000  itojun - Don't reuse ip6 header portion as reassembly pointer, to be friendly
with LP64 arch. (not tested on LP64, sorry)
- add comment on reass rule
- some other cleanups

NetBSD PR: 9340
From: iwamoto@sat.t.u-tokyo.ac.jp
(in sync with kame)
 1.8 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.7 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.6 26-Aug-1999  itojun branches: 1.6.2; 1.6.8;
fix IPv6 fragment ID initialization - random() does not return
random value when frag6_init() is called, so use microtime() to stir
the value better.
 1.5 30-Jul-1999  itojun remove reference to in6_systm.h (file itself will be removed afterwards)
 1.4 04-Jul-1999  itojun s/splnet/splsoftnet/ in IPv6/IPsec part.
hope I made no mistake (the kernel works fine but I need a regress test)

Suggested by: thorpej
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file frag6.c was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file frag6.c was added on branch chs-ubc2 on 1999-07-01 23:48:26 +0000
 1.6.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.6.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.6.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.6.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.10.4.2 11-Feb-2003  msaitoh Pull up revision 1.13 (requested by itojun):
correct handling of upper limitation to # of reass queue.
 1.10.4.1 26-May-2001  he Pull up revision 1.14 (requested by itojun):
Plug memory leak in IPv6 fragment reassembly, and omit noisy
logging.
 1.13.2.7 11-Nov-2002  nathanw Catch up to -current
 1.13.2.6 17-Sep-2002  nathanw Catch up to -current.
 1.13.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.13.2.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.13.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.13.2.2 22-Oct-2001  nathanw Catch up to -current.
 1.13.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.14.2.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.14.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.14.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.17.4.2 20-Jun-2002  gehenna catch up with -current.
 1.17.4.1 30-May-2002  gehenna Catch up with -current.
 1.24.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.24.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.24.2.1 03-Aug-2004  skrll Sync with HEAD
 1.26.16.7 17-Mar-2008  yamt sync with head.
 1.26.16.6 21-Jan-2008  yamt sync with head
 1.26.16.5 15-Nov-2007  yamt sync with head.
 1.26.16.4 03-Sep-2007  yamt sync with head.
 1.26.16.3 26-Feb-2007  yamt sync with head.
 1.26.16.2 30-Dec-2006  yamt sync with head.
 1.26.16.1 21-Jun-2006  yamt sync with head.
 1.28.2.1 01-Feb-2006  yamt sync with head.
 1.29.20.3 18-Dec-2006  yamt sync with head.
 1.29.20.2 10-Dec-2006  yamt sync with head.
 1.29.20.1 22-Oct-2006  yamt sync with head
 1.29.18.3 01-Feb-2007  ad Sync with head.
 1.29.18.2 12-Jan-2007  ad Sync with head.
 1.29.18.1 18-Nov-2006  ad Sync with head.
 1.34.2.3 07-May-2007  yamt sync with head.
 1.34.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.34.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.36.4.1 11-Jul-2007  mjf Sync with head.
 1.36.2.1 08-Jun-2007  ad Sync with head.
 1.38.12.1 13-Nov-2007  bouyer Sync with HEAD
 1.38.8.3 23-Mar-2008  matt sync with HEAD
 1.38.8.2 09-Jan-2008  matt sync with HEAD
 1.38.8.1 06-Nov-2007  matt sync with HEAD
 1.38.6.1 04-Nov-2007  jmcneill Sync with HEAD.
 1.39.8.2 19-Jan-2008  bouyer Sync with HEAD
 1.39.8.1 02-Jan-2008  bouyer Sync with HEAD
 1.39.4.1 26-Dec-2007  ad Sync with head.
 1.39.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.41.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.41.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.41.2.1 24-Mar-2008  keiichi sync with head.
 1.44.2.2 04-Jun-2008  yamt sync with head
 1.44.2.1 18-May-2008  yamt sync with head.
 1.45.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.45.2.1 04-May-2009  yamt sync with head.
 1.46.12.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.46.6.1 28-Apr-2009  skrll Sync with HEAD.
 1.47.8.1 08-Feb-2011  bouyer Sync with HEAD
 1.47.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.47.4.2 31-May-2011  rmind sync with head
 1.47.4.1 05-Mar-2011  rmind sync with head
 1.49.4.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.49.4.3 30-Oct-2012  yamt sync with head
 1.49.4.2 17-Apr-2012  yamt sync with head
 1.49.4.1 10-Nov-2011  yamt sync with head
 1.50.4.1 18-Feb-2012  mrg merge to -current.
 1.52.2.3 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1523):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
sys/netinet6/ah_input.c: adjust other callers (patch)
sys/netinet6/esp_input.c: adjust other callers (patch)
sys/netinet6/ipcomp_input.c: adjust other callers (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases: if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
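
The problem described above (treating an offset into the chain as if it were an offset into the first mbuf) can be illustrated with a minimal userland sketch. The struct buf type and chain_byte_at() below are hypothetical stand-ins invented for illustration; they are not the kernel's struct mbuf, ip6_get_prevhdr, or IP6_EXTHDR_GET.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an mbuf (not the kernel type). */
struct buf {
	struct buf *next;	/* next buffer in the chain, or NULL */
	unsigned char *data;	/* start of this buffer's data */
	size_t len;		/* number of valid bytes in this buffer */
};

/*
 * Return a pointer to the byte at chain offset `off`, or NULL if the
 * chain holds fewer than off + 1 bytes.  The unsafe pattern would be
 * `first->data + off`, which is only valid while off < first->len.
 */
static unsigned char *
chain_byte_at(struct buf *b, size_t off)
{
	for (; b != NULL; b = b->next) {
		if (off < b->len)
			return b->data + off;
		off -= b->len;	/* skip this buffer entirely */
	}
	return NULL;
}
```

With a 4-byte first buffer followed by a 2-byte second buffer, offset 4 resolves to the first byte of the second buffer instead of reading past the end of the first one.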
 1.52.2.2 25-Oct-2012  riz branches: 1.52.2.2.2;
Pull up following revision(s) (requested by christos in ticket #637):
sys/netinet6/frag6.c: revision 1.54
Loganaden Velvindron:
From "http://tools.ietf.org/html/draft-ietf-6man-ipv6-atomic-fragments-00":
A host that receives an IPv6 packet which includes a Fragment
Header with the "Fragment Offset" equal to 0 and the "M" bit equal
to 0 MUST process such packet in isolation from any other packets/
fragments, even if such packets/fragments contain the same set
{IPV6 Source Address, IPv6 Destination Address, Fragment
Identification}. That is, the Fragment Header of "atomic
fragments" should be removed by the receiving host, and the
resulting packet should be processed as a non-fragmented IPv6
datagram. Additionally, any fragments already queued with the
same set {IPV6 Source Address, IPv6 Destination Address, Fragment
Identification} should not be discarded upon receipt of the
"colliding" IPv6 atomic fragment, since IPv6 atomic fragments do
not really interfere with "normal" fragmented traffic.
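
As a rough sketch of the rule quoted above, an atomic fragment can be recognized by testing the offset bits and the "M" bit of the Fragment Header's 16-bit offset/flags field. The macro and function names below are hypothetical, and the field is assumed to have already been converted to host byte order; this is not the code from frag6.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; values follow the Fragment Header layout
 * (13-bit offset, 2 reserved bits, 1 "M" bit) in host byte order. */
#define FRAG_OFF_MASK	0xfff8u	/* Fragment Offset bits */
#define FRAG_MORE	0x0001u	/* "M" (more fragments) bit */

/* True if Fragment Offset == 0 and M == 0, i.e. an atomic fragment. */
static bool
is_atomic_fragment(uint16_t offlg_host)
{
	return (offlg_host & (FRAG_OFF_MASK | FRAG_MORE)) == 0;
}
```

A receiver following the draft would strip the Fragment Header from such a packet and process it on its own, without consulting or disturbing the reassembly queue.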
 1.52.2.1 05-Jul-2012  riz branches: 1.52.2.1.4;
Pull up following revision(s) (requested by rmind in ticket #398):
sys/netinet6/frag6.c: revision 1.53
Remove the wrapper of frag6_input(), restore the behaviour changed in r1.50.
Fix ip6_reass_packet() wrapper used by NPF. Remove #if 0 code for handling
overlapping fragments - IPv6 desupported them anyway. Convert to kmem(9).
 1.52.2.2.2.1 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1523):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
sys/netinet6/ah_input.c: adjust other callers (patch)
sys/netinet6/esp_input.c: adjust other callers (patch)
sys/netinet6/ipcomp_input.c: adjust other callers (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases: if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.52.2.1.4.2 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1523):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
sys/netinet6/ah_input.c: adjust other callers (patch)
sys/netinet6/esp_input.c: adjust other callers (patch)
sys/netinet6/ipcomp_input.c: adjust other callers (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases: if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.52.2.1.4.1 25-Oct-2012  riz Pull up following revision(s) (requested by christos in ticket #637):
sys/netinet6/frag6.c: revision 1.54
Loganaden Velvindron:
From "http://tools.ietf.org/html/draft-ietf-6man-ipv6-atomic-fragments-00":
A host that receives an IPv6 packet which includes a Fragment
Header with the "Fragment Offset" equal to 0 and the "M" bit equal
to 0 MUST process such packet in isolation from any other packets/
fragments, even if such packets/fragments contain the same set
{IPV6 Source Address, IPv6 Destination Address, Fragment
Identification}. That is, the Fragment Header of "atomic
fragments" should be removed by the receiving host, and the
resulting packet should be processed as a non-fragmented IPv6
datagram. Additionally, any fragments already queued with the
same set {IPV6 Source Address, IPv6 Destination Address, Fragment
Identification} should not be discarded upon receipt of the
"colliding" IPv6 atomic fragment, since IPv6 atomic fragments do
not really interfere with "normal" fragmented traffic.
 1.53.2.3 03-Dec-2017  jdolecek update from HEAD
 1.53.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.53.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.54.2.1 18-May-2014  rmind sync with head
 1.55.10.3 14-Aug-2018  martin Pull up following revision(s) (requested by maxv in ticket #1630):

sys/netinet6/frag6.c: revision 1.64

Kick zero-sized fragments. We can't allow them to enter; two fragments
could be put at the same offset.
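
A minimal sketch of such a check, assuming the fragment's data length is computed as the payload length minus everything up to and including the Fragment Header. The names and the exact length arithmetic are assumptions for illustration and are not taken from frag6.c.

```c
#include <stdbool.h>
#include <stddef.h>

#define FRAG_HDR_LEN	8	/* size of the IPv6 Fragment Header */

/*
 * plen:     bytes following the IPv6 header (hypothetical parameter)
 * frag_off: offset of the Fragment Header within those bytes
 * Returns true only if the fragment carries at least one byte of data.
 */
static bool
has_fragment_data(size_t plen, size_t frag_off)
{
	if (plen < frag_off + FRAG_HDR_LEN)
		return false;	/* truncated: no room for the header */
	return plen - frag_off - FRAG_HDR_LEN > 0;
}
```

Rejecting the zero-length case up front means every queued fragment occupies a non-empty range, so no two entries can legitimately share one offset.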
 1.55.10.2 05-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1594):

sys/kern/uipc_mbuf.c: revision 1.182
sys/netinet6/frag6.c: revision 1.67
sys/netinet/ip_reass.c: revision 1.14
sys/sys/mbuf.h: revision 1.179

Remove M_PKTHDR from secondary mbufs when reassembling packets.

This is a real problem, because I found at least one component that relies
on the fact that only the first mbuf has M_PKTHDR: far from here, in
m_splithdr, we don't update m->m_pkthdr.len if M_PKTHDR is found in a
secondary mbuf. (The initial intention there was to avoid updating
m_pkthdr.len twice, the assumption was that if M_PKTHDR is set then we're
dealing with the first mbuf.) Therefore, when handling fragmented IPsec
packets (in particular IPv6, IPv4 is a bit more complicated), we may end
up with an incorrect m_pkthdr.len after authentication or decryption. In
the case of ESP, this can lead to a remote crash on this instruction:
m_copydata(m, m->m_pkthdr.len - 3, 3, lastthree);
m_pkthdr.len is bigger than the actual mbuf chain.

It seems possible to me to trigger this bug even if you don't have the ESP
key, because the fragmentation part is outside of the encrypted ESP
payload.

So if you MITM the target, and intercept an incoming ESP packet (which you
can't decrypt), you should be able to forge a new specially-crafted,
fragmented packet and stuff the ESP payload (still encrypted, as you
intercepted it) into it. The decryption succeeds and the target crashes.
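
To illustrate why a stale m_pkthdr.len is dangerous, here is a small userland sketch that compares a recorded packet length against the real length of the chain before reading trailing bytes. The struct buf type and both functions are hypothetical stand-ins; they are not the kernel's mbuf API and not the actual fix, which removes M_PKTHDR from secondary mbufs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an mbuf (not the kernel type). */
struct buf {
	struct buf *next;
	size_t len;
};

/* Sum the lengths of all buffers in the chain. */
static size_t
chain_len(const struct buf *b)
{
	size_t total = 0;

	for (; b != NULL; b = b->next)
		total += b->len;
	return total;
}

/*
 * True if copying the last `n` bytes at offset recorded_len - n stays
 * inside the chain.  A recorded length larger than the real chain is
 * exactly the inconsistent state described above.
 */
static bool
tail_read_ok(const struct buf *b, size_t recorded_len, size_t n)
{
	return recorded_len >= n && recorded_len <= chain_len(b);
}
```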
 1.55.10.1 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1560):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases: if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.55.6.3 14-Aug-2018  martin Pull up following revision(s) (requested by maxv in ticket #1630):

sys/netinet6/frag6.c: revision 1.64

Kick zero-sized fragments. We can't allow them to enter; two fragments
could be put at the same offset.
 1.55.6.2 05-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1594):

sys/kern/uipc_mbuf.c: revision 1.182
sys/netinet6/frag6.c: revision 1.67
sys/netinet/ip_reass.c: revision 1.14
sys/sys/mbuf.h: revision 1.179

Remove M_PKTHDR from secondary mbufs when reassembling packets.

This is a real problem, because I found at least one component that relies
on the fact that only the first mbuf has M_PKTHDR: far from here, in
m_splithdr, we don't update m->m_pkthdr.len if M_PKTHDR is found in a
secondary mbuf. (The initial intention there was to avoid updating
m_pkthdr.len twice, the assumption was that if M_PKTHDR is set then we're
dealing with the first mbuf.) Therefore, when handling fragmented IPsec
packets (in particular IPv6, IPv4 is a bit more complicated), we may end
up with an incorrect m_pkthdr.len after authentication or decryption. In
the case of ESP, this can lead to a remote crash on this instruction:
m_copydata(m, m->m_pkthdr.len - 3, 3, lastthree);
m_pkthdr.len is bigger than the actual mbuf chain.

It seems possible to me to trigger this bug even if you don't have the ESP
key, because the fragmentation part is outside of the encrypted ESP
payload.

So if you MITM the target, and intercept an incoming ESP packet (which you
can't decrypt), you should be able to forge a new specially-crafted,
fragmented packet and stuff the ESP payload (still encrypted, as you
intercepted it) into it. The decryption succeeds and the target crashes.
 1.55.6.1 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1560):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases: if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.55.4.3 14-Aug-2018  martin Pull up following revision(s) (requested by maxv in ticket #1630):

sys/netinet6/frag6.c: revision 1.64

Kick zero-sized fragments. We can't allow them to enter; two fragments
could be put at the same offset.
 1.55.4.2 05-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1594):

sys/kern/uipc_mbuf.c: revision 1.182
sys/netinet6/frag6.c: revision 1.67
sys/netinet/ip_reass.c: revision 1.14
sys/sys/mbuf.h: revision 1.179

Remove M_PKTHDR from secondary mbufs when reassembling packets.

This is a real problem, because I found at least one component that relies
on the fact that only the first mbuf has M_PKTHDR: far from here, in
m_splithdr, we don't update m->m_pkthdr.len if M_PKTHDR is found in a
secondary mbuf. (The initial intention there was to avoid updating
m_pkthdr.len twice, the assumption was that if M_PKTHDR is set then we're
dealing with the first mbuf.) Therefore, when handling fragmented IPsec
packets (in particular IPv6, IPv4 is a bit more complicated), we may end
up with an incorrect m_pkthdr.len after authentication or decryption. In
the case of ESP, this can lead to a remote crash on this instruction:
m_copydata(m, m->m_pkthdr.len - 3, 3, lastthree);
m_pkthdr.len is bigger than the actual mbuf chain.

It seems possible to me to trigger this bug even if you don't have the ESP
key, because the fragmentation part is outside of the encrypted ESP
payload.

So if you MITM the target, and intercept an incoming ESP packet (which you
can't decrypt), you should be able to forge a new specially-crafted,
fragmented packet and stuff the ESP payload (still encrypted, as you
intercepted it) into it. The decryption succeeds and the target crashes.
 1.55.4.1 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1560):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases: if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.56.4.2 20-Mar-2017  pgoyette Sync with HEAD
 1.56.4.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.56.2.2 05-Feb-2017  skrll Sync with HEAD
 1.56.2.1 05-Dec-2016  skrll Sync with HEAD
 1.59.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.60.6.7 28-Apr-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #1960):

sys/netinet6/frag6.c: revision 1.78

frag6: fix calculation of fragment length

Because of the miscalculation, 32-byte fragmented IPv6 packets
have been wrongly dropped.

See https://mail-index.netbsd.org/tech-net/2024/04/14/msg008741.html
for more details.

Patch from Yasuyuki KOZAKAI (with minor tweaks)
 1.60.6.6 27-Oct-2022  martin Pull up following revision(s) (requested by ozaki-r in ticket #1778):

sys/netinet6/frag6.c: revision 1.76

frag6: don't use spin mutex for frag6_lock

frag6_lock is held during sending a packet (icmp6_error), so we must
not use a spin mutex because we can acquire sleep locks on sending
a packet.

Also we don't need to use spin mutex for frag6_lock anymore because
frag6_lock is now not used from hardware interrupt context.
 1.60.6.5 27-Sep-2018  martin Pull up following revision(s) (requested by maxv in ticket #1041):

sys/netinet/ip_reass.c: revision 1.17 (patch)
sys/netinet6/frag6.c: revision 1.74 (patch)

When reassembling IPv4/IPv6 packets, ensure each fragment has been subject
to the same IPsec processing. That is to say, that all fragments are ESP,
or AH, or AH+ESP, or none.

The reassembly mechanism can be used both on the wire and inside an IPsec
tunnel, so we need to make sure all fragments of a packet were received
on only one side.

Even though I haven't tried, I believe there are configurations where it
would be possible for an attacker to inject an unencrypted fragment into a
legitimate stream of already-decrypted-and-authenticated fragments.

Typically on IPsec gateways with ESP tunnels, where we can encapsulate
fragments (as opposed to the general case, where we fragment encapsulated
data).

Note, for the record: a funnier thing, under IPv4, would be to send a
zero-sized !MFF fragment at the head of the packet, and manage to trigger
an ICMP error; M_DECRYPTED gets lost by the reassembly, and ICMP will reply
with the packet in clear (not encrypted).
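
The consistency rule described above can be sketched as a comparison of per-fragment IPsec state bits against those of the first queued fragment. The flag names below are hypothetical placeholders (the kernel uses its own mbuf flags, such as the M_DECRYPTED flag mentioned above), and this is only an outline of the idea, not the code from ip_reass.c or frag6.c.

```c
#include <stdbool.h>

/* Hypothetical per-fragment state bits, for illustration only. */
#define F_AUTH		0x1u	/* fragment went through AH */
#define F_DECRYPTED	0x2u	/* fragment went through ESP */
#define F_IPSEC_MASK	(F_AUTH | F_DECRYPTED)

/*
 * A new fragment may join a reassembly queue only if it was subject to
 * the same IPsec processing (none, AH, ESP, or AH+ESP) as the first one.
 */
static bool
same_ipsec_state(unsigned first_flags, unsigned new_flags)
{
	return (first_flags & F_IPSEC_MASK) == (new_flags & F_IPSEC_MASK);
}
```

Bits outside the mask are ignored, so unrelated per-fragment flags do not cause a mismatch.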
 1.60.6.4 05-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #695):

sys/kern/uipc_mbuf.c: revision 1.182
sys/netinet6/frag6.c: revision 1.67
sys/netinet/ip_reass.c: revision 1.14
sys/sys/mbuf.h: revision 1.179

Remove M_PKTHDR from secondary mbufs when reassembling packets.

This is a real problem, because I found at least one component that relies
on the fact that only the first mbuf has M_PKTHDR: far from here, in
m_splithdr, we don't update m->m_pkthdr.len if M_PKTHDR is found in a
secondary mbuf. (The initial intention there was to avoid updating
m_pkthdr.len twice, the assumption was that if M_PKTHDR is set then we're
dealing with the first mbuf.) Therefore, when handling fragmented IPsec
packets (in particular IPv6, IPv4 is a bit more complicated), we may end
up with an incorrect m_pkthdr.len after authentication or decryption. In
the case of ESP, this can lead to a remote crash on this instruction:
m_copydata(m, m->m_pkthdr.len - 3, 3, lastthree);
m_pkthdr.len is bigger than the actual mbuf chain.

It seems possible to me to trigger this bug even if you don't have the ESP
key, because the fragmentation part is outside of the encrypted ESP
payload.

So if you MITM the target, and intercept an incoming ESP packet (which you
can't decrypt), you should be able to forge a new specially-crafted,
fragmented packet and stuff the ESP payload (still encrypted, as you
intercepted it) into it. The decryption succeeds and the target crashes.
 1.60.6.3 30-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #663):

sys/netinet6/frag6.c: revision 1.64

Kick zero-sized fragments. We can't allow them to enter; two fragments
could be put at the same offset.
 1.60.6.2 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #527):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases: if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.60.6.1 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at the
same time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is generally harmless, but
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
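
The item above about macros hiding the NET_MPSAFE switch can be pictured with the following sketch. The macro names, the DEMO_NET_MPSAFE switch, and the counting stub lock are all hypothetical and exist only for this illustration; they are not the macros added by the commit.

```c
/* Stub "big lock" that just counts its depth, for illustration only. */
static int lock_depth;
static void big_lock(void) { lock_depth++; }
static void big_unlock(void) { lock_depth--; }

/*
 * One pair of macros replaces "#ifndef ... lock ... #endif" blocks
 * scattered through callers.  DEMO_NET_MPSAFE is a hypothetical switch.
 */
#ifdef DEMO_NET_MPSAFE
#define BIG_LOCK_UNLESS_MPSAFE()	do { } while (0)
#define BIG_UNLOCK_UNLESS_MPSAFE()	do { } while (0)
#else
#define BIG_LOCK_UNLESS_MPSAFE()	big_lock()
#define BIG_UNLOCK_UNLESS_MPSAFE()	big_unlock()
#endif

static int depth_seen;	/* lock depth observed inside the critical section */

static void
do_work(void)
{
	BIG_LOCK_UNLESS_MPSAFE();
	depth_seen = lock_depth;	/* the protected work would go here */
	BIG_UNLOCK_UNLESS_MPSAFE();
}
```

Callers stay free of preprocessor conditionals, and searching for the macro names finds every place where the lock is still taken.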
 1.66.2.4 21-May-2018  pgoyette Sync with HEAD
 1.66.2.3 02-May-2018  pgoyette Synch with HEAD
 1.66.2.2 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.66.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.74.6.2 28-Apr-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #1836):

sys/netinet6/frag6.c: revision 1.78

frag6: fix calculation of fragment length

Because of the miscalculation, 32-byte fragmented IPv6 packets
have been wrongly dropped.

See https://mail-index.netbsd.org/tech-net/2024/04/14/msg008741.html
for more details.

Patch from Yasuyuki KOZAKAI (with minor tweaks)
 1.74.6.1 27-Oct-2022  martin Pull up following revision(s) (requested by ozaki-r in ticket #1548):

sys/netinet6/frag6.c: revision 1.76

frag6: don't use spin mutex for frag6_lock

frag6_lock is held during sending a packet (icmp6_error), so we must
not use a spin mutex because we can acquire sleep locks on sending
a packet.

Also we don't need to use spin mutex for frag6_lock anymore because
frag6_lock is now not used from hardware interrupt context.
 1.74.2.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.76.2.2 13-Sep-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #865):

sys/netinet6/frag6.c: revision 1.77

Add a check for FreeBSD-SA-23:06.ipv6, although it is not reproducible for us.
factor out code copied 3 times (and now would have been a 4th)
 1.76.2.1 28-Apr-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #673):

sys/netinet6/frag6.c: revision 1.78

frag6: fix calculation of fragment length

Because of the miscalculation, 32 bytes fragmented IPv6 packets
have been wrongly dropped.

See https://mail-index.netbsd.org/tech-net/2024/04/14/msg008741.html
for more details.

Patch from Yasuyuki KOZAKAI (with minor tweaks)
 1.258 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.257 29-Jun-2024  riastradh branches: 1.257.2;
netinet6: Use _NET_STAT* API instead of direct array access.

XXX Exception: ip6flow_addstats_rt _assigns_ one of the `statistics'
to the current count of ip6 flows in use, and we don't have anything
in the _NET_STAT* API for that. So for now I abuse the abstraction,
until we sort out this one exceptional case properly.

PR kern/58380
 1.256 24-Feb-2024  mlelstv Deliver timestamps also to raw sockets.
Fixes PR 57955
 1.255 09-Dec-2023  pgoyette Modularize the COMPAT_90 code that resulted from the removal of
netinet6/nd6 from the kernel. Now, the minimal compat code can
be successfully loaded and unloaded along with the rest of the
COMPAT_90 code.

XXX pullup-10 - hopefully before RC2
 1.254 28-Oct-2022  ozaki-r branches: 1.254.2;
inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
 1.253 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.252 29-Aug-2022  knakahara Add sysctl entry to control to send routing message for RTM_DYNAMIC.

Some routing daemons require such routing message to keep coherency.

If we want to let kernel send such message, set net.inet.icmp.dynamic_rt_msg=1
for IPv4, net.inet6.icmp6.dynamic_rt_msg=1 for IPv6.
Default(=0) is the same as before, that is, not send such routing message.
 1.251 22-Aug-2022  knakahara Add sysctl entry to enable/disable to use path MTU discovery for icmpv6 reflecting.

If we want to use path MTU discovery for icmp reflecting set
net.inet6.icmp6.reflect_pmtu=1. Default(=0) is the same as before, that is,
use IPV6_MINMTU.
 1.250 19-Feb-2021  christos - Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
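
A minimal sketch of why `__alignof(t)` and `sizeof(t)` give different answers for a non-primitive type. The struct and macro names below are hypothetical and are not the kernel's definitions; the sketch assumes a C11 compiler and an ABI where `uint32_t` is 4-byte aligned.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: 16 bytes long, but only 4-byte alignment is needed. */
struct hdr_sketch {
	uint32_t a;
	uint32_t b;
	uint16_t c[4];
};

/* Check against the ABI alignment of the type. */
#define ALIGNED_BY_ALIGNOF(p, t) \
	(((uintptr_t)(p) & (alignof(t) - 1)) == 0)

/* Older style check against the size of the type (too strict for structs). */
#define ALIGNED_BY_SIZEOF(p, t) \
	(((uintptr_t)(p) & (sizeof(t) - 1)) == 0)

/* Returns nonzero if p may be dereferenced as a struct hdr_sketch. */
static int
hdr_is_aligned(const void *p)
{
	assert(p != NULL);
	return ALIGNED_BY_ALIGNOF(p, struct hdr_sketch);
}
```

With a 16-byte-aligned buffer `buf`, the address `buf + 4` passes the alignof-based check but fails the sizeof-based one, even though accessing the struct there is fine on such an ABI.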
 1.249 15-Feb-2021  martin Fix the build.
Maybe there should be a ICMP6_HDR_ALIGNMENT, but for now there is
only IP6_HDR_ALIGNMENT.
 1.248 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.
 1.247 11-Sep-2020  roy branches: 1.247.2;
inet6: Use generic Neighbor Detection rather than IPv6 specific

No functional change intended.
 1.246 27-Jul-2020  roy icmp6: Remove __packed attribute from icmp6 structures

They should naturally align.
Add compile time assertions to icmp6.c to prove this.
 1.245 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.244 09-Mar-2020  roy route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes that probe for machines
that don't exist on a subnet / prefix.
 1.243 06-Oct-2019  uwe icmp6_notify_error - fix ctlfunc typedef to match pr_ctlinput,
drop the cast that is no longer necessary.
 1.242 22-Dec-2018  maxv branches: 1.242.4;
Replace: M_COPY_PKTHDR -> m_copy_pkthdr. No functional change, since the
former is a macro to the latter.
 1.241 22-Dec-2018  maxv Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.
 1.240 25-Oct-2018  ozaki-r Remove a leftover debug printf

Pointed out by hannken@
 1.239 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
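
A small sketch of the truncation hazard described above, assuming a platform where `unsigned int` is 32 bits wide. The `_sketch` names are hypothetical stand-ins and are not the libkern functions themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a helper defined on unsigned int only. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Hypothetical 64-bit variant, used here only for comparison. */
static uint64_t
u64min_sketch(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Returns 1 if routing (a, b) through the unsigned int helper gives a
 * different answer than the 64-bit comparison, i.e. bits were lost.
 */
static int
truncates(uint64_t a, uint64_t b)
{
	uint64_t narrow = uimin_sketch((unsigned int)a, (unsigned int)b);

	return narrow != u64min_sketch(a, b);
}
```

Small values pass through unchanged, while a pair such as `0x100000001` and `0x200000000` is narrowed to 1 and 0 before the comparison, so the result no longer matches the true minimum.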
 1.238 01-Jun-2018  ozaki-r branches: 1.238.2;
Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that prevents the
rtentry from being accidentally released. So rt_timer handlers must release the
reference to the passed rtentry themselves (but they didn't).
 1.237 07-May-2018  maxv Remove misleading comments.
 1.236 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.235 29-Apr-2018  maxv Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is obvious that 'm' has M_PKTHDR set.
 1.234 28-Apr-2018  maxv Remove unused ipsec_var.h includes.
 1.233 27-Apr-2018  maxv Fix a bug introduced in rev1.154 (2009). mcl_cache still has a size of
MCLBYTES, so the area allocated is still too small.

I think it should have been MEXTMALLOC, and of course I can't test my
change.
 1.232 26-Apr-2018  maxv Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.
 1.231 26-Apr-2018  maxv Use M_UNWRITABLE, no functional change.
 1.230 14-Apr-2018  maxv Fix 'icmp6len', it shouldn't be ip6_plen, because we may not be at the
beginning of the packet (off+ip6_plen is beyond the end of the mbuf). By
luck, the IP6_EXTHDR_GET that follows will fail and prevent buffer
overflows in non-jumbogram packets.

For jumbograms we will probably be in trouble here; but it doesn't seem
possible to craft reliably a jumbogram for a non-jumbogram-enabled device.

So I don't think it's a huge problem.
 1.229 14-Apr-2018  maxv Cosmetic, and remove one XXX (no problem).
 1.228 14-Apr-2018  maxv Remove the RH0 code from ICMPv6. RH0 is deprecated by RFC5095 (2007) for
security reasons. We already removed it in Route6.

In addition there was an mbuf bug here: calling IP6_EXTHDR_GET twice with
the same offset, but still using the pointer from the first call, which
could have been made invalid. By luck, m_pulldown leaves zero-sized mbufs
in place, instead of freeing them.

And in general, using a 'finaldst' pointer on the mbuf, and then modifying
that mbuf with IP6_EXTHDR_GET with a smaller offset, was really error-
prone.
 1.227 14-Apr-2018  maxv Remove dead code. It is the same as the non-obsolete one, since
ICMP6_DST_UNREACH_NOTNEIGHBOR == ICMP6_DST_UNREACH_BEYONDSCOPE,
and the code leads to the same errno value (EHOSTUNREACH).
 1.226 12-Apr-2018  maxv Synchronize the code between raw_ip6.c<->icmp6.c<->raw_ip.c, so that it is
the same everywhere.
 1.225 12-Apr-2018  maxv Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.
 1.224 21-Mar-2018  roy Sprinkle more soroverflow().
 1.223 28-Feb-2018  maxv branches: 1.223.2;
Remove unused ipsec_private.h includes.
 1.222 26-Feb-2018  maxv Remove redundant condition (harmless). PR/53030.
 1.221 26-Feb-2018  maxv Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@
 1.220 12-Feb-2018  maxv Replace bcopy -> memcpy when it is obvious that the areas don't overlap.
Rearrange ip6_splithdr() for clarity.
 1.219 23-Jan-2018  maxv Style, localify, remove XXX when there's no issue, and switch 'extra'
to int.
 1.218 23-Jan-2018  maxv Fix the check on 'maxlen', we are not creating struct icmp6_hdr but
struct nd_redirect (which is bigger). Also, make sure we can add a
struct nd_opt_rd_hdr.

Normally this doesn't change anything, since the mbuf has IPV6_MMTU
bytes, and it's always way bigger than what we need.
 1.217 23-Jan-2018  maxv Fix info leak. We are allocating a slot of size:

roundup(sizeof(*nd_opt) + ifp->if_addrlen, 8)

But we are not filling in the padding caused by the roundup, and therefore
several bytes are leaked, in the mbuf we're about to send to the network.
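
A hedged sketch of the general fix pattern for this class of leak: zero the whole rounded-up slot before filling it in. The function name and the 2-byte option header layout are assumptions made for illustration; this is not the code from icmp6.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round x up to the next multiple of 8. */
#define ROUNDUP8(x)	(((x) + 7u) & ~(size_t)7u)

/*
 * Fill an option slot: a 2-byte header (type, length in 8-byte units),
 * then the link-layer address, then explicit zero padding.
 * Returns the slot size, or 0 if the buffer is too small.
 */
static size_t
fill_lladdr_opt(unsigned char *buf, size_t buflen, unsigned char type,
    const unsigned char *lladdr, size_t addrlen)
{
	size_t len = ROUNDUP8(2 + addrlen);

	assert(buf != NULL && lladdr != NULL);
	if (len > buflen || len / 8 > 255)
		return 0;
	memset(buf, 0, len);		/* clears the padding as well */
	buf[0] = type;
	buf[1] = (unsigned char)(len / 8);
	memcpy(buf + 2, lladdr, addrlen);
	return len;
}
```

With a 7-byte address the slot is 16 bytes, and the trailing 7 bytes are guaranteed to be zero rather than whatever the buffer held before.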
 1.216 23-Jan-2018  maxv Fix twice the same mistake: 'last' can't be null, so there's no point in
having this misleading branch.
 1.215 23-Jan-2018  maxv Style, and four fixes:

* Remove the (disabled) IPPROTO_ESP check. If the packet was decrypted it
will have M_DECRYPTED, and this is already checked.

* Memory leaks in icmp6_error2. They seem hardly triggerable.

* Fix miscomputation in _icmp6_input, the ICMP6 header is not guaranteed
to be located right after the IP6 header. ok mlelstv@

* Memory leak in _icmp6_input. This one seems to be impossible to trigger.
 1.214 05-Nov-2017  ozaki-r Fix usages of ipsec_used

If IPsec isn't used, we must go back to the normal path.

PR kern/52659
 1.213 02-Aug-2017  ozaki-r Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
 1.212 07-Jul-2017  knakahara fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.

XXX need to pullup to -8.
 1.211 14-Mar-2017  ozaki-r branches: 1.211.6;
Replace DIAGNOSTIC + panic with CTASSERT
 1.210 17-Feb-2017  ozaki-r Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.
 1.209 13-Feb-2017  ozaki-r Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.
 1.208 07-Feb-2017  ozaki-r Add missing NULL checks for m_get_rcvif
 1.207 02-Feb-2017  ozaki-r Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net
 1.206 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.205 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.204 13-Jan-2017  ozaki-r branches: 1.204.2;
Tweak icmp6_input; always use off, not *offp
 1.203 12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.202 11-Dec-2016  ozaki-r Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)
 1.201 15-Nov-2016  mlelstv Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.
 1.200 31-Oct-2016  ozaki-r Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.
 1.199 25-Oct-2016  ozaki-r Remove unnecessary argument

No functional change.
 1.198 18-Oct-2016  ozaki-r Remove unnecessary pserialize_read_enter
 1.197 26-Aug-2016  dholland PR 51434 David Binderman: remove redundant test.
 1.196 19-Aug-2016  roy Revert r1.148
IP6_EXTHDR_GET ensures that an icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.
 1.195 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.194 15-Jul-2016  ozaki-r Use sin6tosa and sin6tocsa macros

No functional change.
 1.193 15-Jul-2016  ozaki-r Use ifatoia6 macro

No functional change.
 1.192 07-Jul-2016  ozaki-r branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.191 05-Jul-2016  ozaki-r Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.
 1.190 28-Jun-2016  ozaki-r Add missing NULL checks for m_get_rcvif_psref
 1.189 21-Jun-2016  ozaki-r Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.
 1.188 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
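
The idea of storing an index rather than a pointer can be sketched as follows. All names are hypothetical, and the sketch deliberately omits the psref(9)/pserialize(9) protection that the real m_{get,put}_rcvif APIs provide; it only shows why a lookup can fail and why callers need a NULL check.

```c
#include <assert.h>
#include <stddef.h>

#define MAXIF_SKETCH	8

struct ifnet_sketch {
	unsigned int index;	/* 1..MAXIF_SKETCH-1; 0 means "none" */
	const char *name;
};

/* Packet carries only the index of the receiving interface. */
struct pkt_sketch {
	unsigned int rcvif_index;
};

/* Interface collection, indexed by interface index. */
static struct ifnet_sketch *if_table_sketch[MAXIF_SKETCH];

static void
if_attach_sketch(struct ifnet_sketch *ifp)
{
	assert(ifp != NULL && ifp->index > 0 && ifp->index < MAXIF_SKETCH);
	if_table_sketch[ifp->index] = ifp;
}

static void
if_detach_sketch(unsigned int idx)
{
	assert(idx > 0 && idx < MAXIF_SKETCH);
	if_table_sketch[idx] = NULL;
}

/* Returns the interface, or NULL if it has gone away since receive. */
static struct ifnet_sketch *
pkt_get_rcvif_sketch(const struct pkt_sketch *p)
{
	if (p->rcvif_index == 0 || p->rcvif_index >= MAXIF_SKETCH)
		return NULL;
	return if_table_sketch[p->rcvif_index];
}
```

A packet that outlives its interface then resolves to NULL instead of dereferencing freed memory.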
 1.187 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.186 18-May-2016  ozaki-r Don't try to get outif unnecessarily from in6_selectsrc

The got outif is unused.
 1.185 17-May-2016  ozaki-r Get rcvif once and reuse it

No functional change.
 1.184 17-May-2016  ozaki-r Make sure icmp6_redirect_input frees mbuf before return
 1.183 12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.182 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
  - RTF_CONNECTED is added
    - It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.181 01-Apr-2016  ozaki-r Remove unnecessary casts and do s/0/NULL/ for rtrequest
 1.180 01-Apr-2016  ozaki-r Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.
 1.179 21-Jan-2016  riastradh Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.
 1.178 21-Jan-2016  riastradh Give proper prototype to ip_output.
 1.177 14-Sep-2015  ozaki-r Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.
 1.176 31-Aug-2015  ozaki-r Make rt_refcnt take into account rt_timer
 1.175 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.174 24-Aug-2015  ozaki-r Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)
 1.173 07-Aug-2015  ozaki-r Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
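
A minimal sketch of the conversion described above, using plain integers in place of the kernel's time_uptime and time_second variables. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert an uptime-based timestamp to wall-clock time for userland. */
static int64_t
uptime_to_wall_sketch(int64_t t_uptime, int64_t now_uptime, int64_t now_wall)
{
	return t_uptime + (now_wall - now_uptime);
}

/* Convert a wall-clock timestamp from userland to an uptime-based one. */
static int64_t
wall_to_uptime_sketch(int64_t t_wall, int64_t now_uptime, int64_t now_wall)
{
	return t_wall - (now_wall - now_uptime);
}

/* Expiry test on the monotonic clock; wall-clock steps cannot affect it. */
static int
expired_sketch(int64_t expire_uptime, int64_t now_uptime)
{
	assert(now_uptime >= 0);
	return now_uptime >= expire_uptime;
}
```

Internally only the uptime-based value is stored and compared; the offset between the two clocks is applied at the kernel/userland boundary.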
 1.172 24-Jul-2015  ozaki-r Fix rtfree-ing wrong rtentry
 1.171 17-Jul-2015  ozaki-r Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
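
The discipline listed above can be sketched with a toy reference count. The struct and function names are hypothetical, and no locking is shown; this is an illustration of the rules, not the route.c implementation.

```c
#include <assert.h>
#include <stddef.h>

struct rt_sketch {
	int refcnt;
	int freed;	/* set when the last reference is dropped */
};

/* Take a reference; every access to the entry happens under one. */
static struct rt_sketch *
rt_hold_sketch(struct rt_sketch *rt)
{
	assert(rt != NULL && !rt->freed);
	rt->refcnt++;
	return rt;
}

/* Drop a reference; never decrement the counter directly elsewhere. */
static void
rt_free_sketch(struct rt_sketch *rt)
{
	assert(rt != NULL && rt->refcnt > 0);
	if (--rt->refcnt == 0)
		rt->freed = 1;
}

/* A function returning an entry hands back a held reference. */
static struct rt_sketch *
rt_lookup_sketch(struct rt_sketch *table_entry)
{
	return rt_hold_sketch(table_entry);
}
```

The caller of the lookup is responsible for the matching free once it is done with the entry.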
 1.170 25-Nov-2014  christos branches: 1.170.2;
CID 977389: Out of bounds access.
 1.169 06-Jun-2014  rmind branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.
 1.168 30-May-2014  christos Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.167 19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.166 18-May-2014  rmind Use IFNET_FIRST() rather than open coding ifnet access.
 1.165 25-Feb-2014  pooka branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.164 20-Feb-2014  joerg Bail out in case m_pulldown failed.
 1.163 23-Nov-2013  christos convert from CIRCLEQ to TAILQ.
 1.162 05-Jun-2013  christos branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.161 23-Jun-2012  christos branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.160 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.159 31-Dec-2011  christos branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0
 1.158 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.157 31-Aug-2011  plunky branches: 1.157.2; 1.157.6;
NULL does not need a cast
 1.156 12-Sep-2010  drochner avoid NULL dereference in error case
 1.155 18-Oct-2009  christos branches: 1.155.2; 1.155.4;
fix the sun2 case for real.
 1.154 12-Oct-2009  christos unbreak sun2.
 1.153 16-Sep-2009  pooka Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.152 18-Mar-2009  cegger bzero -> memset
 1.151 18-Mar-2009  cegger bcmp -> memcmp
 1.150 03-Oct-2008  adrianp branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'
 1.149 06-Aug-2008  plunky Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.148 07-May-2008  bouyer branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
should fix "panic: mbuf too short for IPv6 header" seen by several people.
 1.147 04-May-2008  thorpej Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577
 1.146 23-Apr-2008  thorpej branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().
 1.145 15-Apr-2008  thorpej branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.
 1.144 08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
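
A sketch of why such a change can stay ABI-compatible: a structure made only of uint64_t counters has the same layout as an array indexed by matching constants. The counter names and indices below are hypothetical and are not the real ip6stat fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Old-style view that an existing consumer might expect. */
struct stat_sketch {
	uint64_t total;
	uint64_t toosmall;
	uint64_t delivered;
};

/* New-style indices into an array of counters. */
enum {
	STAT_TOTAL = 0,
	STAT_TOOSMALL = 1,
	STAT_DELIVERED = 2,
	STAT_NSTATS = 3
};

static uint64_t stats_sketch[STAT_NSTATS];

/* The two layouts must agree for old binaries to keep working. */
_Static_assert(sizeof(struct stat_sketch) == sizeof(stats_sketch),
    "struct and array sizes differ");
_Static_assert(offsetof(struct stat_sketch, delivered) ==
    STAT_DELIVERED * sizeof(uint64_t), "field offset differs from index");

static void
stat_inc_sketch(unsigned int idx)
{
	assert(idx < STAT_NSTATS);
	stats_sketch[idx]++;
}

/* Export the counters in the old structure layout. */
static void
stat_export_sketch(struct stat_sketch *out)
{
	memcpy(out, stats_sketch, sizeof(*out));
}
```

An array also makes it straightforward to keep one copy per CPU and sum them on export.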
 1.143 08-Apr-2008  thorpej Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
 1.142 27-Feb-2008  matt Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.
 1.141 04-Dec-2007  dyoung branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().
 1.140 01-Nov-2007  dyoung branches: 1.140.2; 1.140.4;
De-__P().
 1.139 29-Oct-2007  dyoung The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.
 1.138 24-Oct-2007  dyoung Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.
 1.137 19-Sep-2007  dyoung branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.
 1.136 10-Aug-2007  dyoung branches: 1.136.2;
Constify. bcopy -> memcpy.
 1.135 19-Jul-2007  dyoung branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.134 13-Jun-2007  dyoung branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.
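
The difference between the two tests can be illustrated with a toy model. This is only a sketch: `struct toy_mbuf`, `toy_too_short()` and `toy_unwritable()` are hypothetical names, and it assumes that a region counts as unwritable when it is either too short or backed by shared (read-only) storage. The real macro lives in sys/mbuf.h and inspects the actual mbuf flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf: only the fields this sketch needs. */
struct toy_mbuf {
	size_t m_len;     /* bytes of contiguous data in this buffer */
	bool   m_shared;  /* true if the storage is shared/read-only */
};

/* Old-style test: only asks whether enough contiguous bytes exist. */
bool
toy_too_short(const struct toy_mbuf *m, size_t len)
{
	assert(m != NULL);
	return m->m_len < len;
}

/* New-style test: asks whether `len` contiguous bytes may be modified. */
bool
toy_unwritable(const struct toy_mbuf *m, size_t len)
{
	assert(m != NULL);
	return m->m_len < len || m->m_shared;
}
```

For a buffer that is long enough but shared, the old-style test says nothing needs to be done, while the new-style test still asks for a private, writable copy.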
 1.133 23-May-2007  christos Ansify + add a few comments, from Karl Sjödahl
 1.132 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating, and
freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
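
The idea of a route cache that points at its destination instead of embedding it can be sketched in userland C. This is a rough model under stated assumptions, not the kernel code: all `toy_*` names are hypothetical, malloc() stands in for the per-family pools, and the length is passed explicitly because not every platform has sa_len.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical route cache that points at, rather than embeds, its destination. */
struct toy_route {
	struct sockaddr *ro_sa;    /* NULL until a destination is set */
	socklen_t        ro_salen;
};

/* Analogue of a per-family init routine: fill in an IPv4 sockaddr. */
void
toy_sockaddr_in_init(struct sockaddr_in *sin, const struct in_addr *addr,
    in_port_t port)
{
	memset(sin, 0, sizeof(*sin));
	sin->sin_family = AF_INET;
	sin->sin_addr = *addr;
	sin->sin_port = port;
}

/* Analogue of sockaddr_dup(): copy `len` bytes of `src` to fresh storage. */
struct sockaddr *
toy_sockaddr_dup(const struct sockaddr *src, socklen_t len)
{
	struct sockaddr *dst;

	assert(src != NULL && len > 0);
	dst = malloc(len);
	if (dst != NULL)
		memcpy(dst, src, len);
	return dst;                /* NULL when memory is exhausted */
}

/* Analogue of rtcache_setdst(): 0 on success, ENOMEM on allocation failure. */
int
toy_setdst(struct toy_route *ro, const struct sockaddr *sa, socklen_t len)
{
	struct sockaddr *copy = toy_sockaddr_dup(sa, len);

	if (copy == NULL)
		return ENOMEM;
	free(ro->ro_sa);           /* release any previous destination */
	ro->ro_sa = copy;
	ro->ro_salen = len;
	return 0;
}

/* Analogue of rtcache_getdst(): NULL if no destination was ever stored. */
const struct sockaddr *
toy_getdst(const struct toy_route *ro)
{
	return ro->ro_sa;
}
```

Because the getter can return NULL, every caller in this model has to check the result before dereferencing it, which mirrors the NULL checks described in item 3.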
 1.131 04-Mar-2007  christos branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.130 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.129 10-Feb-2007  degroote branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic
 1.128 29-Jan-2007  dyoung bzero -> memset
 1.127 15-Jan-2007  dyoung Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.
 1.126 15-Jan-2007  degroote Fix an infinite loop (and local DoS) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo
 1.125 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.124 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
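
The chaining and invalidation scheme described above can be modelled with a small linked list. This is only a sketch with hypothetical names (`toy_rcache`, `toy_domain`, `toy_rtcache()`, `toy_rtflush()`, `toy_rtflushall()`); it assumes that every cache with a non-NULL ro_rt has been registered on the domain's list, and it uses a plain counter in place of real route reference counting.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rtentry { int rt_refcnt; };

struct toy_rcache {
	struct toy_rtentry *ro_rt;      /* cached route, or NULL */
	struct toy_rcache  *ro_next;    /* next live cache in the domain */
	struct toy_rcache **ro_prevp;   /* address of the pointer that points at us */
};

struct toy_domain {
	struct toy_rcache *dom_rtcache; /* head of the list of live caches */
};

/* Analogue of rtcache(ro): register a cache that now holds a route. */
void
toy_rtcache(struct toy_domain *dom, struct toy_rcache *ro)
{
	assert(ro->ro_rt != NULL);
	ro->ro_next = dom->dom_rtcache;
	ro->ro_prevp = &dom->dom_rtcache;
	if (ro->ro_next != NULL)
		ro->ro_next->ro_prevp = &ro->ro_next;
	dom->dom_rtcache = ro;
}

/* Analogue of rtflush(ro): drop the cached route and leave the list. */
void
toy_rtflush(struct toy_rcache *ro)
{
	if (ro->ro_rt == NULL)
		return;                 /* nothing cached, so not on any list */
	ro->ro_rt->rt_refcnt--;
	ro->ro_rt = NULL;
	*ro->ro_prevp = ro->ro_next;
	if (ro->ro_next != NULL)
		ro->ro_next->ro_prevp = ro->ro_prevp;
	ro->ro_next = NULL;
	ro->ro_prevp = NULL;
}

/* Analogue of rtflushall(family): invalidate every live cache. */
void
toy_rtflushall(struct toy_domain *dom)
{
	while (dom->dom_rtcache != NULL)
		toy_rtflush(dom->dom_rtcache);
}
```

After toy_rtflushall() returns, every cache that was on the list has ro_rt == NULL, so users of a cache must be prepared to find it empty and look the route up again.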
 1.123 16-Nov-2006  christos branches: 1.123.2;
__unused removal on arguments; approved by core.
 1.122 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.121 05-Sep-2006  dyoung branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.
 1.120 01-Sep-2006  dyoung Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.
 1.119 30-Aug-2006  christos declare the type of code.
 1.118 11-Jul-2006  tron Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.
 1.117 07-Jun-2006  kardel branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.116 15-Apr-2006  christos branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.
 1.115 05-Mar-2006  rpaulo branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.114 03-Mar-2006  rpaulo branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.
 1.113 21-Jan-2006  rpaulo branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.112 11-Dec-2005  christos branches: 1.112.2;
merge ktrace-lwp.
 1.111 19-Oct-2005  bouyer In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.
 1.110 18-Aug-2005  yamt branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.
 1.109 29-May-2005  christos branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.
 1.108 17-Jan-2005  itojun branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.
 1.107 25-May-2004  atatat branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)
 1.106 26-Mar-2004  itojun branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi
 1.105 24-Mar-2004  atatat Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.104 17-Dec-2003  lha Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat
 1.103 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.102 30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.101 04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.100 25-Aug-2003  itojun deref member in in6p directly, don't rely on existence of macro
 1.99 22-Aug-2003  itojun remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.
 1.98 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.97 22-Aug-2003  jonathan Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.
 1.96 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.95 06-Aug-2003  itojun m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.
 1.94 24-Jun-2003  itojun branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame
 1.93 24-Jun-2003  itojun use time.tv_sec directly
 1.92 06-Jun-2003  itojun - sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).
 1.91 03-Jun-2003  itojun remove assumption on redirect header option processing. from kame
 1.90 14-May-2003  itojun always use PULLDOWN_TEST codepath.
 1.89 31-Mar-2003  itojun avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame
 1.88 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.87 23-Sep-2002  simonb Remove breaks after returns, unreachable returns and returns after
returns(!).
 1.86 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.85 30-Jul-2002  itojun no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>
 1.84 10-Jul-2002  itojun correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame
 1.83 30-Jun-2002  thorpej Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
 1.82 09-Jun-2002  itojun whitespace cleanup
 1.81 08-Jun-2002  itojun whitespace cleanup
 1.80 31-May-2002  itojun do not mistakenly lock PMTUD route entry with RTV_MTU.
 1.79 29-May-2002  christos make this compile again.
 1.78 29-May-2002  itojun correct rmx_mtu value after PMTUD entry timeout (should be set to 0)
 1.77 24-May-2002  itojun extra blank line
 1.76 24-May-2002  itojun make a strict check before sending FQDN node information reply. sync w/kame
 1.75 05-Mar-2002  itojun branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.
 1.74 21-Dec-2001  itojun whitespace/cosmetic sync w/kame
 1.73 20-Dec-2001  itojun centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame
 1.72 07-Dec-2001  itojun correct timing to increment icmp6 MIB variables. sync with kame
 1.71 13-Nov-2001  lukem add RCSIDs
 1.70 29-Oct-2001  simonb Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.
 1.69 24-Oct-2001  itojun more whitespace sync with kame
 1.68 18-Oct-2001  itojun branches: 1.68.2;
simplify per-if stats.
 1.67 15-Oct-2001  itojun sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
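
Treating the sysctl value as a bitmap means each query type is enabled by its own bit. A minimal sketch, assuming bit 0 governs name queries (ping6 -w) and bit 1 governs address queries (ping6 -a) as stated above; the `TOY_NI_*` constants and `toy_nodeinfo_allows()` are hypothetical names, not kernel symbols.

```c
#include <assert.h>

#define TOY_NI_FQDN      (1 << 0)  /* answer name queries (ping6 -w) */
#define TOY_NI_NODEADDR  (1 << 1)  /* answer address queries (ping6 -a) */

/* Return nonzero if the bitmap `nodeinfo` enables replies for `query_bit`. */
int
toy_nodeinfo_allows(int nodeinfo, int query_bit)
{
	assert(query_bit == TOY_NI_FQDN || query_bit == TOY_NI_NODEADDR);
	return (nodeinfo & query_bit) != 0;
}
```

Under this model a value of 1 answers only name queries, 2 only address queries, 3 both, and 0 neither.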
 1.66 22-Jun-2001  itojun branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame
 1.65 01-Jun-2001  itojun use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame
 1.64 08-May-2001  itojun correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)
 1.63 04-Apr-2001  itojun make sure rcvif is sane on call to icmp6_reflect
 1.62 30-Mar-2001  itojun enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.
 1.61 21-Mar-2001  itojun set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame
 1.60 08-Mar-2001  itojun remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.
 1.59 01-Mar-2001  itojun branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code
 1.58 11-Feb-2001  itojun pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).
 1.57 11-Feb-2001  itojun recover $NetBSD$ (removed by mistake)
 1.56 10-Feb-2001  itojun to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.55 08-Feb-2001  itojun implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.
 1.54 07-Feb-2001  itojun remove bogus DIAGNOSTIC. sync with kame
 1.53 07-Feb-2001  itojun during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).
 1.52 24-Jan-2001  itojun - record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.51 16-Jan-2001  itojun s/ND6DEBUG/ND6_DEBUG/ to meet other places
 1.50 08-Jan-2001  itojun wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.
 1.49 11-Dec-2000  itojun no need to rtalloc1() twice in pmtud. from kame
 1.48 09-Dec-2000  itojun update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
 1.47 11-Nov-2000  itojun improve spec conformance of node information query (07).
sync with kame.
 1.46 18-Oct-2000  itojun verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync
 1.45 10-Oct-2000  itojun sync with kame ($KAME$)
 1.44 02-Oct-2000  itojun fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.
 1.43 16-Sep-2000  itojun kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.
 1.42 19-Aug-2000  itojun - icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)
 1.41 03-Aug-2000  itojun clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.
 1.40 03-Aug-2000  itojun correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.
 1.39 30-Jul-2000  itojun sync comment with reality
 1.38 28-Jul-2000  itojun nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit
 1.37 09-Jul-2000  itojun add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
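
Event-per-second rate limiting can be sketched as a counter that restarts each second. This is a simplified fixed-window model, not the kernel routine: `struct toy_pps` and `toy_ppscheck()` are hypothetical, and the clock is passed in explicitly so the behaviour is easy to follow.

```c
#include <assert.h>
#include <time.h>

struct toy_pps {
	time_t window;   /* second in which we are currently counting */
	int    count;    /* events allowed so far in that second */
};

/* Return 1 if the event at time `now` fits within `maxpps` per second, else 0. */
int
toy_ppscheck(struct toy_pps *st, time_t now, int maxpps)
{
	assert(st != NULL);
	if (now != st->window) {    /* new one-second window: restart count */
		st->window = now;
		st->count = 0;
	}
	if (st->count < maxpps) {
		st->count++;
		return 1;
	}
	return 0;
}
```

A caller such as an ICMPv6 error path would send the packet only when the check returns 1 and silently drop it otherwise.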
 1.36 07-Jul-2000  itojun sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.
 1.35 06-Jul-2000  itojun - do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).
 1.34 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.33 13-Jun-2000  itojun branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.
 1.32 13-Jun-2000  itojun workaround to suppress warning on char == unsigned char arch.
 1.31 12-Jun-2000  itojun better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.
 1.30 22-May-2000  itojun branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).
 1.29 09-May-2000  itojun do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).
 1.28 13-Apr-2000  itojun do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)
 1.27 22-Mar-2000  itojun use ip6_{last,next}hdr in icmp6 inbound packet parsing.
 1.26 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.25 28-Feb-2000  itojun fix ICMPv6 redirect input. the bug can result in invalid ND entry.
 1.24 28-Feb-2000  itojun support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

Certain bitfields changed between the 04 and 05 drafts, which makes
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.
 1.23 26-Feb-2000  itojun bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
 1.22 17-Feb-2000  darrenr Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.
 1.21 15-Feb-2000  thorpej Fix a couple of brainos in the last.
 1.20 14-Feb-2000  thorpej Use ratecheck() for ICMP6 rate limiting.
 1.19 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.18 16-Jan-2000  itojun add missing ipcomp cases.
 1.17 07-Jan-2000  itohy Rename variable "prep" for PReP port.
 1.16 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.15 05-Jan-2000  itojun avoid panic on getsockopt(ICMPV6_FILTER).
 1.14 02-Jan-2000  itojun add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)
 1.13 15-Dec-1999  itojun do not overwrite traffic class field when we write IPv6 version field.
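
The first byte of an IPv6 header carries the 4-bit version in its high nibble and the top bits of the traffic class in its low nibble, so writing the version must mask rather than assign. A minimal sketch, with hypothetical `TOY_*` constants and `toy_set_version()` standing in for the kernel's definitions:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IPV6_VERSION       0x60  /* version 6 in the high nibble */
#define TOY_IPV6_VERSION_MASK  0xf0

/* Set the 4-bit version in the first header byte, keeping the low 4 bits. */
uint8_t
toy_set_version(uint8_t vfc)
{
	uint8_t out = (uint8_t)((vfc & ~TOY_IPV6_VERSION_MASK) | TOY_IPV6_VERSION);

	assert((out & TOY_IPV6_VERSION_MASK) == TOY_IPV6_VERSION);
	return out;
}
```

Assigning the version constant directly to the byte would instead zero the low nibble, which is the overwrite this revision avoids.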
 1.12 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.11 01-Oct-1999  itojun branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?
 1.10 31-Jul-1999  itojun sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.9 30-Jul-1999  itojun remove reference to in6_systm.h (file itself will be removed afterwards)
 1.8 22-Jul-1999  itojun - implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.
 1.7 22-Jul-1999  itojun change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.
 1.6 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.5 06-Jul-1999  itojun sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups
 1.4 06-Jul-1999  itojun checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file icmp6.c was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file icmp6.c was added on branch chs-ubc2 on 1999-07-01 23:48:26 +0000
 1.11.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.11.2.8 21-Apr-2001  bouyer Sync with HEAD
 1.11.2.7 27-Mar-2001  bouyer Sync with HEAD.
 1.11.2.6 12-Mar-2001  bouyer Sync with HEAD.
 1.11.2.5 11-Feb-2001  bouyer Sync with HEAD.
 1.11.2.4 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.11.2.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.11.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.11.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.30.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.33.2.15 07-Jun-2001  he Pull up revision 1.65 (requested by itojun):
Correct icmp6 hoplimit value.
 1.33.2.14 09-May-2001  he Pull up revision 1.53 (via patch, requested by itojun):
Suppress ND6 logs that are too noisy for normal use. Can be
re-enabled by net.inet6.icmp6.nd6_debug.
 1.33.2.13 09-May-2001  he Pull up revision 1.64 (via patch, requested by itojun):
Correct faith prefix determination.
 1.33.2.12 28-Apr-2001  he Pull up revision 1.55 (partial, via patch, requested by itojun):
Correct source address selection in icmp6_reflect().
Fixes two problems: kernel may fail to send icmp6 messages, and
there would be a way for user programs to cause a panic.
 1.33.2.11 22-Apr-2001  he Apply patch (requested by itojun):
Avoid passing NULL pointer to in6_ifawithscope.
 1.33.2.10 06-Apr-2001  he Pull up revision 1.52 (requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.33.2.9 04-Apr-2001  he Pull up revision 1.63 (requested by itojun):
Make sure rcvif is sane on call to icmp6_reflect().
Fixes panic in certain configurations / instances.
 1.33.2.8 11-Mar-2001  he Pull up revision 1.59 (requested by itojun):
Ensure that we enforce inbound IPsec policy on all IP protocols,
not just TCP, UDP and ICMP.
 1.33.2.7 03-Feb-2001  he Pull up revision 1.51 (requested by itojun):
Correct ND6DEBUG -> ND6_DEBUG.
 1.33.2.6 26-Jan-2001  jhawk Pull up revision 1.50 (requested by itojun):
Only printf() IPv6 ICMP checksum errors under ND6DEBUG.
 1.33.2.5 02-Oct-2000  itojun pullup (approved by releng-1-5)
correct ipsecstat/ipsec6stat mixup.

netinet6/ah_input.c 1.18 -> 1.19
netinet6/ah_output.c 1.11 -> 1.12 (part of)
netinet6/esp_input.c 1.8 -> 1.9 (part of)
netinet6/esp_output.c 1.8 -> 1.9
netinet6/icmp6.c 1.43 -> 1.44
netinet6/ipcomp_input.c 1.13 -> 1.14
netinet6/ipcomp_output.c 1.13 -> 1.14
 1.33.2.4 19-Sep-2000  itojun pullup 1.42 -> 1.43 (approved by releng-1-5)

> kame sys/netinet6/icmp6.c 1.140 -> 1.144
> > in the check for the incoming redirect message, examine the gateway
> > (from the routing table) only when the address family of the gateway is
> > AF_INET6.
 1.33.2.3 16-Aug-2000  itojun pullup (approved by releng-1-5)

switch from net.inet*.*.*ratelimit to net.inet*.*.ppslimit.

(tags are rough estimate - we had some trial and error on the main trunk)
sys/netinet/icmp6.h 1.9 -> 1.11
sys/netinet/icmp_var.h 1.15 -> 1.17
sys/netinet/in_proto.c 1.39 -> 1.42
sys/netinet/ip_icmp.c 1.50 -> 1.51, 1.52 -> 1.54
sys/netinet/tcp_input.c 1.111 -> 1.112, 1.115 -> 1.117
sys/netinet/tcp_usrreq.c 1.52 -> 1.53
sys/netinet/tcp_var.h 1.72 -> 1.75
sys/netinet6/icmp6.c 1.34 -> 1.35, 1.36 -> 1.38
sys/netinet6/in6_proto.c 1.17 -> 1.19
 1.33.2.2 04-Aug-2000  itojun pullup (approved by releng-1-5)
sys/netinet6/icmp6.h 1.11 -> 1.13
sys/netinet6/icmp6.c 1.39 -> 1.41

cvs rdiff -r1.11 -r1.12 syssrc/sys/netinet/icmp6.h
cvs rdiff -r1.39 -r1.40 syssrc/sys/netinet6/icmp6.c

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.

cvs rdiff -r1.12 -r1.13 syssrc/sys/netinet/icmp6.h
cvs rdiff -r1.40 -r1.41 syssrc/sys/netinet6/icmp6.c

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.
 1.33.2.1 20-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)
- add protection mechanism against ND cache corruption due to bad NUD hints.

this is part of:
sys/netinet/icmp6.h 1.9 -> 1.10
sys/netinet/tcp_input.c 1.111 -> 1.112
sys/netinet6/icmp6.c 1.34 -> 1.35
sys/netinet6/nd6.c 1.30 -> 1.31
sys/netinet6/nd6.h 1.14 -> 1.15
 1.59.2.11 18-Oct-2002  nathanw Catch up to -current.
 1.59.2.10 17-Sep-2002  nathanw Catch up to -current.
 1.59.2.9 01-Aug-2002  nathanw Catch up to -current.
 1.59.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.59.2.7 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.59.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.59.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.59.2.4 22-Oct-2001  nathanw Catch up to -current.
 1.59.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.59.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.59.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.66.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.66.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.66.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.66.2.2 16-Mar-2002  jdolecek Catch up with -current.
 1.66.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.68.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.75.8.3 11-Jul-2002  thorpej pullup-1-6 ticket #457 (itojun).

Original log message:
correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame
 1.75.8.2 05-Jun-2002  lukem Pull up revisions 1.78 & 1.80 (via patch) (requested by itojun in #123 & #124):
- correct rmx_mtu value after PMTUD entry timeout (should be set to 0)
- do not mistakenly lock PMTUD route entry with RTV_MTU.
 1.75.8.1 28-May-2002  tv Pull up revision 1.76 (requested by itojun):
make a strict check before sending FQDN node information reply. sync w/kame
 1.75.6.4 29-Aug-2002  gehenna catch up with -current.
 1.75.6.3 15-Jul-2002  gehenna catch up with -current.
 1.75.6.2 20-Jun-2002  gehenna catch up with -current.
 1.75.6.1 30-May-2002  gehenna Catch up with -current.
 1.94.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.94.2.4 17-Jan-2005  skrll Sync with HEAD.
 1.94.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.94.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.94.2.1 03-Aug-2004  skrll Sync with HEAD
 1.106.2.2 28-Oct-2005  riz Pull up following revision(s) (requested by bouyer in ticket #5938):
sys/netinet6/icmp6.c: revision 1.111
In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.
 1.106.2.1 28-May-2004  tron branches: 1.106.2.1.2; 1.106.2.1.4;
Pull up revision 1.107 (requested by atatat in ticket #391):
Sysctl descriptions under net subtree (net.key not done)
 1.106.2.1.4.1 28-Oct-2005  riz Pull up following revision(s) (requested by bouyer in ticket #5938):
sys/netinet6/icmp6.c: revision 1.111
In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.
 1.106.2.1.2.1 28-Oct-2005  riz Pull up following revision(s) (requested by bouyer in ticket #5938):
sys/netinet6/icmp6.c: revision 1.111
In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.
 1.107.4.1 29-Apr-2005  kent sync with -current
 1.108.10.1 03-Oct-2008  jdc Pull up revision 1.150 (requested by adrianp in ticket #1966).

Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'
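
The log does not say which checks revision 1.150 added, so the following is only an illustrative sketch in Python, not the kernel code. It assumes two plausible sanity checks on the MTU carried in a Packet Too Big message: ignore values below the IPv6 minimum link MTU of 1280 bytes, and ignore values that would not lower the currently cached path MTU. The names `IPV6_MIN_MTU` and `accept_ptb_mtu` are hypothetical.

```python
# Illustrative model only; the real checks live in sys/netinet6/icmp6.c.
IPV6_MIN_MTU = 1280  # minimum link MTU that every IPv6 link must support


def accept_ptb_mtu(advertised_mtu, current_path_mtu):
    """Return the new path MTU to cache, or None to ignore the message.

    Assumed policy (not taken from the log entry):
    - an MTU below the IPv6 minimum is treated as bogus and ignored;
    - an MTU that does not shrink the cached value is ignored as well.
    """
    if advertised_mtu < IPV6_MIN_MTU:
        return None
    if advertised_mtu >= current_path_mtu:
        return None
    return advertised_mtu
```

With a cached path MTU of 1500, an advertised value of 1400 would be accepted under these assumptions, while 600 would be ignored.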
 1.108.8.1 03-Oct-2008  jdc Pull up revision 1.150 (requested by adrianp in ticket #1966).

Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'
 1.108.6.1 03-Oct-2008  jdc Pull up revision 1.150 (requested by adrianp in ticket #1966).

Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'
 1.109.2.8 17-Mar-2008  yamt sync with head.
 1.109.2.7 07-Dec-2007  yamt sync with head
 1.109.2.6 15-Nov-2007  yamt sync with head.
 1.109.2.5 27-Oct-2007  yamt sync with head.
 1.109.2.4 03-Sep-2007  yamt sync with head.
 1.109.2.3 26-Feb-2007  yamt sync with head.
 1.109.2.2 30-Dec-2006  yamt sync with head.
 1.109.2.1 21-Jun-2006  yamt sync with head.
 1.110.2.1 26-Oct-2005  yamt sync with head
 1.112.2.1 01-Feb-2006  yamt sync with head.
 1.113.4.2 22-Apr-2006  simonb Sync with head.
 1.113.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.113.2.3 09-Sep-2006  rpaulo sync with head
 1.113.2.2 23-Feb-2006  rpaulo Another round of s/in6pcb/inpcb/.
 1.113.2.1 02-Feb-2006  rpaulo Adapt to in6pcb -> inpcb changes.
 1.114.2.6 14-Sep-2006  yamt sync with head.
 1.114.2.5 03-Sep-2006  yamt sync with head.
 1.114.2.4 11-Aug-2006  yamt sync with head
 1.114.2.3 26-Jun-2006  yamt sync with head.
 1.114.2.2 24-May-2006  yamt sync with head.
 1.114.2.1 13-Mar-2006  yamt sync with head.
 1.115.4.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.115.2.1 19-Apr-2006  elad sync with head.
 1.116.2.1 19-Jun-2006  chap Sync with head.
 1.117.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.121.4.3 18-Dec-2006  yamt sync with head.
 1.121.4.2 10-Dec-2006  yamt sync with head.
 1.121.4.1 22-Oct-2006  yamt sync with head
 1.121.2.3 01-Feb-2007  ad Sync with head.
 1.121.2.2 12-Jan-2007  ad Sync with head.
 1.121.2.1 18-Nov-2006  ad Sync with head.
 1.123.2.3 03-Oct-2008  jdc Pull up revision 1.150 (requested by adrianp in ticket #1209).

Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'
 1.123.2.2 24-May-2007  pavel branches: 1.123.2.2.4;
Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the right default policy, depending on the address family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us choose the right default policy (based on the address family
requested).
While here, fix an error message

Use a dynamic array instead of a static array to decompress. It lets us
decompress any data, whatever the ratio of decompressed data to compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.123.2.1 12-May-2007  pavel branches: 1.123.2.1.2;
Pull up following revision(s) (requested by degroote in ticket #631):
sys/netinet6/icmp6.c: revision 1.126
Fix an infinite loop ( and local dos ) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo
 1.123.2.2.4.1 03-Oct-2008  jdc Pull up revision 1.150 (requested by adrianp in ticket #1209).

Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'
 1.123.2.1.2.1 04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.129.2.3 07-May-2007  yamt sync with head.
 1.129.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.129.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.131.4.1 11-Jul-2007  mjf Sync with head.
 1.131.2.4 09-Oct-2007  ad Sync with head.
 1.131.2.3 20-Aug-2007  ad Sync with HEAD.
 1.131.2.2 15-Jul-2007  ad Sync with head.
 1.131.2.1 08-Jun-2007  ad Sync with head.
 1.134.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.135.6.2 19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.135.6.1 19-Jul-2007  dyoung file icmp6.c was added on branch matt-mips64 on 2007-07-19 20:48:56 +0000
 1.135.4.6 09-Dec-2007  jmcneill Sync with HEAD.
 1.135.4.5 04-Nov-2007  jmcneill Sync with HEAD.
 1.135.4.4 31-Oct-2007  joerg Sync with HEAD.
 1.135.4.3 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.135.4.2 02-Oct-2007  joerg Sync with HEAD.
 1.135.4.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.136.2.3 23-Mar-2008  matt sync with HEAD
 1.136.2.2 09-Jan-2008  matt sync with HEAD
 1.136.2.1 06-Nov-2007  matt sync with HEAD
 1.137.4.1 13-Nov-2007  bouyer Sync with HEAD
 1.140.4.1 08-Dec-2007  ad Sync with head.
 1.140.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.141.12.4 05-Oct-2008  mjf Sync with HEAD.
 1.141.12.3 28-Sep-2008  mjf Sync with HEAD.
 1.141.12.2 02-Jun-2008  mjf Sync with HEAD.
 1.141.12.1 03-Apr-2008  mjf Sync with HEAD.
 1.141.8.2 24-Mar-2008  keiichi sync with head.
 1.141.8.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.145.2.1 18-May-2008  yamt sync with head.
 1.146.2.4 09-Oct-2010  yamt sync with head
 1.146.2.3 11-Mar-2010  yamt sync with head
 1.146.2.2 04-May-2009  yamt sync with head.
 1.146.2.1 16-May-2008  yamt sync with head.
 1.148.6.1 19-Oct-2008  haad Sync with HEAD.
 1.148.2.2 10-Oct-2008  skrll Sync with HEAD.
 1.148.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.150.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.150.2.1 28-Apr-2009  skrll Sync with HEAD.
 1.155.4.1 05-Mar-2011  rmind sync with head
 1.155.2.1 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.157.6.2 05-Apr-2012  mrg sync to latest -current.
 1.157.6.1 18-Feb-2012  mrg merge to -current.
 1.157.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.157.2.2 30-Oct-2012  yamt sync with head
 1.157.2.1 17-Apr-2012  yamt sync with head
 1.159.8.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.159.6.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.159.2.2 15-Nov-2015  bouyer Pull up following revision(s) (requested by ozaki-r in ticket #1327):
sys/netinet6/icmp6.c: revision 1.177
Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout
We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.
This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.
 1.159.2.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.161.2.3 03-Dec-2017  jdolecek update from HEAD
 1.161.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.161.2.1 23-Jun-2013  tls resync from head
 1.162.2.3 18-May-2014  rmind sync with head
 1.162.2.2 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.162.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.165.2.1 10-Aug-2014  tls Rebase.
 1.169.2.1 05-Nov-2015  riz Pull up following revision(s) (requested by ozaki-r in ticket #982):
sys/netinet6/icmp6.c: revision 1.177
Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout
We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.
This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.
 1.170.2.8 28-Aug-2017  skrll Sync with HEAD
 1.170.2.7 05-Feb-2017  skrll Sync with HEAD
 1.170.2.6 05-Dec-2016  skrll Sync with HEAD
 1.170.2.5 05-Oct-2016  skrll Sync with HEAD
 1.170.2.4 09-Jul-2016  skrll Sync with HEAD
 1.170.2.3 29-May-2016  skrll Sync with HEAD
 1.170.2.2 22-Apr-2016  skrll Sync with HEAD
 1.170.2.1 22-Sep-2015  skrll Sync with HEAD
 1.192.2.5 20-Mar-2017  pgoyette Sync with HEAD
 1.192.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.192.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.192.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.192.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.204.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.211.6.8 25-Oct-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #1071):

sys/netinet6/icmp6.c: revision 1.240

Remove a leftover debug printf

Pointed out by hannken@
 1.211.6.7 23-Jun-2018  martin Pull up following revision(s) (requested by maxv in ticket #893):

sys/netinet6/icmp6.c: revision 1.228,1.230

Remove the RH0 code from ICMPv6. RH0 is deprecated by RFC5095 (2007) for
security reasons. We already removed it in Route6.

In addition there was an mbuf bug here: calling IP6_EXTHDR_GET twice with
the same offset, but still using the pointer from the first call, which
could have been made invalid. By luck, m_pulldown leaves zero-sized mbufs
in place, instead of freeing them.

And in general, using a 'finaldst' pointer on the mbuf, and then modifying
that mbuf with IP6_EXTHDR_GET with a smaller offset, was really error-
prone.

Fix 'icmp6len', it shouldn't be ip6_plen, because we may not be at the
beginning of the packet (off+ip6_plen is beyond the end of the mbuf). By
luck, the IP6_EXTHDR_GET that follows will fail and prevent buffer
overflows in non-jumbogram packets.

For jumbograms we will probably be in trouble here; but it doesn't seem
possible to craft reliably a jumbogram for a non-jumbogram-enabled device.

So I don't think it's a huge problem.
 1.211.6.6 08-Jun-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #852):

sys/netinet6/icmp6.c: revision 1.238
sys/netinet/ip_icmp.c: revision 1.171
sys/net/route.c: revision 1.210

Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that prevents
the rtentry from being accidentally released. So rt_timer handlers must
release the reference of a passed rtentry by themselves (but they didn't).
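
The reference-counting problem described in this entry can be modelled with a toy sketch. It is written in Python rather than kernel C, and every name in it (`Route`, `run_timer`, `delete_route`, the two handlers) is hypothetical. The only assumption carried over from the log is that the timer hands the handler an entry with one extra reference, and that the delete path cannot finish while that reference is outstanding.

```python
class Route:
    """Toy stand-in for a reference-counted routing entry."""

    def __init__(self):
        self.refcnt = 1       # reference held by the routing table
        self.in_table = True

    def hold(self):
        self.refcnt += 1

    def release(self):
        assert self.refcnt > 0
        self.refcnt -= 1


def delete_route(rt):
    """Drop the table's reference; True means the free can complete now."""
    rt.in_table = False
    rt.release()
    return rt.refcnt == 0     # the kernel would wait here instead of returning


def run_timer(rt, handler):
    """Call a timer handler with an extra reference, as the log describes."""
    rt.hold()
    return handler(rt)


def buggy_handler(rt):
    # Keeps the extra reference, so the delete can never drain to zero.
    return delete_route(rt)


def fixed_handler(rt):
    # Releases the reference it was handed before deleting the entry.
    rt.release()
    return delete_route(rt)
```

In this model `run_timer(Route(), buggy_handler)` reports that the free cannot complete, which corresponds to the hang, while `fixed_handler` lets it finish. The real fix may order the calls differently.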
 1.211.6.5 09-Apr-2018  bouyer Pull up following revision(s) (requested by roy in ticket #724):
tests/net/icmp/t_ping.c: revision 1.19
sys/netinet6/raw_ip6.c: revision 1.166
sys/netinet6/ip6_input.c: revision 1.195
sys/net/raw_usrreq.c: revision 1.59
sys/sys/socketvar.h: revision 1.151
sys/kern/uipc_socket2.c: revision 1.128
tests/lib/libc/sys/t_recvmmsg.c: revision 1.2
lib/libc/sys/recv.2: revision 1.38
sys/net/rtsock.c: revision 1.239
sys/netinet/udp_usrreq.c: revision 1.246
sys/netinet6/icmp6.c: revision 1.224
tests/net/icmp/t_ping.c: revision 1.20
sys/netipsec/keysock.c: revision 1.63
sys/netinet/raw_ip.c: revision 1.172
sys/kern/uipc_socket.c: revision 1.260
tests/net/icmp/t_ping.c: revision 1.22
sys/kern/uipc_socket.c: revision 1.261
tests/net/icmp/t_ping.c: revision 1.23
sys/netinet/ip_mroute.c: revision 1.155
sbin/route/route.c: revision 1.159
sys/netinet6/ip6_mroute.c: revision 1.123
sys/netatalk/ddp_input.c: revision 1.31
sys/netcan/can.c: revision 1.3
sys/kern/uipc_usrreq.c: revision 1.184
sys/netinet6/udp6_usrreq.c: revision 1.138
tests/net/icmp/t_ping.c: revision 1.18
socket: report receive buffer overflows
Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().
This allows userland to detect route(4) overflows so it can re-sync
with the current state.
socket: clear error even when peeking
The error has already been reported and it's pointless requiring another
recv(2) call just to clear it.
socket: remove now incorrect comment that so_error is only udp
As it can be affected by route(4) sockets which are raw.
rtsock: log dropped messages that we cannot report to userland
Handle ENOBUFS when receiving messages.
Don't send messages if the receiver has died.
Sprinkle more soroverflow().
Handle ENOBUFS in recv
Handle ENOBUFS in sendto
Note value received. Harden another sendto for ENOBUFS.
Handle the routing socket overflowing gracefully.
Allow a valid sendto .... duh
Handle errors better.
Fix test for checking we sent all the data we asked to.
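
From the userland side, the overflow reporting described in this entry means a reader of a routing socket can see ENOBUFS from a receive call and treat it as a cue to re-sync. A minimal Python sketch of that pattern follows. The function name `read_messages` and the `resync` callback are hypothetical, and the socket is any object with a `recv()` method, so nothing here depends on a particular socket family.

```python
import errno


def read_messages(sock, resync, bufsize=2048):
    """Collect messages from sock until it returns an empty read.

    When recv() fails with ENOBUFS (the receive buffer overflowed and
    some messages were dropped), call resync() and keep reading.
    Any other error is passed on to the caller.
    """
    messages = []
    while True:
        try:
            data = sock.recv(bufsize)
        except OSError as exc:
            if exc.errno == errno.ENOBUFS:
                resync()      # state may be stale; let the caller reload it
                continue
            raise
        if not data:
            break
        messages.append(data)
    return messages
```

A caller would pass a `resync` function that re-reads whatever state it mirrors, for example by dumping the routing table again.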
 1.211.6.4 31-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #665):

sys/netinet6/icmp6.c: revision 1.215

Style, and four fixes:

* Remove the (disabled) IPPROTO_ESP check. If the packet was decrypted it
will have M_DECRYPTED, and this is already checked.
* Memory leaks in icmp6_error2. They seem hardly triggerable.
* Fix miscomputation in _icmp6_input, the ICMP6 header is not guaranteed
to be located right after the IP6 header. ok mlelstv@
* Memory leak in _icmp6_input. This one seems to be impossible to trigger.
 1.211.6.3 08-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #350):
sys/netinet6/icmp6.c: revision 1.214
sys/netinet6/raw_ip6.c: revision 1.158
Fix usages of ipsec_used
If IPsec isn't used, we must go back to the normal path.
PR kern/52659
 1.211.6.2 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panics in some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that the process issuing
SADB_UPDATE be the same process that issued SADB_ADD (or SADB_GETSPI).
This means that the update command must be used together with the add
command in a single setkey configuration. This usage is normally
meaningless but useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require the newly added update command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver has implemented such offload features for many
years, and none seems likely to implement them soon. (Note that neither
FreeBSD nor Linux has such drivers.) Let's remove the related
(unused) code and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
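
The copy-and-replace approach described above can be sketched in userland C
as follows. This is only an illustration under stated assumptions: struct sa,
its fields and sa_update_by_replace are hypothetical names, not the code in
key.c, and the sketch ignores the locking and reference counting the kernel
needs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified SA: a few fields that an update may touch. */
struct sa {
	unsigned int spi;
	unsigned int state;
	unsigned long hard_lifetime;
};

/*
 * Update an SA by replacement: allocate a new object, copy the variables
 * of the old one, apply the change to the copy, then swap the slot pointer.
 * The old object is returned so the caller can release it once nothing
 * uses it anymore.  Returns NULL and leaves *slot untouched if the
 * allocation fails.
 */
static struct sa *
sa_update_by_replace(struct sa **slot, unsigned long new_hard_lifetime)
{
	struct sa *oldsa, *newsa;

	assert(slot != NULL && *slot != NULL);
	oldsa = *slot;
	newsa = malloc(sizeof(*newsa));
	if (newsa == NULL)
		return NULL;
	memcpy(newsa, oldsa, sizeof(*newsa));	/* copy the old variables */
	newsa->hard_lifetime = new_hard_lifetime;
	*slot = newsa;				/* replace old with new */
	return oldsa;
}
```

The old object is never modified, so anything still looking at it sees a
consistent state until it is released.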
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list should basically never have sav->refcnt == 0. Also,
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be obtained from
another argument (isr)
---
Fix that splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav list is sorted
as we need.

The newly added key_validate_savlist validates that each sav list is
indeed sorted (it runs only if DEBUG because it's not cheap).
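
A sortedness check of this kind can be sketched in userland C as below.
The names struct sa_entry, sa_list_is_sorted and sa_list_validate are
hypothetical, and ascending order by add time is an assumption made for the
sketch; the commit message only says the lists are sorted by
lft_c->sadb_lifetime_addtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified SA entry: only the add time and a link. */
struct sa_entry {
	uint64_t addtime;		/* time the SA was added */
	struct sa_entry *next;
};

/*
 * Return true if the list is sorted by addtime.  Ascending order is an
 * assumption of this sketch.  The walk is O(n), which is why a check like
 * this is better kept out of the packet-processing path.
 */
static bool
sa_list_is_sorted(const struct sa_entry *head)
{
	for (const struct sa_entry *e = head;
	    e != NULL && e->next != NULL; e = e->next) {
		if (e->addtime > e->next->addtime)
			return false;
	}
	return true;
}

/* Debug-style wrapper: abort if the invariant does not hold. */
static void
sa_list_validate(const struct sa_entry *head)
{
	assert(sa_list_is_sorted(head));
}
```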
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as temporary storage
for each packet processing is racy. Also, having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
a new one. Let's have a sav locally for each packet processing instead of
using the shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

The M_AUTHIPDGM flag is set on an mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication, and M_AUTHIPDGM is never set on the mbuf. So
checking M_AUTHIPDGM of an mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize this to avoid looking up sav here for users that
don't need to know the exact size of headers (e.g., TCP segment size
calculation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
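
The point can be illustrated with a small userland C sketch. The struct
lifetime_comb and the helper set_soft_from_hard are hypothetical names used
only for illustration; the 80% ratio is the only detail taken from the
message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the lifetime fields of a proposal (comb). */
struct lifetime_comb {
	uint64_t hard_addtime;	/* hard limit, in seconds */
	uint64_t soft_addtime;	/* soft limit, in seconds */
};

/*
 * Derive the soft limit from the hard limit.  Scaling the soft field by
 * itself (soft = soft * 80 / 100) does nothing on a zero-cleared struct,
 * because 0 * 80 / 100 is still 0.
 */
static void
set_soft_from_hard(struct lifetime_comb *comb, uint64_t hard)
{
	assert(comb != NULL);
	comb->hard_addtime = hard;
	comb->soft_addtime = hard * 80 / 100;
}
```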
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms while holding softnet_lock
causes a deadlock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

With this change KEY_NEWSP is no longer called from softint
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to call localcount_drain on the SP while
holding key_so_mtx in, say, key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf,
which is stuck on key_so_mtx, never releases its reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

Hundreds of mbufs can easily be queued on the list under heavy
network load.
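
A cap on a deferred queue can be sketched in userland C as follows. All names
(struct deferred_queue, deferred_enqueue, DEFERRED_MAX) and the cap value are
hypothetical, and dropping the message once the cap is reached is an
assumption of this sketch; the commit message does not say what happens to
excess mbufs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFERRED_MAX 4	/* hypothetical cap, chosen only for the example */

/* Hypothetical queue of messages waiting for deferred delivery. */
struct deferred_queue {
	const void *items[DEFERRED_MAX];
	size_t count;		/* messages currently queued */
	size_t dropped;		/* messages rejected because of the cap */
};

/*
 * Queue a message for later delivery.  Once the cap is reached the message
 * is not queued and the drop counter is bumped, so the queue cannot grow
 * without bound while the consumer is behind.
 */
static bool
deferred_enqueue(struct deferred_queue *q, const void *msg)
{
	assert(q != NULL);
	if (q->count >= DEFERRED_MAX) {
		q->dropped++;
		return false;
	}
	q->items[q->count++] = msg;
	return true;
}
```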
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It never changes so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), which of course is useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical usage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock, which happens for example in the following
situation:

- Thread A calls localcount_drain, which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform while holding
the mutex; pserialize_perform also calls xc_broadcast
- Thread C (xc_thread), which calls an xcall callback of localcount_drain, tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never completes until xc_broadcast of thread A finishes. On the other hand,
thread C, which is a callee of xc_broadcast of thread A, is stuck on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence with another
mutex, but adding another mutex makes the code complex, so fix the deadlock
another way: release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock happens only if NET_MPSAFE is enabled.
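
The shape of that fix can be sketched with POSIX threads. This is a userland
analogy under stated assumptions, not the kernel change itself:
wait_for_readers is a hypothetical stand-in for pserialize_perform, the flag
sync_in_progress and the function remove_and_sync are invented for the
example, and the drain step is only marked by a comment.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool sync_in_progress;	/* true while one thread runs the barrier */
static int barrier_calls;	/* counts barrier runs, for the example only */

/* Hypothetical stand-in for pserialize_perform. */
static void
wait_for_readers(void)
{
	barrier_calls++;
}

static void
remove_and_sync(void)
{
	pthread_mutex_lock(&mtx);
	/* ... remove the item from the list here ... */

	/* Serialize the barrier with a flag and a condvar, not the mutex. */
	while (sync_in_progress)
		pthread_cond_wait(&cv, &mtx);
	assert(!sync_in_progress);
	sync_in_progress = true;
	pthread_mutex_unlock(&mtx);	/* never hold the mutex across it */

	wait_for_readers();

	pthread_mutex_lock(&mtx);
	sync_in_progress = false;
	pthread_cond_broadcast(&cv);
	/* ... drain references to the item here ... */
	pthread_mutex_unlock(&mtx);
}
```

Because the mutex is dropped before the barrier step, a callback that needs
the mutex can make progress while another thread waits in the barrier, and
the flag still keeps two barrier calls from overlapping.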
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected, for example calling key_acquire.
---
Fix that SP is broken in transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.211.6.1 07-Jul-2017  martin Pull up following revision(s) (requested by knakahara in ticket #106):
sys/netinet6/icmp6.c: revision 1.212
fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.
XXX need to pullup to -8.
 1.223.2.8 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.223.2.7 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.223.2.6 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.223.2.5 25-Jun-2018  pgoyette Sync with HEAD
 1.223.2.4 21-May-2018  pgoyette Sync with HEAD
 1.223.2.3 02-May-2018  pgoyette Synch with HEAD
 1.223.2.2 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.223.2.1 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.238.2.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.238.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.238.2.1 10-Jun-2019  christos Sync with HEAD
 1.242.4.1 10-Mar-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1809):

sys/netinet6/raw_ip6.c: revision 1.184 (patch)
sys/netinet6/icmp6.c: revision 1.256 (patch)

Deliver timestamps also to raw sockets.
Fixes PR 57955
 1.247.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.254.2.2 10-Mar-2024  martin Pull up following revision(s) (requested by riastradh in ticket #615):

sys/netinet6/raw_ip6.c: revision 1.184
sys/netinet6/icmp6.c: revision 1.256

Deliver timestamps also to raw sockets.
Fixes PR 57955
 1.254.2.1 10-Dec-2023  martin Pull up following revision(s) (requested by pgoyette in ticket #487):

sys/compat/common/compat_90_mod.c: revision 1.5
sys/compat/common/compat_90_mod.c: revision 1.6
sys/netinet6/in6.c: revision 1.290
sys/netinet6/in6.c: revision 1.291
sys/compat/common/files.common: revision 1.11
sys/netinet6/icmp6.c: revision 1.255
sys/compat/common/net_inet6_nd_90.c: revision 1.1
sys/compat/common/net_inet6_nd_90.c: revision 1.2
sys/modules/compat_90/Makefile: revision 1.2
sys/modules/compat_90/Makefile: revision 1.3
sys/netinet6/nd6.c: revision 1.281
sys/compat/common/compat_mod.h: revision 1.10
sys/kern/compat_stub.c: revision 1.23
sys/sys/compat_stub.h: revision 1.27

Identify the need to rework the COMPAT_* code to be more
module-aware.
This is an XXX comment block only, NFCI.

Modularize the COMPAT_90 code that resulted from the removal of
netinet6/nd6 from the kernel. Now, the minimal compat code can
be successfully loaded and unloaded along with the rest of the
COMPAT_90 code.

Allow kernel builds which don't define INET6 to compile compat bits
too.

Default the build of compat_90 module to include IPv6, as is done
for other INET6-sensitive modules (see if_lagg).
 1.257.2.1 02-Aug-2025  perseant Sync with HEAD
 1.14 04-Jun-2000  itojun remove include files in nonstandard path
(has been #error for couple of months).
 1.13 09-Feb-2000  itojun branches: 1.13.2;
to improve RFC2553/2292 compliance, and promote use of
RFC2553/2292-compliant header file path, now the following headers are
forbidden:
netinet6/ip6.h
netinet6/icmp6.h
netinet6/in6.h

if you want netinet6/{ip6,icmp6}.h, use netinet/{ip6,icmp6}.h.

if you want netinet6/in6.h, you just need to include netinet/in.h.
it pulls it in.
(we may need to integrate them into netinet/in.h, but for cross-BSD code
sharing i'd like to keep it like this for now)
 1.12 07-Feb-2000  itojun close comment.
From: Kazuto Ushioda <x-y-z@3si.co.jp>
 1.11 06-Feb-2000  itojun to be more rfc2292 compliant, move ip6.h and icmp6.h into netinet.
(netinet6/{ip6,icmp6}.h is non-standard path - these files should go away)

it was not possible to use cvsmove in this case.
when you try to look at history, chase it toward netinet6/{ip6,icmp6}.h.
 1.10 19-Jan-2000  itojun another possible PR9189 issue (panic on sparc).
 1.9 18-Jan-2000  itojun temporary workaround for PR9189 (panic on sparc).
 1.8 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.7 02-Jan-2000  itojun add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)
 1.6 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.5 19-Nov-1999  bouyer Update protocol and interface stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
 1.4 31-Jul-1999  itojun branches: 1.4.2; 1.4.8;
sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file icmp6.h was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file icmp6.h was added on branch chs-ubc2 on 1999-07-01 23:48:27 +0000
 1.4.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.4.2.1 20-Nov-2000  bouyer Remove files that are no longer on the trunk, and commit Makefile which
I forgot in the batch of commits.
 1.13.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.3 28-Apr-2008  martin branches: 1.3.4;
Remove clause 3 and 4 from TNF licenses
 1.2 23-Apr-2008  thorpej branches: 1.2.2;
Use <net/net_stats.h> / netstat_sysctl().
 1.1 15-Apr-2008  thorpej branches: 1.1.2;
Make ip6 and icmp6 stats per-cpu.
 1.1.2.1 18-May-2008  yamt sync with head.
 1.2.2.1 16-May-2008  yamt sync with head.
 1.3.4.2 02-Jun-2008  mjf Sync with HEAD.
 1.3.4.1 28-Apr-2008  mjf file icmp6_private.h was added on branch mjf-devfs2 on 2008-06-02 13:24:26 +0000
 1.293 05-Jun-2025  ozaki-r in6: introduce in6ifa_first_lladdr() (and psref variant)

It returns a first link-local address (ifa) on a given interface.
 1.292 01-Mar-2024  riastradh branches: 1.292.2;
netinet6: Avoid NPD on `ifconfig ifN inet6 ... pltime 0 vltime 0'.

PR kern/53922
 1.291 09-Dec-2023  pgoyette Modularize the COMPAT_90 code that resulted from the removal of
netinet6/nd6 from the kernel. Now, the minimal compat code can
be successfully loaded and unloaded along with the rest of the
COMPAT_90 code.

XXX pullup-10 - hopefully before RC2
 1.290 07-Dec-2023  pgoyette Identify the need to rework the COMPAT_* code to be more
module-aware.

This is an XXX comment block only, NFCI.
 1.289 03-Aug-2023  ozaki-r in6: clear ND6_IFF_IFDISABLED to allow DAD again on link-up
 1.288 24-Oct-2022  msaitoh branches: 1.288.2;
Clear saved_flags to avoid compile error on some archs.
 1.287 24-Oct-2022  knakahara Fix PR kern/57037

Make it possible to change whether routing messages are sent on parameter
changes.
When net.inet6.ip6.param_rt_msg=0 is set, don't send parameter changing
routing messages.
When net.inet6.ip6.param_rt_msg=1 (default) is set, send parameter changing
routing messages by RTM_NEWADDR.
 1.286 20-Sep-2022  knakahara Remove routes on an address removal if the routes reference the address. Implemented by ozaki-r@n.o.

A route whose gateway is on a connected route can become invalid if the
connected route is deleted, i.e., an associated address is removed.
Traditionally NetBSD doesn't sweep such a route on the address removal. Sending
packets over the route fails with "No route to host". Also the route holds an
orphan ifaddr as rt_ifa that is destroyed, say, by in_purgeaddr.

If the same address is assigned again in such a state, there can be two
different ifaddr objects with the same address. Until recently this was not a
big problem because we could send packets anyway. However after MP-ification
of the network stack, we can't send packets because we strictly check if rt_ifa
(i.e., the (old) ifaddr) is valid.

This change automatically removes such routes on a removal of an associated
address to avoid keeping inconsistent routes.
 1.285 05-Dec-2021  msaitoh s/existance/existence/ in comment.
 1.284 05-Dec-2021  msaitoh s/multple/multiple/ in comment.
 1.283 21-Sep-2021  christos don't opencode kauth_cred_get()
 1.282 29-Sep-2020  roy inet: Treat LINK_STATE_UNKNOWN as LINK_STATE_UP when changing

It's something we have always done.
it's really rare for anything to transition to UNKNOWN from either
UP or DOWN, but technically it is possible.
 1.281 16-Jun-2020  maxv remove unused
 1.280 14-Jun-2020  roy inet6: Allow addresses to be marked AUTOCONF from userland
 1.279 13-Jun-2020  mlelstv COMPAT_90 doesn't necessarily imply COMPAT_50. So include compat in6_var.h in
either case.

Fixes evbarm build that starts with COMPAT_60.
 1.278 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.277 20-Jan-2020  thorpej Remove FDDI support.
 1.276 25-Sep-2019  ozaki-r branches: 1.276.2;
Make panic messages more informative
 1.275 29-Apr-2019  roy branches: 1.275.2;
rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.
 1.274 18-Mar-2019  msaitoh s/pakcet/packet/ in comment.
 1.273 05-Feb-2019  mrg adjust fallthru comments to appease gcc7.
 1.272 29-Nov-2018  ozaki-r Don't run DAD on link-up if it's explicitly disabled
 1.271 29-Nov-2018  ozaki-r Introduce and use ip_dad_enabled() and ip6_dad_enabled() functions
 1.270 30-Oct-2018  ozaki-r Use rt_update framework on updating a rtentry
 1.269 04-Jul-2018  kamil Paper over Undefined Behavior in in6_control1()

Replace the calculation of maxexpire (TIME_MAX), which used a construct that
triggers UB, with one that uses implementation-defined semantics.

No functional change intended.

An attempt to appease KUBSAn.

Detected with Kernel Undefined Behavior Sanitizer.

Reported by <Harry Pantazis>
 1.268 29-May-2018  prlw1 branches: 1.268.2;
Mark in6m as used for non-DIAGNOSTIC builds.
 1.267 29-May-2018  ozaki-r Avoid NULL pointer dereference on imm->i6mm_maddr
 1.266 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.265 06-Apr-2018  ozaki-r Make GARP work again when DAD is disabled

The change avoids setting an IP address tentative on initializing it when the
IPv4 DAD is disabled (net.inet.ip.dad_count=0), which allows a GARP packet to be
sent (see arpannounce). This is the same behavior of NetBSD 7, i.e., before
introducing the IPv4 DAD.

Additionally do the same change to IPv6 DAD for consistency.

The change is suggested by roy@
 1.264 06-Mar-2018  ozaki-r Use pool(9) for llentry allocations

llentry is easily leaked, and pool suits it because pool can be used to
detect leaks.

Also sweep unnecessary wrappers for llentry, in_llentry and in6_llentry.
 1.263 06-Mar-2018  ozaki-r Fix memory leaks on arp -d and ndp -d for static entries

We have to delete entries on in_lltable_delete and in6_lltable_delete
unconditionally. Note that we don't need to worry about LLE_IFADDR because
there is no such entries now.
 1.262 06-Mar-2018  ozaki-r Fix reference leaks of llentry

callout_reset and callout_halt can cancel a pending callout without telling us.
Detect a cancel and remove a reference by using callout_pending and
callout_stop (it's a bit tricky though, we can detect it).

While here, we can remove remaining abuses of mutex_owned for softnet_lock.
 1.261 06-Mar-2018  ozaki-r Add assertions

We must not destroy llentries holding mbufs.
 1.260 24-Feb-2018  ozaki-r branches: 1.260.2;
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because there is a violation of the rule of lock ordering;
softnet_lock is held while holding IFNET_LOCK, which violates the rule.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to protect the network stack surely.
This is required, for example, for PR kern/51356.

Fix PR kern/53043
 1.259 19-Jan-2018  ozaki-r Suppress noisy debugging outputs

Even if DEBUG they are too noisy under load.
 1.258 15-Jan-2018  ozaki-r Remove extra pserialize_perform from in_purgeaddr

It's already performed in ifa_remove. Note so there (in in6_unlink_ifa too).
 1.257 10-Jan-2018  knakahara add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.256 25-Dec-2017  ozaki-r Fix wrong usage of psref_held

We can't use it for checking if a caller does NOT hold a given target.
If you want to do it you should have psref_not_held or something.
 1.255 15-Dec-2017  ozaki-r Ensure to call if_mcast_op with holding IFNET_LOCK

Note that CARP doesn't deal with IFNET_LOCK yet.
 1.254 23-Nov-2017  ozaki-r Fix a race condition of in6_ifinit

in6_ifinit checks the number of IPv6 addresses on a given interface and
if it's zero (i.e., an IPv6 address being assigned to the interface
is the first one), call if_addr_init. However, the actual assignment of
the address (ifa_insert) is out of in6_ifinit. The check and the
assignment must be done atomically.

Fix it by holding in6_ifaddr_lock during in6_ifinit and ifa_insert.
And also add missing pserialize to IFADDR_READER_FOREACH.
 1.253 23-Nov-2017  ozaki-r Tweak a condition; we don't need to care ifacount to be negative
 1.252 23-Nov-2017  ozaki-r Remove unnecessary goto because there is no cleanup code to share (NFC)
 1.251 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.250 10-Nov-2017  ozaki-r Fix a deadlock between a route update and lltable

It happens because rtalloc1 is called from lltable with holding
IF_AFDATA_WLOCK.

If a route update is in action, rtalloc1 would wait for its completion with
holding IF_AFDATA_WLOCK. At the same moment, a softint (e.g., arpintr) may try
to take IF_AFDATA_WLOCK and get stuck on it. Unfortunately the stuck softint
prevents the route update from progressing because the route update calls
psref_target_destroy that needs the softint to complete.

A resource allocation graph of the scenario looks like this:
route update =(psref_target_destroy)=> softint => IF_AFDATA_WLOCK
=(rt_update_wait)=> route update

Fix the deadlock by pulling rtalloc1 out of the lltable codes inside
IF_AFDATA_WLOCK.

Note that the deadlock happens only if NET_MPSAFE is enabled.
 1.249 10-Nov-2017  ozaki-r Remove redundant KASSERTMSG

The function is static, has just one caller and the caller does the same check.
 1.248 22-Jun-2017  ozaki-r Purge ARP/NDP entries on an interface when the interface is down

Fix PR kern/51179
 1.247 22-Jun-2017  ozaki-r Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry

This behavior is a bit odd and should be fixed in the future...
 1.246 21-Jun-2017  ozaki-r Don't create a permanent L2 cache entry on adding an address to an interface

It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
 1.245 28-Apr-2017  ozaki-r branches: 1.245.2;
Don't output debugging logs just if DIAGNOSTIC

Also make log messages informative.
 1.244 02-Mar-2017  ozaki-r branches: 1.244.4;
Plug a race condition on accessing i6mm_maddr
 1.243 02-Mar-2017  ozaki-r Fix racy in6m_sol

Relook up the entry instead of reusing it, which makes locking simple.
 1.242 02-Mar-2017  ozaki-r Protect ia6_memberships by in6_ifaddr_lock
 1.241 01-Mar-2017  ozaki-r Restore/add some softnet_lock for nd6_rt_flush and defrouter_addreq

May help PR kern/52015
 1.240 28-Feb-2017  ozaki-r Separate the code of joining multicast groups

No functional change.
 1.239 28-Feb-2017  ozaki-r Prevent ia6 from being freed in in6_ifinit

It fixes a panic (diagnostic assertion "entry->ple_prevp != NULL" failed)
on:
ifconfig lo1 create
ifconfig lo1 127.0.0.2
reported by ryo@
 1.238 23-Feb-2017  ozaki-r Remove mkludge stuffs

For unknown reasons, IPv6 multicast addresses are linked to a first
IPv6 address assigned to an interface. Due to the design, when removing
a first address having multicast addresses, we need to save them to
somewhere and later restore them once a new IPv6 address is activated.
mkludge stuffs support the operations.

This change links multicast addresses to an interface directly and
throws the kludge away.

Note that as usual some obsolete member variables remain for kvm(3)
users. And also sysctl net.inet6.multicast_kludge remains to avoid
breaking old ifmcstat.

TODO: currently ifnet has a list of in6_multi but obviously the list
should be protocol independent. Provide a common structure (if_multi
or something) to handle in6_multi and in_multi together as well as
ifaddr does for in_ifaddr and in6_ifaddr.
 1.237 23-Jan-2017  ozaki-r Replace some splnet with splsoftnet
 1.236 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.235 16-Jan-2017  ozaki-r Remove KASSERT (revert in6.c,v 1.232)

We don't need it (it's harmless though).
 1.234 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.233 12-Jan-2017  ozaki-r branches: 1.233.2;
Prevent in6_ifaddr from being freed with holding its psref

This is a possible fix for PR kern/51828.
 1.232 11-Jan-2017  christos Add KASSERT.
 1.231 10-Jan-2017  ozaki-r Enable some sysctl knobs on rump kernels for ifmcstat
 1.230 04-Jan-2017  christos - kill NULL argument from in6_update_ifa
- amend in6_update_ifa1 to return the ia, so that we can use it in pfil hooks
to avoid NULL pointer crash.
 1.229 03-Jan-2017  christos simplify, and call the hooks after the address has been deleted like we did
for the ipv4 case.
 1.228 31-Dec-2016  ryo In the case of SIOCDIFADDR, call pfil_run_addrhooks before release ia.
 1.227 27-Dec-2016  ozaki-r Fix panic in pfil_run_hooks on bootup

XXX a kernel with pf still fails to boot up. Please someone fix it.
 1.226 21-Dec-2016  ozaki-r Fix deadlock between llentry timers and destruction of llentry

llentry timer (of nd6) holds both llentry's lock and softnet_lock.
A caller also holds them and calls callout_halt to wait for the
timer to quit. However we can pass only one lock to callout_halt,
so passing either of them can cause a deadlock. Fix it by avoid
calling callout_halt without holding llentry's lock.

BTW in the first place we cannot pass llentry's lock to callout_halt
because it's a rwlock...
 1.225 19-Dec-2016  ozaki-r Protect IPv6 default router and prefix lists with coarse-grained rwlock

in6_purgeaddr (in6_unlink_ifa) itself unreferences a prefix entry and calls
nd6_prelist_remove if the counter becomes 0, so callers don't need to
handle the reference counting.

Performance-sensitive paths (sending/forwarding packets) call just one
reader lock. This is a trade-off between performance impact vs. the amount
of efforts; if we want to remove the reader lock, we need huge amount of
works including destroying objects with psz/psref in softint, for example.
 1.224 12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.223 11-Dec-2016  ozaki-r Add nd6_ prefix to exported functions
 1.222 18-Nov-2016  knakahara fix: "ifconfig destroy" can stall when "ifconfig" is run in parallel.
This problem occurs only if NET_MPSAFE on.

ifconfig destroy side:
kernel entry point is ifioctl => if_clone_destroy.
pr_purgeif() acquires softnet_lock, and then ifa_remove() calls
pserialize_perform() holding softnet_lock.
ifconfig side:
kernel entry point is socreate.
pr_attach()(udp_attach_wrapper()) calls sosetlock(). In this call path,
sosetlock() tries to acquire softnet_lock.
These can cause a deadlock.
 1.221 18-Oct-2016  ozaki-r Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@
 1.220 13-Sep-2016  christos revert previous, roy says it breaks DaD.
 1.219 13-Sep-2016  christos When initializing addresses, reset the interface flags to 0. This fixes
an issue where point to point addresses that started down, and then came
up, were left with stale flags on one side of the point to point link.
 1.218 02-Sep-2016  roy This comment no longer applies.
 1.217 18-Aug-2016  roy Revert part of the prior patch so loopback lladdr gets a working prefix route.
 1.216 16-Aug-2016  roy Separate ioctl address prefix management from RA prefix management
as we have no API for controlling the latter.

This fixes a long standing problem where addresses added with non /128
prefixes and non-infinite address lifetimes would register a prefix route
which would expire. Subsequent calls to set new lifetimes for the same address
would not affect the prefix route management, so once expired, the
prefix route would be impossible to add back as the kernel would remove it.
 1.215 05-Aug-2016  ozaki-r CID 1364757: remove unnecessary branching
 1.214 01-Aug-2016  ozaki-r Fix kernel builds (gcc 4.8)
 1.213 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.212 28-Jul-2016  ozaki-r Fix panic on adding/deleting IP addresses under network load

Adding and deleting IP addresses aren't serialized with other network
operations, e.g., forwarding packets. So if we add or delete an IP
address under network load, a kernel panic may happen on manipulating
network-related shared objects such as rtentry and rtcache.

To avoid such panics, we still need to hold softnet_lock in in_control
and in6_control that are called via ioctl and do network-related operations
including IP address additions/deletions.

Fix PR kern/51356
 1.211 20-Jul-2016  ozaki-r Get rid of extra ifafree

It was wrongly imported from FreeBSD.
 1.210 20-Jul-2016  ozaki-r Apply pserialize to some iterations of IP address lists
 1.209 15-Jul-2016  ozaki-r Use sin6tosa and sin6tocsa macros

No functional change.
 1.208 08-Jul-2016  ozaki-r branches: 1.208.2;
CID 1363345: remove unreachable code and cleanup returns
 1.207 07-Jul-2016  ozaki-r Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.206 06-Jul-2016  ozaki-r Move in6_ifaddr_list to a more proper place (from ip6_input.c to in6.c)

It's a similar place as the IPv4 address list, i.e., in.c.

More variables will join together.
 1.205 06-Jul-2016  ozaki-r Add missing IN6_ADDRLIST_ENTRY_DESTROY
 1.204 04-Jul-2016  ozaki-r Use pslist(9) for the global in6_ifaddr list

psz and psref will be applied in another commit.

No functional change intended.
 1.203 04-Jul-2016  ozaki-r Remove redundant codes purging IPv6 addresses

Proposed on tech-net and tech-kern.
 1.202 30-Jun-2016  ozaki-r Make sure that ifaddr is published after its initialization finished

Basically we should insert an item to a collection (say a list) after
item's initialization has been completed to avoid accessing an item
that is initialized halfway. ifaddr (in{,6}_ifaddr) isn't processed
like so and needs to be fixed.

In order to do so, we need to tweak {arp,nd6}_rtrequest that depend
on that an ifaddr is inserted during its initialization; they explore
interface's address list to determine that rt_getkey(rt) of a given
rtentry is in the list to know whether the route's interface should
be a loopback, which doesn't work after the change. To make it work,
first check RTF_LOCAL flag that is set in rt_ifa_addlocal that calls
{arp,nd6}_rtrequest eventually. Note that we still need the original
code for the case to remove and re-add a local interface route.
 1.201 28-Jun-2016  ozaki-r Introduce if_is_deactivated

Checking ifp->if_output == if_nulloutput is too implicit.

No functional change.
 1.200 22-Jun-2016  ozaki-r Remove unnecessary NULL checks of ifa->ifa_addr

If it's NULL, it should be a bug. There are many IFADDR_FOREACH that don't do
a NULL check. If it could be NULL, they would have fired already.
 1.199 12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.198 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.197 01-Apr-2016  ozaki-r Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.
 1.196 01-Apr-2016  ozaki-r Use __func__ in log messages
 1.195 15-Feb-2016  rtr Reduce code duplication.

Split creation of IPv4-Mapped IPv6 addresses into its own function
and use it.

No functional change intended. As posted to tech-net@
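
As an illustration of what building an IPv4-mapped IPv6 address involves, here is a minimal userland sketch. It is not the kernel function introduced by this commit; the name v4_to_v4mapped is hypothetical. The layout used (80 zero bits, 16 one bits, then the 32-bit IPv4 address) is the standard ::ffff:a.b.c.d form.

```c
#include <string.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: fill *in6 with the IPv4-mapped form of *in4.
 * Both addresses are kept in network byte order.
 */
void
v4_to_v4mapped(const struct in_addr *in4, struct in6_addr *in6)
{
	/* Bytes 0..9 are zero. */
	memset(in6, 0, sizeof(*in6));

	/* Bytes 10..11 are 0xffff, marking the address as IPv4-mapped. */
	in6->s6_addr[10] = 0xff;
	in6->s6_addr[11] = 0xff;

	/* Bytes 12..15 carry the IPv4 address unchanged. */
	memcpy(&in6->s6_addr[12], &in4->s_addr, sizeof(in4->s_addr));
}
```

Keeping the conversion in one helper means every caller produces the same byte layout, which is the duplication this commit removes.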
 1.194 12-Dec-2015  christos Hook up the addrctl stuff that's already there.
 1.193 27-Nov-2015  ozaki-r Replace __debugused with __diagused

Declaring __debugused was just a mistake. This fixes builds of kernels with
DEBUG but without DIAGNOSTIC.
 1.192 25-Nov-2015  ozaki-r Declare __debugused for no DIAGNOSTIC kernels

This unbreaks hpcsh GENERIC kernel build.
 1.191 25-Nov-2015  ozaki-r Use lltable/llentry for NDP

lltable and llentry were introduced to replace ARP cache data structure
for further restructuring of the routing table: L2 nexthop cache
separation. This change replaces the NDP cache data structure
(llinfo_nd6) with them as well as ARP.

One noticeable change is for neighbor cache GC mechanism that was
introduced to prevent IPv6 DoS attacks. net.inet6.ip6.neighborgcthresh
was the max number of caches that we store in the system. After
introducing lltable/llentry, the value is changed to be per-interface
basis because lltable/llentry stores neighbor caches in each interface
separately. And the change brings one degradation; the old GC mechanism
dropped exceeded packets based on LRU while the new implementation drops
packets in order from the beginning of lltable (a hash table + linked
lists). It would be improved in the future.

Added functions in in6.c come from FreeBSD (as of r286629) and are
tweaked for NetBSD.

Proposed on tech-kern and tech-net.
 1.190 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.189 07-Aug-2015  ozaki-r Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
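
The conversion between the two time bases described above can be sketched as follows. This is an illustration under stated assumptions, not the kernel code: the function names mono_to_wall and wall_to_mono are hypothetical, and the current values of time_uptime and time_second are passed in as plain arguments so the sketch can run in userland.

```c
#include <time.h>

/*
 * Hypothetical helper: convert a timestamp based on uptime into one
 * based on wall-clock time, given samples of both clocks taken at the
 * same moment.
 */
time_t
mono_to_wall(time_t t_mono, time_t now_uptime, time_t now_wall)
{
	/* The distance from "now" is the same in both time bases. */
	return t_mono - now_uptime + now_wall;
}

/* Hypothetical helper: the inverse conversion, for values from userland. */
time_t
wall_to_mono(time_t t_wall, time_t now_uptime, time_t now_wall)
{
	return t_wall - now_wall + now_uptime;
}
```

For example, an expiry 50 seconds in the future stays 50 seconds in the future after conversion, regardless of how the wall clock was set.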
 1.188 22-Apr-2015  roy Move INET6 specific in6_if_{up,down}() and in6_if_link_{up,down}()
into agnostic domain functions.
 1.187 20-Apr-2015  roy Introduce p2p_rtrequest() so that IFF_POINTOPOINT interfaces can work
with RTF_LOCAL.
Fixes PR kern/49829.
 1.186 07-Apr-2015  roy Move in6if_do_dad() to if_do_dad() as the routine is not INET6 specific
and could equally be used by INET.
 1.185 26-Feb-2015  roy Don't add local routes for the any address or p2p addresses where the address matches the destination.
 1.184 26-Feb-2015  roy Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.
 1.183 25-Feb-2015  roy Retire nd6_newaddrmsg and use rt_newaddrmsg directly instead so that
we don't spam route changes when the route hasn't changed.
 1.182 23-Feb-2015  martin Rearrange interface detachment slightly: before we free the INET6 specific
per-interface data, make sure to call nd6_purge() with it to remove
routing entries pointing to the going interface.
When we should happen to call this function again later, with the data
already gone, just return.
Fixes PR kern/49682, ok: christos.
 1.181 20-Feb-2015  rjs Declare input argument to in6_sin_2_v4mapsin6 to be const, allows an
address from the route cache to be used as the input.

ok christos@.
 1.180 02-Dec-2014  christos add routines to print in6_addr and sockaddr_in6 (in6_print, sin6_print)
 1.179 03-Nov-2014  roy branches: 1.179.2;
Clear IN6_IFF_DUPLICATED when link goes down or up.
 1.178 27-Oct-2014  christos print mapped addresses better
 1.177 20-Oct-2014  roy Remove the ability for userland to toggle IN6_IFF_TENTATIVE.
Preserve IN6_IFF_TENTATIVE when updating address flags.
 1.176 09-Sep-2014  rmind Eliminate IFAREF() and IFAFREE() macros in favour of functions.
 1.175 05-Sep-2014  matt Don't use C++ keyword as variable.
Use different prefix for nd6_prefixctl members than for nd6_prefix members.
 1.174 01-Jul-2014  justin branches: 1.174.2;
On ARM the variable name 'delay' shadows a function here, rename to avoid
-Wshadow objecting.
 1.173 01-Jul-2014  ozaki-r Stop using callout randomly

nd6_dad_start uses callout when xtick > 0 while doesn't when
xtick == 0. So if we pass a random value ranging from 0 to N,
nd6_dad_start uses callout randomly. This behavior makes
debugging difficult.

Discussed in http://mail-index.netbsd.org/tech-kern/2014/06/25/msg017278.html
 1.172 01-Jul-2014  rtr fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@
 1.171 06-Jun-2014  rmind - Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.
 1.170 17-May-2014  rmind - Move IFNET_*() macros under #ifdef _KERNEL.
- Replace TAILQ_FOREACH on ifnet with IFNET_FOREACH().
 1.169 15-Jan-2014  roy branches: 1.169.2;
Remove dead code.
 1.168 13-Jan-2014  roy Remove the now un-used function in6ifa_ifplocaladdr.
 1.167 11-Sep-2013  christos Include BRDADDR and NETMASK to the v4 ioctls we ban for v6; from FreeBSD.
Remove X25 stuff which has been GC'ed.
XXX: pullup-5,6
 1.166 29-Jun-2013  rmind - Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.165 20-Jun-2013  roy branches: 1.165.2;
Move the detaching and making tentative addresses out of in6_if_up
and into in6_if_link_up.

This fixes a possible panic where link is up but not the interface.
Note that a better solution would be to listen to the routing socket
in the kernel, but I don't know how to do that.

Reachable Router tests for IFF_UP as well.
 1.164 11-Jun-2013  roy When an interface link state changes to down, mark all attached IPv6
addresses as detached.
Likewise, when the link state changes to up, mark all detached IPv6
as tentative and start DAD on them.

Advertised router reachability now checks that link state is not down.
This means that when an interface link state changes, the default IPv6
router may change as well.
 1.163 29-May-2013  roy Generate RTM_NEWADDR when adding a pre-existing IPv6 address.
 1.162 21-May-2013  roy For IPv6, emit RTM_NEWADDR once DAD completes and also when address flag
changes. Tentative addresses are not emitted.

Version bumped so userland can detect this behaviour change.
 1.161 23-Jun-2012  christos branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.160 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.159 19-Nov-2011  tls branches: 1.159.2; 1.159.4; 1.159.8; 1.159.10;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.158 19-Oct-2011  dyoung branches: 1.158.2;
Use if_addr_init() and if_mcast_op() instead of ifp->if_ioctl().
 1.157 06-Feb-2011  dyoung Delete unnecessary casts to void *. No functional change intended. Same
assembly generated before and after this change.
 1.156 22-Apr-2010  dyoung branches: 1.156.2; 1.156.4;
When choosing IPv6 source addresses, respect the ifaddr preference
level such as one might set with 'ifconfig xx0 inet6 <address>
preference <pref>'. I've been running this for many months without
any problems.
 1.155 07-Apr-2010  oki ip6_sprintf: compress the zeros of representation of the IPv6 address.
see RFC4291 section 2.2 item 2.
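
To show the kind of zero compression meant here, the following userland sketch uses the standard inet_pton(3) and inet_ntop(3) calls rather than the kernel's ip6_sprintf. The wrapper name format_in6 is hypothetical.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical wrapper: parse an IPv6 address in text form and print it
 * back into buf. Returns buf on success, NULL on failure.
 */
const char *
format_in6(const char *text, char *buf, size_t buflen)
{
	struct in6_addr addr;

	/* inet_pton returns 1 only for a valid address in the given family. */
	if (inet_pton(AF_INET6, text, &addr) != 1)
		return NULL;

	/* inet_ntop writes the textual form into the caller's buffer. */
	return inet_ntop(AF_INET6, &addr, buf, (socklen_t)buflen);
}
```

With a buffer of INET6_ADDRSTRLEN bytes, the input 2001:db8:0:0:0:0:0:1 is printed with its run of zero groups replaced by "::".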
 1.154 19-Sep-2009  christos branches: 1.154.2; 1.154.4;
backout the changes that establish a workqueue to synchronize the addresses
for agr and gre because they cause a race condition by calling ioctl() during
interface initialization. To make this work correctly we would need to
synchronize all interface init routines.
 1.153 11-Sep-2009  dyoung Make ifconfig(8) set and display preference numbers for IPv6
addresses. Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.

In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr. Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.

Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.

In support of the changes above,

1 Add a method to struct domain for "externalizing" a sockaddr, and
provide an implementation for IPv6. Expect more work in this area: it
may be more proper to say that the IPv6 implementation "internalizes"
a sockaddr. Add sockaddr_externalize().

2 Add a subroutine, sofamily(), that returns a struct socket's address
family or AF_UNSPEC.

3 Make a lot of IPv4-specific code generic, and move it from
sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
ifconfig(8).
 1.152 13-Aug-2009  dyoung Postpone to a workqueue adding link-local and loopback IPv6 addresses
to an interface. This keeps the kernel from entering ifp->if_ioctl
recursively, which can deadlock if if_ioctl takes locks. This will
fix deadlocks & LOCKDEBUG errors in agr(4) (kern/39940) and in
gre(4).
 1.151 12-May-2009  elad Remove "privileged" variable, perform the kauth(9) call before we go into
splnet() for the privileged commands. Privileged commands were marked as
such for clarity.

Mailing list reference:

http://mail-index.netbsd.org/tech-net/2009/05/08/msg001283.html
 1.150 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.149 18-Mar-2009  cegger bcopy -> memcpy
 1.148 18-Mar-2009  cegger bzero -> memset
 1.147 18-Mar-2009  cegger bcmp -> memcmp
 1.146 05-Feb-2009  dyoung branches: 1.146.2;
Use the in6_ifaddr ia_ifa member instead of casting from
in6_ifaddr to ifaddr.

Remove unnecessary parentheses. Do not needlessly cast RTM_ADD to
int.

No functional change intended.
 1.145 15-Jan-2009  christos Emulate a couple more ioctls. Thanks to Matthias Drochner for pointing them out.
 1.144 15-Jan-2009  christos - switch the lifetime struct to time_t and provide compatibility for the
old ioctl.
 1.143 19-Dec-2008  cegger use M_ZERO on malloc() and remove subsequent bzero().
 1.142 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
	..._init();
	...
	break;
...
default:
	..._init();
	...
	break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
	...
	break;
case IFF_RUNNING:
	...
	break;
case IFF_UP:
	...
	break;
case IFF_UP|IFF_RUNNING:
	...
	break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.
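
As an illustration of why the factory address matters, here is a hedged sketch of preferring a factory address and deriving a Modified EUI-64 interface identifier from it (insert ff:fe in the middle of the 48-bit address and invert the universal/local bit). The toy_* names and the packing of a MAC into a 64-bit integer are assumptions made for this example; this is not the code of in6_get_hw_ifid().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: a 48-bit MAC in the low bits, plus a "factory" flag. */
struct toy_lladdr {
	uint64_t mac;
	int factory;
};

/* Prefer the first factory address; otherwise fall back to the first entry. */
const struct toy_lladdr *
toy_pick_lladdr(const struct toy_lladdr *v, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (v[i].factory)
			return &v[i];
	return n > 0 ? &v[0] : NULL;
}

/* Build a Modified EUI-64 from a 48-bit MAC held in the low 48 bits. */
uint64_t
toy_mac_to_eui64(uint64_t mac)
{
	uint64_t hi = (mac >> 24) & 0xffffff;	/* first three octets */
	uint64_t lo = mac & 0xffffff;		/* last three octets */
	uint64_t eui = (hi << 40) | ((uint64_t)0xfffe << 24) | lo;

	return eui ^ ((uint64_t)0x02 << 56);	/* flip the universal/local bit */
}
```

For example, 00:11:22:33:44:55 maps to the identifier 02:11:22:ff:fe:33:44:55.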

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
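
A sketch of the kind of check this implies, assuming the address is packed into the low 48 bits of an integer with the first octet most significant. The function name is hypothetical and this is not the ether_ioctl() code; it relies only on the Ethernet convention that the low-order bit of the first octet marks group (multicast and broadcast) addresses.

```c
#include <stdint.h>

/*
 * Return 1 if a 48-bit link-layer address may be assigned to an
 * interface, 0 if it is all-zero or a group address.
 */
int
toy_lladdr_acceptable(uint64_t mac)
{
	if (mac == 0)
		return 0;		/* 00:00:00:00:00:00 */
	if ((mac >> 40) & 0x01)
		return 0;		/* group bit set: multicast or broadcast */
	return 1;
}
```

The broadcast address ff:ff:ff:ff:ff:ff needs no separate test because it also has the group bit set.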

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.141 31-Jul-2008  matt branches: 1.141.2; 1.141.4; 1.141.10; 1.141.14;
Generalize previous fix so that both NS and NA packets are checked.
 1.140 27-Feb-2008  matt branches: 1.140.4; 1.140.6; 1.140.10;
Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.
 1.139 06-Dec-2007  dyoung branches: 1.139.8; 1.139.12;
Use ifa_insert(), ifa_remove().
 1.138 05-Dec-2007  dyoung Extract common code, creating a subroutine if_purgeaddrs(ifp,
family, purgeaddr) which applies function `purgeaddr' to each
address on `ifp' belonging to `family'.
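
The shape of such a helper can be sketched in userland as follows. The toy_* structures, the singly linked list, and toy_purge_demo() are all assumptions made for illustration; the kernel's ifnet and ifaddr lists and the real if_purgeaddrs() differ.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical stand-ins for an interface and its address list. */
struct toy_ifaddr {
	int family;
	int purged;
	struct toy_ifaddr *next;
};
struct toy_ifnet {
	struct toy_ifaddr *addrs;
};

/* Apply purgeaddr to every address on ifp that belongs to family. */
void
toy_if_purgeaddrs(struct toy_ifnet *ifp, int family,
    void (*purgeaddr)(struct toy_ifaddr *))
{
	struct toy_ifaddr *ifa, *next;

	for (ifa = ifp->addrs; ifa != NULL; ifa = next) {
		next = ifa->next;	/* saved first: purgeaddr may unlink ifa */
		if (ifa->family == family)
			purgeaddr(ifa);
	}
}

/* Example callback: only marks the address instead of freeing it. */
void
toy_mark_purged(struct toy_ifaddr *ifa)
{
	ifa->purged = 1;
}

/* Usage: purge AF_INET6 from a list of inet6, inet, inet6 addresses. */
int
toy_purge_demo(void)
{
	struct toy_ifaddr c = { AF_INET6, 0, NULL };
	struct toy_ifaddr b = { AF_INET, 0, &c };
	struct toy_ifaddr a = { AF_INET6, 0, &b };
	struct toy_ifnet ifn = { &a };

	toy_if_purgeaddrs(&ifn, AF_INET6, toy_mark_purged);
	return a.purged + 10 * b.purged + c.purged;
}
```

Passing the purge routine as a function pointer is what lets each protocol reuse one loop with its own teardown logic.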
 1.137 05-Dec-2007  dyoung Use IFADDR_FIRST(), IFADDR_NEXT().
 1.136 04-Dec-2007  dyoung Use IFNET_FOREACH() and IFADDR_FOREACH().
 1.135 10-Nov-2007  dyoung branches: 1.135.2;
Use sockaddr_in6_init().
 1.134 24-Oct-2007  dyoung branches: 1.134.2;
Replace rote sockaddr_in6 initializations (memset(), set sin6_family,
sin6_len, and sin6_addr) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.
 1.133 16-Sep-2007  dyoung branches: 1.133.4;
Cosmetic: shorten staircase.
 1.132 11-Sep-2007  gdt Remove SIOCSIFALIFETIME_IN6, which could not possibly have ever worked.

Problem reported in kern/35897 by Robert Elz.
 1.131 19-Jul-2007  dyoung branches: 1.131.4; 1.131.6; 1.131.8;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.
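
To make the functional change concrete, here is a toy sketch using IPv4 addresses in host byte order and a flat array in place of the radix tree. The table contents and the toy_* names are invented for this example, and the exact-match lookup is an assumption about the general idea, not the kernel's search.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical forwarding table keyed by (network, netmask). */
struct toy_route {
	uint32_t dst;
	uint32_t mask;
};

static const struct toy_route toy_table[] = {
	{ 0xc0a80100, 0xffffff00 },	/* 192.168.1.0/24 */
	{ 0x0a000000, 0xff000000 },	/* 10.0.0.0/8 */
};

/* Apply the supplied netmask to dst, then look for an exact match. */
int
toy_route_find(uint32_t dst, uint32_t mask)
{
	uint32_t key = dst & mask;	/* the step this change introduces */
	size_t i;

	for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++)
		if (toy_table[i].dst == key && toy_table[i].mask == mask)
			return (int)i;
	return -1;
}
```

With masking, a request for 192.168.1.23 with netmask 255.255.255.0 finds the 192.168.1.0/24 entry; without it, the host bits would prevent the match.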

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.130 28-Jun-2007  christos branches: 1.130.2;
Add functions to do mapped address conversions from FreeBSD.
 1.129 27-May-2007  cube Tyop.
 1.128 23-May-2007  christos fix typos in previous
 1.127 23-May-2007  christos Ansify + add a few comments, from Karl Sjödahl
 1.126 15-Mar-2007  dyoung KNF: compare pointer w/ NULL, don't "check truth". Fix K&R parameter
types declaration.
 1.125 04-Mar-2007  christos branches: 1.125.2; 1.125.4; 1.125.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.124 22-Feb-2007  dyoung Cosmetic: use TAILQ_FOREACH(). Join lines.
 1.123 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.122 04-Jan-2007  elad branches: 1.122.2;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.121 02-Dec-2006  dyoung Synchronize access to the ifaddr list by in6_update_ifa() and
in6_control() with splnet()/splx(). I was being a bit paranoid
here. Following a cursory analysis of the code, this still looked
necessary. We don't spend a lot of time in these calls, so it
should not be too harmful to suspend network interrupts.

In in6_unlink_ifa(), call in6_delmulti() just once on each multicast
address (in6_multi). Previously, in6_unlink_ifa() called in6_delmulti()
on each in6_multi until in6_delmulti() removed the in6_multi from
the list and freed its memory. That's not justified: the multicast
list holds *one* reference. All other references belong to other
entities. We must wait to free the memory until the other entities
release their references, to protect against dereferencing a freed
in6_multi.
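
The ownership rule in the paragraph above can be sketched with a plain reference count. Everything here (struct toy_multi, the hold/release helpers, the demo) is a hypothetical userland model, not the in6_multi code; in particular, "freed" is only a flag so that the example can inspect it safely.

```c
/* Hypothetical multicast record with a reference count. */
struct toy_multi {
	int refcnt;
	int freed;	/* set where a real implementation would free memory */
};

void
toy_multi_hold(struct toy_multi *m)
{
	m->refcnt++;
}

/* Drop one reference; report 1 if that was the last one. */
int
toy_multi_release(struct toy_multi *m)
{
	if (--m->refcnt == 0) {
		m->freed = 1;
		return 1;
	}
	return 0;
}

/* The list and one other entity each hold a reference. */
int
toy_unlink_demo(void)
{
	struct toy_multi m = { 0, 0 };
	int freed_by_list, alive_after_list, freed_by_other;

	toy_multi_hold(&m);			/* the multicast list */
	toy_multi_hold(&m);			/* some other entity */
	freed_by_list = toy_multi_release(&m);	/* list drops its one reference */
	alive_after_list = !m.freed;
	freed_by_other = toy_multi_release(&m);	/* last reference goes away */
	return !freed_by_list && alive_after_list && freed_by_other;
}
```

Releasing exactly once per holder is what keeps the record alive until the last owner is done with it.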

XXX I need to revisit in6_delmulti(), in6_unlink_ifa(), and friends,
XXX to pry apart the conditions where an in6_multi is removed from
XXX its list and where it is freed. Following my change, above,
XXX we still risk dereferencing a freed in6_multi.

Prevent in6_update_ifa() and in6_addremloop() from creating dangling
pointers to interfaces in the routing table. Previously, my NetBSD
tunnel concentrator, which adds and deletes a lot of P2P interfaces
with the same local address, crashed in 8 hours or less when it
dereferenced a dangling pointer to a deleted ifnet. Now, its uptime
is greater than 3 days.
 1.120 02-Dec-2006  dyoung Use the queue(3) macros instead of open-coding them. Shorten
staircases. Remove unnecessary casts. Where appropriate, s/8/NBBY/.
De-__P(). KNF.

No functional changes intended.
 1.119 24-Nov-2006  christos branches: 1.119.2; 1.119.8;
fix spelling of accommodate; from Zapher.
 1.118 20-Nov-2006  dyoung Cosmetic: join two lines.
 1.117 18-Nov-2006  dyoung Remove __P(). Use LIST_ macros instead of accessing lh_first
directly.
 1.116 18-Nov-2006  dyoung Cosmetic: use TAILQ_FOREACH(). Remove superfluous parentheses from
return statements.
 1.115 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.114 13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
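
For readers unfamiliar with the address ranges named in item 3, a hedged sketch of equivalent predicates follows. These are hypothetical functions over host-byte-order addresses, written from the ranges themselves (169.254/16 and the RFC1918 blocks); they are not the definitions of the IN_LINKLOCAL() and IN_PRIVATE() macros.

```c
#include <stdint.h>

/* True for link-local unicast, 169.254.0.0/16 (host byte order). */
int
toy_in_linklocal(uint32_t a)
{
	return (a & 0xffff0000) == 0xa9fe0000;
}

/* True for RFC1918 private addresses (host byte order). */
int
toy_in_private(uint32_t a)
{
	return (a & 0xff000000) == 0x0a000000 ||	/* 10.0.0.0/8 */
	    (a & 0xfff00000) == 0xac100000 ||		/* 172.16.0.0/12 */
	    (a & 0xffff0000) == 0xc0a80000;		/* 192.168.0.0/16 */
}
```

A source-address selection policy could use predicates like these to rank candidate addresses by scope.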
 1.113 15-Oct-2006  dyoung Make SIOCALIFADDR work for adding IPv6 addresses: initialize the
lifetime of the addresses to infinity (ND6_INFINITE_LIFETIME).

Nobody squealed when I proposed this on tech-net.
 1.112 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.111 26-Sep-2006  is Fix typo in comment
 1.110 25-Aug-2006  matt branches: 1.110.2; 1.110.4;
Don't include <netccitt/x25.h> and don't bother checking for SIOCSIFCONF_X25.
 1.109 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.108 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
  time.tv_sec -> time_second
- struct timeval mono_time is gone
  mono_time.tv_sec -> time_uptime
- access to time via
  {get,}{micro,nano,bin}time()
  get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
  Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
  NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.107 03-Jun-2006  dogcow include <netccitt/x25.h> for the SIOCSIFCONF_X25 case in in6_control.
 1.106 03-Jun-2006  christos Fix typo.
 1.105 03-Jun-2006  christos add 2 more ioctls that use struct ifaddr *, and remove debugging printfs
I accidentally committed.
 1.104 03-Jun-2006  christos This is ugly, but it is the simplest fix to avoid calling in the default
case:

<driver>_ioctl(ifp, SIOCSIFADDR, struct ifreq *)

where it should be calling:

<driver>_ioctl(ifp, SIOCSIFADDR, struct ifaddr *)

and "Bad Things Happen (TM)"

Returning an error is good enough because none of the drivers handle INET6.

The problem here is that handling SIOCSIFADDR is a kludge. The ioctl gets
passed a struct ifreq * from userland, but then in the control routines
SIOCSIFADDR is handled "specially", and we call:

ifp->if_ioctl(ifp, SIOCSIFADDR, struct ifaddr *)

directly with the ifaddr we computed for that interface. It would be nice
if we called the ioctl routine with the original struct ifreq, and computed
the ifaddr, or passed it directly. This way all the ioctls would be treated
the same way, and we would not have the problem of pointer overloading.
 1.103 18-May-2006  liamjfoy branches: 1.103.2;
Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.102 14-May-2006  elad integrate kauth.
 1.101 17-Mar-2006  rpaulo 0 > len ==> len < 0
 1.100 17-Mar-2006  rpaulo 0 > len ==> len < 0
 1.99 06-Mar-2006  rpaulo branches: 1.99.2; 1.99.4;
Rename local variables called delay that shadow the delay() decl.
Pointed out by Robert Swindells.
 1.98 05-Mar-2006  rpaulo NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.97 03-Mar-2006  rpaulo branches: 1.97.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.
 1.96 08-Feb-2006  rpaulo Fix copy&paste problem found by James Juran
<James.Juran@baesystems.com> in freebsd-net mailing list.
bzero'ing the wrong var with a wrong sizeof is clearly not ok..
 1.95 21-Jan-2006  rpaulo branches: 1.95.2; 1.95.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
	s = socket(AF_INET6);
	bind(s, "::1");
	sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.
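
The stricter boundary check for the ::1 case can be illustrated with a deliberately simplified userland sketch. It assumes the standard inet_pton() and IN6_IS_ADDR_LOOPBACK() interfaces; toy_may_send() is a hypothetical name, and treating every non-loopback destination as "outside the node" is a simplification, not the kernel's scope-zone logic.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Return 1 if a packet from src to dst may be sent, 0 if it must be
 * rejected, -1 if either textual address fails to parse.
 */
int
toy_may_send(const char *src_text, const char *dst_text)
{
	struct in6_addr src, dst;

	if (inet_pton(AF_INET6, src_text, &src) != 1 ||
	    inet_pton(AF_INET6, dst_text, &dst) != 1)
		return -1;

	/* ::1 is only meaningful within a single node. */
	if (IN6_IS_ADDR_LOOPBACK(&src) && !IN6_IS_ADDR_LOOPBACK(&dst))
		return 0;
	return 1;
}
```

Under this rule the socket()/bind()/sendto() sequence quoted above would be refused instead of leaking ::1 onto the wire.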

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.94 11-Dec-2005  christos branches: 1.94.2;
merge ktrace-lwp.
 1.93 29-May-2005  christos branches: 1.93.2;
- avoid shadowed variables
- sprinkle const.
 1.92 17-May-2005  christos Yes, it was a cool trick >20 years ago to use "0123456789abcdef"[a] to
implement xtoa(), but I think defining the same string 50 times is a bit
too much. Defined HEXDIGITS and hexdigits in subr_prf.c and use it...
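
The idiom in question, with one shared table instead of a string literal at every call site, looks roughly like this. The names toy_hexdigits, toy_hexdigit() and toy_hexbyte() are hypothetical; the definitions actually added to subr_prf.c are not shown in this log.

```c
#include <stdint.h>

/* One shared lookup table for lower-case hexadecimal digits. */
const char toy_hexdigits[] = "0123456789abcdef";

/* Convert the low four bits of v to a hexadecimal character. */
char
toy_hexdigit(unsigned int v)
{
	return toy_hexdigits[v & 0xf];
}

/* Write the two hex characters of b, plus a terminating NUL, into out. */
void
toy_hexbyte(uint8_t b, char out[3])
{
	out[0] = toy_hexdigit(b >> 4);
	out[1] = toy_hexdigit(b);
	out[2] = '\0';
}
```

Indexing a shared array produces the same result as indexing the literal directly, while keeping a single copy of the string.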
 1.91 01-Feb-2005  drochner branches: 1.91.4; 1.91.6; 1.91.8;
remove the unused in6_ifindex2scopeid()
if at all, it works with site-local addresses whose fate is uncertain
to say the least
 1.90 26-Jul-2004  yamt branches: 1.90.4; 1.90.6;
run PFIL_IFADDR hooks on SIOCAIFADDR_IN6 and SIOCDIFADDR_IN6 as well.

from Peter Postma, PR/26368.
ok'ed by itojun.
 1.89 16-Jun-2004  itojun multicast data management fix - previous fix was incorrect. jinmei@kame
 1.88 14-Jun-2004  itojun use macro and make it a bit more readable.
 1.87 14-Jun-2004  itojun check before joining multicast group. otherwise multiple in6_multi structure
will be kept. reported by patrick latifi
 1.86 28-Mar-2004  christos PR/23335: Christos Zoulas: Removing interfaces trashes free memory when
ipv6 is used because multicast group memberships contain dangling references
to the multicast group deleted.
 1.85 23-Feb-2004  itojun avoid out-of-bound memory access if len == 128.
from Ted Unangst via Colin Percival
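
One plausible way such an overrun arises is a prefix-length-to-mask loop that always writes a trailing partial byte, which lands at index 16 when len is 128. The sketch below shows a bounds-safe version; it is an assumption about the general pattern, with hypothetical toy_* names, and not the patched kernel code.

```c
#include <stdint.h>
#include <string.h>

/*
 * Fill a 16-byte IPv6 netmask for prefix length len.
 * Returns 0 on success, -1 if len is outside 0..128.
 */
int
toy_prefixlen2mask(uint8_t mask[16], int len)
{
	int i;

	if (len < 0 || len > 128)
		return -1;
	memset(mask, 0, 16);
	for (i = 0; i < len / 8; i++)
		mask[i] = 0xff;
	/* Only touch a partial byte when there is one: for len == 128,
	 * i is 16 here and mask[i] would be out of bounds. */
	if (len % 8 != 0)
		mask[i] = (uint8_t)((0xff00 >> (len % 8)) & 0xff);
	return 0;
}

/* Convenience for experiments: byte idx of the mask, or -1 on bad input. */
int
toy_mask_byte(int len, int idx)
{
	uint8_t mask[16];

	if (idx < 0 || idx > 15 || toy_prefixlen2mask(mask, len) != 0)
		return -1;
	return mask[idx];
}
```

The guard on len % 8 is the whole fix in this model: full-byte prefixes, including /128, never reach the partial-byte store.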
 1.84 10-Dec-2003  itojun use if_indexlim (instead of if_index) and ifindex2ifnet[x] != NULL
to check if interface exists, as (1) if_index has different meaning
(2) ifindex2ifnet could become NULL when interface gets destroyed,
now that we have introduced dynamically-created interfaces. from kame
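
The two-part existence test described here (index below the table limit, and a non-NULL slot) can be modelled in userland as follows. The fixed-size table and all toy_* names are assumptions for illustration; the kernel's if_indexlim and ifindex2ifnet are managed differently.

```c
#include <stddef.h>

#define TOY_INDEXLIM 8

struct toy_ifnet { int if_index; };

/* Hypothetical index table; a NULL slot means "no such interface". */
struct toy_ifnet *toy_ifindex2ifnet[TOY_INDEXLIM];
static struct toy_ifnet toy_ifnets[TOY_INDEXLIM];

/* An interface exists only if idx is in range AND its slot is filled. */
int
toy_if_exists(unsigned int idx)
{
	return idx < TOY_INDEXLIM && toy_ifindex2ifnet[idx] != NULL;
}

/* Create an interface at idx; returns 0, or -1 if idx is out of range. */
int
toy_if_attach(unsigned int idx)
{
	if (idx >= TOY_INDEXLIM)
		return -1;
	toy_ifnets[idx].if_index = (int)idx;
	toy_ifindex2ifnet[idx] = &toy_ifnets[idx];
	return 0;
}

/* Destroy the interface at idx, leaving a NULL hole in the table. */
int
toy_if_detach(unsigned int idx)
{
	if (!toy_if_exists(idx))
		return -1;
	toy_ifindex2ifnet[idx] = NULL;
	return 0;
}
```

Once interfaces can be destroyed, a range check alone is not enough, because a destroyed interface leaves a NULL slot below the limit.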
 1.83 04-Dec-2003  keihan netbsd.org -> NetBSD.org

This was the last commit of this kind to src/sys, which is now totally
"NetBSD.org clean". Thanks for the patience, and sorry for all the commits.
 1.82 30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.81 15-Oct-2003  itojun backout previous (ENETRESET special handling)
 1.80 15-Oct-2003  itojun ignore ENETRESET on ADDMULTI.
 1.79 05-Sep-2003  itojun u_short -> u_int16_t. sync w/ kame.
don't set ip6_plen where unneeded (i.e. before calling ip6_output)
 1.78 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.77 24-Jun-2003  itojun branches: 1.77.2;
typo
 1.76 24-Jun-2003  itojun use time.tv_sec directly
 1.75 14-May-2003  wiz constant usually has two n.
 1.74 27-Feb-2003  thorpej Add in6_localaddr(). From KAME via FreeBSD.
 1.73 24-Feb-2003  matt automatic aggregates are evil. make it static const.
 1.72 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.71 17-Oct-2002  itojun do not differentiate manually configured address from autoconfigured ones
wrt prefix management;
- always earn a reference to the prefix when an address is configured
(by ioctl).
- always delete the prefix when an address that has the last reference
is manually removed.

The change should solve the problem raised in KAME-snap 6989.

sync w/kame
 1.70 23-Sep-2002  itojun better fix to PR 18163 ("deprecated" flag manipulation). sync w/kame
 1.69 23-Sep-2002  simonb Remove breaks after returns, unreachable returns and returns after
returns(!).
 1.68 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.67 11-Jun-2002  itojun silence some of log(), as the codepath will be visited for IPv6-non-capable
interfaces too and can be annoying. net.inet6.icmp6.nd6_debug will
re-enable them.
 1.66 09-Jun-2002  itojun whitespace cleanup
 1.65 08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.64 08-Jun-2002  itojun in6_len2mask is a duplicate of in6_prefixlen2mask. unify. sync w/kame
 1.63 08-Jun-2002  itojun on SIOCAIFADDR_IN6 check if sin6_len is sane. sync w/kame
 1.62 07-Jun-2002  itojun typo
 1.61 07-Jun-2002  itojun 'fall through' is not a valid LINT keyword.
 1.60 07-Jun-2002  itojun remove support for deprecated ioctls (EINVAL). sync w/kame
 1.59 29-May-2002  itojun attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
 1.58 29-May-2002  itojun move per-interface ip6/icmp6 stat to ifnet->if_afdata. sync w/kame
 1.57 25-May-2002  itojun we have no IFT_DUMMY. kame merge mistake
 1.56 23-May-2002  itojun remove wrong "break" statement
 1.55 23-May-2002  itojun simplify conditions to do DAD. sync w/kame
 1.54 23-May-2002  itojun should perform DAD for IFT_GIF.
 1.53 23-Mar-2002  itojun branches: 1.53.2;
fix arg to bcmp() - need to compare 15 bytes, not 3 bytes. sync w/kame
 1.52 21-Dec-2001  itojun whitespace/cosmetic sync w/kame
 1.51 20-Dec-2001  itojun centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame
 1.50 18-Dec-2001  itojun reduce white space/cosmetic diffs w/kame.
 1.49 13-Nov-2001  lukem add RCSIDs
 1.48 18-Oct-2001  itojun reduce diffs with kame (mostly cosmetic).
move IPV6_CHECKSUM processing to sys/netinet6/raw_ip6.c.
constify a couple of places.
 1.47 25-Jul-2001  itojun ifindex2ifnet could contain NULL after if_detach(). sync with kame
 1.46 18-Jul-2001  itojun do not malloc() during interrupt context for IPv6 multicast kludge table.
malloc() during interface initialization. sync with kame
 1.45 13-Apr-2001  thorpej branches: 1.45.2;
Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.44 16-Feb-2001  itojun branches: 1.44.2;
wording in comment.
is contradict -> "is contradictory", or "contradicts".
 1.43 11-Feb-2001  itojun pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).
 1.42 11-Feb-2001  itojun add missing IFAFREE() in error recovery case.
 1.41 10-Feb-2001  itojun to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.40 07-Feb-2001  itojun during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).
 1.39 18-Jan-2001  itojun workaround to avoid EMSGSIZE when ND6 table for the outgoing interface
is not initialized (should result in "interface down").
 1.38 04-Dec-2000  itojun make sure we don't touch uninitialized pointer. from: fvdl
 1.37 05-Nov-2000  onoe First Prototype implementation of network interface part for IEEE1394 (if_fw).

Current status:
Only OHCI chip is supported (fwohci).
ping (IPv4) works with Sony's implementation (SmartConnect) on Win98.
sometimes works but not stable.
Not implemented yet:
IRM (Isochronous Resource Manager) functionality.
Link layer fragmentation.
Topology map.
More to do:
clean ups
MCAP
character device part
dhcp

There is no entry in GENERIC config file yet.
Follow sys/dev/ieee1394/IMPLEMENTATION to enable if_fw.
 1.36 28-Oct-2000  itojun do not panic on "ifconfig inet6 fe80::1 -alias". from Todd Fries.
KAME PR 295.
 1.35 06-Oct-2000  itojun remove obsolete handling code for SIOCSIFPHY*. they are now in ifioctl().
sync with kame.
 1.34 02-Aug-2000  itojun inhibit error code from rtinit(). this happens when we try to assign
multiple addresses from same prefix, onto single interface. PR 10427.


more info:
- 4.4BSD did not check return code from in_ifinit() at all.
4.4BSD does not support multiple address from same prefix.
- a past KAME change passed the return code from in{,6}_ifinit() upwards, toward ifconfig(8).
the behavior is filed as PR 10427.
- the commit inhibits EEXIST from rtinit(), hence partially recovers old
4.4BSD behavior.
- the right thing to happen is to properly support multiple address assignment
from the same prefix. KAME tree has more extensive change, however, it needs
much more time to get stabilized (rtentry refcnt change can cause serious
issue, we really need to bake it before bring it to netbsd)
 1.33 13-Jul-2000  itojun fatal bug fix from kame (rtentry refcnt goes negative if we play with IPv6
address/routing table too much).

in6_ifloop_request()
not to request rtrequest to return an rtentry except for the ADD
operation, in order to avoid misdecreasing the refcnt (which might
cause leak of rtentry)
 1.32 27-Apr-2000  itojun branches: 1.32.4;
misuse of free(ia) in #if 0'ed region.
From: Lennart Augustsson <lennart@augustsson.net>
 1.31 16-Apr-2000  itojun perform neighbor unreachability detection on p2p links (spec requires
it for bidir p2p links).
improve -i in ndp(8) to allow tweaking per-interface ND flag on.
fix ndp(8) infinite loop on certain routing table setup.
 1.30 16-Apr-2000  itojun better sync with latest kame (cosmetic only).
 1.29 12-Apr-2000  itojun revisit in6_ifattach().
- be persistent on initializing interfaces, even if there's manually-
assigned linklocal, multicast/whatever initialization is necessary.
- do not cache mac addr in the kernel. grab mac addr from existing cards
(this is important when you swap ethernet cards back and forth)
now ppp6 works just fine!

call in6_ifattach() on ATM PVC interface to assign link-local, using
hardware MAC address as seed.

(the change is in sync with kame tree).
 1.28 24-Mar-2000  itojun move ia6->ia6_dad_ch to dp->dad_timer_ch, to ease KAME code sharing.
now in6_var.h does not need to pull sys/callout.h in.
 1.27 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
 1.26 21-Mar-2000  itojun improve comment (about undo'ing code on in{,6}_ifinit failure)
 1.25 18-Mar-2000  itojun #if 0'ed undo code for interface address addition failure.
it was a bit too strong, and forbids multiple addresses from
same prefix to be assigned.

now the behavior is the same as previous - memory leak on interface address
addition failure.
http://orange.kame.net/dev/query-pr.cgi?pr=218
 1.24 12-Mar-2000  itojun undo interface addition attempt if in6_ifinit() fails.
without it, :: will be kept if in6_ifinit() fails.
 1.23 02-Mar-2000  itojun don't configure ifa_dstaddr for non-pointopoint interface,
so that we won't be returning them from routing socket manipulation.
 1.22 28-Feb-2000  itojun remove some of cross-BSD portability #ifdef.
remove xxCTL_VARS, which is BSDI specific.
 1.21 26-Feb-2000  itojun bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
 1.20 25-Feb-2000  itojun on SIOCS*_IN6, validate sockaddrs so that we never configure non-AF_INET6
addresses. (in_control has the same problem - I'll need to check it as well)

obsolete the following two ioctls, they do not fit well against IPv6 addressing
model. (the kernel will support them for some period of time, we'll remove them
in the near future)
SIOCSIFDSTADDR_IN6
SIOCSIFNETMASK_IN6
 1.19 24-Feb-2000  itojun remove never-referenced variable (in6_interfaces).
fix paren match for macro.
 1.18 24-Feb-2000  itojun cosmetic (remove space at EOL)
 1.17 07-Feb-2000  itojun correct SIOCAIFADDR_IN6 failure recovery in point-to-point case.
 1.16 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.15 04-Feb-2000  itojun avoid some of typecasting from in6_ifaddr to ifaddr.
 1.14 02-Feb-2000  thorpej PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
 1.13 02-Feb-2000  itojun make sure to nuke kludge entries, regardless from refcnt.
 1.12 02-Feb-2000  itojun implement in6_purgemkludge(). in6_ifdetach() calls it to avoid dangling
kludge entries. the situation would occur if you take the following steps:
- join multicast groups (default ones like linklocal all-node is fine)
- remove all IPv6 addresses manually
- remove pcmcia card

to thorpej: pls call in6_ifdetach() when PRU_PURGEIF is raised (just before
removing ifnet). it should do the right thing (unable to perform real test
though)
 1.11 02-Feb-2000  itojun remove route to link-local allnodes multicast address (ff02:x::/32),
when the last IPv6 address on an interface is removed.
in6_ifattach() configures it and in6_ifdetach() removes it.

XXX last part of in6_purgeaddr looks very ugly, but there's no event for
"interface detach" (events are for "address detach").
 1.10 01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.9 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.8 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.7 26-Sep-1999  is branches: 1.7.2; 1.7.8;
Add missing "case IFT_ARCNET".
 1.6 19-Sep-1999  is Zeroth version of IPv6 support for ARCnet. Correct MTU handling still needs
to be done.
 1.5 30-Jul-1999  itojun remove reference to in6_systm.h (file itself will be removed afterwards)
 1.4 04-Jul-1999  itojun s/splnet/splsoftnet/ in IPv6/IPsec part.
hope I made no mistake (the kernel works fine but I need a regress test)

Suggested by: thorpej
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file in6.c was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file in6.c was added on branch chs-ubc2 on 1999-07-01 23:48:27 +0000
 1.7.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.7.2.6 21-Apr-2001  bouyer Sync with HEAD
 1.7.2.5 12-Mar-2001  bouyer Sync with HEAD.
 1.7.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.7.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.7.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.7.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.32.4.6 24-Dec-2001  he Pull up revision 1.38 (requested by jdolecek):
Make sure we don't use an uninitialized pointer.
 1.32.4.5 26-Feb-2001  he Pull up revision 1.39 (requested by itojun):
Workaround to avoid EMSGSIZE when ND6 table for outgoing interface
is not initialized (should result in "interface down").
 1.32.4.4 30-Oct-2000  tv Pullup 1.36 [itojun]:
do not panic on "ifconfig inet6 fe80::1 -alias". from Todd Fries.
KAME PR 295.
 1.32.4.3 06-Oct-2000  itojun pullup (approved by releng-1-5)
move privilege check for SIOCSIFPHY* from in{,6}_control to ifioctl.
fix privilege check mistakes (which allows non-root user to modify gif
physical address in some cases). sync with kame.
> cvs rdiff -r1.62 -r1.63 syssrc/sys/netinet/in.c
> cvs rdiff -r1.34 -r1.35 syssrc/sys/netinet6/in6.c
> cvs rdiff -r1.71 -r1.73 syssrc/sys/net/if.c
 1.32.4.2 18-Aug-2000  itojun pullup (approved by releng-1-5)
sys/netinet6/in6.c 1.33 -> 1.34
sys/netinet/in.c 1.61 -> 1.62

> inhibit error code from rtinit(). this happens when we try to assign
> multiple addresses from same prefix, onto single interface. PR 10427.
 1.32.4.1 13-Jul-2000  itojun pullup 1.32 -> 1.33 (approved by releng-1-5)

fatal bug fix from kame (rtentry refcnt goes negative if we play with IPv6
address/routing table too much).

in6_ifloop_request()
not to request rtrequest to return an rtentry except for the ADD
operation, in order to avoid misdecreasing the refcnt (which might
cause leak of rtentry)
 1.44.2.10 11-Nov-2002  nathanw Catch up to -current
 1.44.2.9 18-Oct-2002  nathanw Catch up to -current.
 1.44.2.8 17-Sep-2002  nathanw Catch up to -current.
 1.44.2.7 20-Jun-2002  nathanw Catch up to -current.
 1.44.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.44.2.5 08-Jan-2002  nathanw Catch up to -current.
 1.44.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.44.2.3 22-Oct-2001  nathanw Catch up to -current.
 1.44.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.44.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.45.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.45.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.45.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.45.2.1 03-Aug-2001  lukem update to -current
 1.53.2.2 20-Jun-2002  gehenna catch up with -current.
 1.53.2.1 30-May-2002  gehenna Catch up with -current.
 1.77.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.77.2.4 04-Feb-2005  skrll Sync with HEAD.
 1.77.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.77.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.77.2.1 03-Aug-2004  skrll Sync with HEAD
 1.90.6.1 12-Feb-2005  yamt sync with head.
 1.90.4.1 29-Apr-2005  kent sync with -current
 1.91.8.1 03-Oct-2008  jdc Pull up revisions:
src/sys/netinet6/in6.c 1.141 via patch
src/sys/netinet6/in6_var.h 1.59 via patch
src/sys/netinet6/nd6_nbr.c 1.89-1.90 via patch
(requested by adrianp in ticket #1967).

If a neighbor solicitation isn't from the unspecified address, make sure
that the source address matches one of the interface's address prefixes.

Generalize previous fix so that both NS and NA packets are checked.
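
The check described in this entry (a non-unspecified source address must fall inside one of the interface's prefixes) can be illustrated with a small userland sketch. This is not the kernel code from in6.c or nd6_nbr.c: prefix_match() and src_is_acceptable() are hypothetical names, only a single prefix is tested, and the addresses are passed as text purely for convenience.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: nonzero if the first plen bits of a and b are equal. */
static int
prefix_match(const struct in6_addr *a, const struct in6_addr *b, unsigned plen)
{
	unsigned bytes = plen / 8, bits = plen % 8;
	uint8_t mask;

	assert(plen <= 128);
	/* Compare the whole bytes covered by the prefix. */
	if (memcmp(a->s6_addr, b->s6_addr, bytes) != 0)
		return 0;
	if (bits == 0)
		return 1;
	/* Compare the remaining high-order bits of the next byte. */
	mask = (uint8_t)(0xff << (8 - bits));
	return ((a->s6_addr[bytes] ^ b->s6_addr[bytes]) & mask) == 0;
}

/*
 * Hypothetical policy check: the unspecified source (::) is exempt from
 * the prefix test; any other source must match the interface prefix.
 */
static int
src_is_acceptable(const char *src, const char *ifprefix, unsigned plen)
{
	struct in6_addr s, p;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, ifprefix, &p) != 1)
		return 0;
	if (IN6_IS_ADDR_UNSPECIFIED(&s))
		return 1;
	return prefix_match(&s, &p, plen);
}
```

A real implementation would walk every address configured on the receiving interface rather than a single prefix; the sketch only shows the comparison itself.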
 1.91.6.1 03-Oct-2008  jdc Pull up revisions:
src/sys/netinet6/in6.c 1.141 via patch
src/sys/netinet6/in6_var.h 1.59 via patch
src/sys/netinet6/nd6_nbr.c 1.89-1.90 via patch
(requested by adrianp in ticket #1967).

If a neighbor solicitation isn't from the unspecified address, make sure
that the source address matches one of the interface's address prefixes.

Generalize previous fix so that both NS and NA packets are checked.
 1.91.4.1 03-Oct-2008  jdc Pull up revisions:
src/sys/netinet6/in6.c 1.141 via patch
src/sys/netinet6/in6_var.h 1.59 via patch
src/sys/netinet6/nd6_nbr.c 1.89-1.90 via patch
(requested by adrianp in ticket #1967).

If a neighbor solicitation isn't from the unspecified address, make sure
that the source address matches one of the interface's address prefixes.

Generalize previous fix so that both NS and NA packets are checked.
 1.93.2.8 17-Mar-2008  yamt sync with head.
 1.93.2.7 07-Dec-2007  yamt sync with head
 1.93.2.6 15-Nov-2007  yamt sync with head.
 1.93.2.5 27-Oct-2007  yamt sync with head.
 1.93.2.4 03-Sep-2007  yamt sync with head.
 1.93.2.3 26-Feb-2007  yamt sync with head.
 1.93.2.2 30-Dec-2006  yamt sync with head.
 1.93.2.1 21-Jun-2006  yamt sync with head.
 1.94.2.2 18-Feb-2006  yamt sync with head.
 1.94.2.1 01-Feb-2006  yamt sync with head.
 1.95.4.6 03-Jun-2006  kardel Sync with head.
 1.95.4.5 01-Jun-2006  kardel Sync with head.
 1.95.4.4 30-Apr-2006  kardel - cast constant 1 to time_t to smooth a future
transition to a 64 bit time_t
 1.95.4.3 22-Apr-2006  simonb Update for timecounters - use getnanotime() and time_second variable.
 1.95.4.2 22-Apr-2006  simonb Sync with head.
 1.95.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.95.2.1 09-Sep-2006  rpaulo sync with head
 1.97.2.6 03-Sep-2006  yamt sync with head.
 1.97.2.5 11-Aug-2006  yamt sync with head
 1.97.2.4 26-Jun-2006  yamt sync with head.
 1.97.2.3 24-May-2006  yamt sync with head.
 1.97.2.2 01-Apr-2006  yamt sync with head.
 1.97.2.1 13-Mar-2006  yamt sync with head.
 1.99.4.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.99.4.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.99.2.4 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.99.2.3 19-Apr-2006  elad sync with head.
 1.99.2.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.99.2.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.103.2.1 19-Jun-2006  chap Sync with head.
 1.110.4.2 10-Dec-2006  yamt sync with head.
 1.110.4.1 22-Oct-2006  yamt sync with head
 1.110.2.2 12-Jan-2007  ad Sync with head.
 1.110.2.1 18-Nov-2006  ad Sync with head.
 1.119.8.1 03-Oct-2008  jdc Pull up revisions:
src/sys/netinet6/in6.c 1.141 via patch
src/sys/netinet6/in6_var.h 1.59 via patch
src/sys/netinet6/nd6_nbr.c 1.89-1.90 via patch
(requested by adrianp in ticket #1210).

If a neighbor solicitation isn't from the unspecified address, make sure
that the source address matches one of the interface's address prefixes.

Generalize previous fix so that both NS and NA packets are checked.
 1.119.2.1 03-Oct-2008  jdc Pull up revisions:
src/sys/netinet6/in6.c 1.141 via patch
src/sys/netinet6/in6_var.h 1.59 via patch
src/sys/netinet6/nd6_nbr.c 1.89-1.90 via patch
(requested by adrianp in ticket #1210).

If a neighbor solicitation isn't from the unspecified address, make sure
that the source address matches one of the interface's address prefixes.

Generalize previous fix so that both NS and NA packets are checked.
 1.122.2.3 24-Mar-2007  yamt sync with head.
 1.122.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.122.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.125.6.1 18-Mar-2007  reinoud First attempt to bring branch in sync with HEAD
 1.125.4.1 11-Jul-2007  mjf Sync with head.
 1.125.2.6 09-Oct-2007  ad Sync with head.
 1.125.2.5 20-Aug-2007  ad Sync with HEAD.
 1.125.2.4 15-Jul-2007  ad Sync with head.
 1.125.2.3 09-Jun-2007  ad Sync with head.
 1.125.2.2 08-Jun-2007  ad Sync with head.
 1.125.2.1 10-Apr-2007  ad Sync with head.
 1.130.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.131.8.2 19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.131.8.1 19-Jul-2007  dyoung file in6.c was added on branch matt-mips64 on 2007-07-19 20:48:57 +0000
 1.131.6.3 23-Mar-2008  matt sync with HEAD
 1.131.6.2 09-Jan-2008  matt sync with HEAD
 1.131.6.1 06-Nov-2007  matt sync with HEAD
 1.131.4.4 09-Dec-2007  jmcneill Sync with HEAD.
 1.131.4.3 11-Nov-2007  joerg Sync with HEAD.
 1.131.4.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.131.4.1 02-Oct-2007  joerg Sync with HEAD.
 1.133.4.1 13-Nov-2007  bouyer Sync with HEAD
 1.134.2.2 08-Dec-2007  mjf Sync with HEAD.
 1.134.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.135.2.1 08-Dec-2007  ad Sync with head.
 1.139.12.3 17-Jan-2009  mjf Sync with HEAD.
 1.139.12.2 28-Sep-2008  mjf Sync with HEAD.
 1.139.12.1 03-Apr-2008  mjf Sync with HEAD.
 1.139.8.2 24-Mar-2008  keiichi sync with head.
 1.139.8.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.140.10.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.140.10.1 19-Oct-2008  haad Sync with HEAD.
 1.140.6.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.140.4.6 11-Aug-2010  yamt sync with head.
 1.140.4.5 11-Mar-2010  yamt sync with head
 1.140.4.4 16-Sep-2009  yamt sync with head
 1.140.4.3 19-Aug-2009  yamt sync with head.
 1.140.4.2 16-May-2009  yamt sync with head
 1.140.4.1 04-May-2009  yamt sync with head.
 1.141.14.1 18-Sep-2013  msaitoh Pull up following revision(s) (requested by spz in ticket #1876):
sys/netinet6/in6.c: revision 1.167 via patch
Include BRDADDR and NETMASK to the v4 ioctls we ban for v6; from FreeBSD.
 1.141.10.1 18-Sep-2013  msaitoh Pull up following revision(s) (requested by spz in ticket #1876):
sys/netinet6/in6.c: revision 1.167 via patch
Include BRDADDR and NETMASK to the v4 ioctls we ban for v6; from FreeBSD.
 1.141.4.1 18-Sep-2013  msaitoh Pull up following revision(s) (requested by spz in ticket #1876):
sys/netinet6/in6.c: revision 1.167 via patch
Include BRDADDR and NETMASK to the v4 ioctls we ban for v6; from FreeBSD.
 1.141.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.141.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.141.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.146.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.154.4.2 05-Mar-2011  rmind sync with head
 1.154.4.1 30-May-2010  rmind sync with head
 1.154.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.156.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.156.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.158.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.158.2.2 30-Oct-2012  yamt sync with head
 1.158.2.1 17-Apr-2012  yamt sync with head
 1.159.10.2 18-Sep-2013  msaitoh Pull up following revision(s) (requested by spz in ticket #944):
sys/netinet6/in6.c: revision 1.167 via patch
Include BRDADDR and NETMASK to the v4 ioctls we ban for v6; from FreeBSD.
 1.159.10.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.159.8.2 18-Sep-2013  msaitoh Pull up following revision(s) (requested by spz in ticket #944):
sys/netinet6/in6.c: revision 1.167 via patch
Include BRDADDR and NETMASK to the v4 ioctls we ban for v6; from FreeBSD.
 1.159.8.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.159.4.2 18-Sep-2013  msaitoh Pull up following revision(s) (requested by spz in ticket #944):
sys/netinet6/in6.c: revision 1.167 via patch
Include BRDADDR and NETMASK to the v4 ioctls we ban for v6; from FreeBSD.
 1.159.4.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.159.2.1 05-Apr-2012  mrg sync to latest -current.
 1.161.2.3 03-Dec-2017  jdolecek update from HEAD
 1.161.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.161.2.1 23-Jun-2013  tls resync from head
 1.165.2.3 18-May-2014  rmind sync with head
 1.165.2.2 28-Aug-2013  rmind sync with head
 1.165.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.169.2.1 10-Aug-2014  tls Rebase.
 1.174.2.2 06-Apr-2015  snj Pull up following revision(s) (requested by martin in ticket #655):
sys/netinet6/in6.c: revision 1.182 via patch
sys/netinet6/in6_ifattach.c: revision 1.95 via patch
sys/netinet6/nd6.c: revision 1.158 via patch
sys/netinet6/nd6.h: revision 1.62 via patch
sys/netinet6/nd6_nbr.c: revision 1.104 via patch
sys/netinet6/nd6_rtr.c: revision 1.96 via patch
Rearrange interface detachment slightly: before we free the INET6 specific
per-interface data, make sure to call nd6_purge() with it to remove
routing entries pointing to the going interface.
When we should happen to call this function again later, with the data
already gone, just return.
Fixes PR kern/49682, ok: christos.
 1.174.2.1 27-Oct-2014  martin Pull up following revision(s) (requested by roy in ticket #160):
sbin/ifconfig/af_inet6.c: revision 1.30
sbin/ifconfig/ifconfig.8: revision 1.109
sys/netinet6/in6.c: revision 1.177
Remove the ability for userland to toggle IN6_IFF_TENTATIVE.
Preserve IN6_IFF_TENTATIVE when updating address flags.
 1.179.2.12 28-Aug-2017  skrll Sync with HEAD
 1.179.2.11 05-Feb-2017  skrll Sync with HEAD
 1.179.2.10 05-Dec-2016  skrll Sync with HEAD
 1.179.2.9 05-Oct-2016  skrll Sync with HEAD
 1.179.2.8 09-Jul-2016  skrll Sync with HEAD
 1.179.2.7 29-May-2016  skrll Sync with HEAD
 1.179.2.6 22-Apr-2016  skrll Sync with HEAD
 1.179.2.5 19-Mar-2016  skrll Sync with HEAD
 1.179.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.179.2.3 22-Sep-2015  skrll Sync with HEAD
 1.179.2.2 06-Jun-2015  skrll Sync with HEAD
 1.179.2.1 06-Apr-2015  skrll Sync with HEAD
 1.208.2.5 20-Mar-2017  pgoyette Sync with HEAD
 1.208.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.208.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.208.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.208.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.233.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.244.4.1 02-May-2017  pgoyette Sync with HEAD - tag prg-localcount2-base1
 1.245.2.15 10-Mar-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1944):

sys/netinet6/in6.c: revision 1.292

netinet6: Avoid NPD on `ifconfig ifN inet6 ... pltime 0 vltime 0'.
PR kern/53922
 1.245.2.14 04-Aug-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #1884):

sys/netinet6/in6.c: revision 1.289
sys/netinet6/ip6_output.c: revision 1.234

in6: clear ND6_IFF_IFDISABLED to allow DAD again on link-up

in6: don't send any IPv6 packets over a disabled interface
 1.245.2.13 08-Oct-2020  martin Pull up following revision(s) (requested by roy in ticket #1613):

sys/netinet/in.c: revision 1.241
sys/netinet6/in6.c: revision 1.282

inet: Treat LINK_STATE_UNKNOWN as LINK_STATE_UP when changing

It's something we have always done.
it's really rare for anything to transition to UNKNOWN from either
UP or DOWN, but technically it is possible.
 1.245.2.12 06-Nov-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #1080):

sys/netinet6/nd6.c: revision 1.251
sys/netinet/if_arp.c: revision 1.276
sys/net/if.c: revision 1.438
sys/net/if.c: revision 1.439
sys/net/route.c: revision 1.214
sys/net/route.c: revision 1.215
sys/net/route.c: revision 1.216
sys/netinet6/in6.c: revision 1.270
sys/net/route.h: revision 1.120
sys/net/if.c: revision 1.440

Remove a wrong assertion in ifaref

-

Doing ifref on an ifa with IFA_DESTROYING is not a problem; the reference should
be dropped during the destruction of the ifa.

-

Use atomic operations for ifa_refcnt

-

Avoid a dangling pointer during rt_replace_ifa

-

Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.

-

Use rt_update framework on updating a rtentry
 1.245.2.11 07-Jun-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #842):

sys/netinet6/mld6.c: revision 1.93-1.99
sys/netinet6/in6_var.h: revision 1.99,1.100
sys/netinet6/in6.c: revision 1.267,1.268
sys/netinet6/nd6.c: revision 1.249

Don't hold softnet_lock in mld_timeo
Then we can get rid of remaining abuses of mutex_owned(softnet_lock).

Release in6_multilock on callout_halt of mld_timeo to avoid a deadlock
Improve atomicity of in6_leavegroup and in6_delmulti

Avoid NULL pointer dereference on imm->i6mm_maddr

Make a refcount decrement and a removal from a list of an item atomic
in6m_refcount of an in6m can be incremented if the in6m is on the list
(if_multiaddrs) in in6_addmulti or mld_input. So we must avoid such an
increment when we try to destroy an in6m. To this end we must make
an in6m_refcount decrement and a removal of an in6m from if_multiaddrs
atomic.

Make a deletion of in6m in nd6_rtrequest atomic

Move LIST_REMOVE
mld_stoptimer releases in6_multilock temporarily, so we must LIST_REMOVE first.

Avoid double LIST_REMOVE which corrupts lists
Mark in6m as used for non-DIAGNOSTIC builds.
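
The atomicity requirement in this entry (a lookup may take a new reference to anything still on the list, so dropping the last reference and unlinking the item must happen as one step) can be sketched in userland C. Everything here is an assumption made for illustration: struct item, items_lock, and the item_* helpers are hypothetical names, and a pthread mutex stands in for the kernel lock. It is not the in6m code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

struct item {
	LIST_ENTRY(item) link;
	int refcount;		/* protected by items_lock */
};
LIST_HEAD(item_list, item);

static struct item_list items = LIST_HEAD_INITIALIZER(items);
static pthread_mutex_t items_lock = PTHREAD_MUTEX_INITIALIZER;

/* Allocate an item with one reference and put it on the list. */
static struct item *
item_add(void)
{
	struct item *it = calloc(1, sizeof(*it));

	if (it == NULL)
		return NULL;
	it->refcount = 1;
	pthread_mutex_lock(&items_lock);
	LIST_INSERT_HEAD(&items, it, link);
	pthread_mutex_unlock(&items_lock);
	return it;
}

/* Take a reference to the first item on the list, if any. */
static struct item *
item_lookup_first(void)
{
	struct item *it;

	pthread_mutex_lock(&items_lock);
	it = LIST_FIRST(&items);
	if (it != NULL)
		it->refcount++;
	pthread_mutex_unlock(&items_lock);
	return it;
}

/*
 * Drop a reference.  The decrement and the unlink happen under the same
 * lock hold, so a concurrent lookup can never find an item whose count
 * has already reached zero.  Returns 1 if the item was destroyed.
 */
static int
item_release(struct item *it)
{
	int last;

	pthread_mutex_lock(&items_lock);
	assert(it->refcount > 0);
	last = (--it->refcount == 0);
	if (last)
		LIST_REMOVE(it, link);
	pthread_mutex_unlock(&items_lock);
	if (last)
		free(it);	/* unreachable by now, safe to free unlocked */
	return last;
}
```

If the decrement and the LIST_REMOVE were done under separate lock holds, a lookup running between them could revive an item that is about to be freed.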
 1.245.2.10 08-Apr-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #701):
sys/netinet/in.c: 1.227
sys/netinet6/in6.c: 1.265
tests/net/arp/t_arp.sh: 1.35-1.36
Make GARP work again when DAD is disabled
The change avoids setting an IP address tentative on initializing it when
the IPv4 DAD is disabled (net.inet.ip.dad_count=0), which allows a GARP packet
to be sent (see arpannounce). This is the same behavior as NetBSD 7, i.e.,
before introducing the IPv4 DAD.
Additionally do the same change to IPv6 DAD for consistency.
The change is suggested by roy@
--
Improve packet checks and error reporting
--
Add tests for GARP without DAD
Additionally make the existing tests for GARP more explicit.
 1.245.2.9 13-Mar-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #622):
sys/netinet/if_arp.c: revision 1.270
sys/net/if_llatbl.c: revision 1.24 (patch)
sys/net/if_llatbl.c: revision 1.25
sys/net/if_llatbl.c: revision 1.26
sys/net/route.c: revision 1.204
sys/netinet6/in6.c: revision 1.261
sys/netinet6/in6.c: revision 1.262 (patch)
sys/netinet6/in6.c: revision 1.263
sys/netinet/in.c: revision 1.216
sys/netinet6/in6.c: revision 1.264
sys/netinet6/nd6.c: revision 1.246 (patch)
sys/netinet/if_arp.c: revision 1.269
sys/net/if_llatbl.h: revision 1.14
sys/netinet6/in6.c: revision 1.259
sys/netinet/in.c: revision 1.220
sys/netinet/in.c: revision 1.221 (patch)
sys/netinet/in.c: revision 1.222
sys/netinet/in.c: revision 1.223

Suppress noisy debugging outputs
Even if DEBUG they are too noisy under load.

Tweak sanity checks

Scheduling a timer of static entries is wrong.

Add assertions

We must not destroy llentries holding mbufs.

Fix reference leaks of llentry
callout_reset and callout_halt can cancel a pending callout without telling us.
Detect a cancel and remove a reference by using callout_pending and
callout_stop (it's a bit tricky though, we can detect it).
While here, we can remove remaining abuses of mutex_owned for softnet_lock.

Fix memory leaks on arp -d and ndp -d for static entries
We have to delete entries on in_lltable_delete and in6_lltable_delete
unconditionally. Note that we don't need to worry about LLE_IFADDR because
there are no such entries now.

Use pool(9) for llentry allocations
llentry is easily leaked, and pool suits it because pool can be used to
detect leaks.

Also sweep unnecessary wrappers for llentry, in_llentry and in6_llentry.
 1.245.2.8 26-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #588):
sys/netinet6/in6.c: revision 1.260
sys/netinet/in.c: revision 1.219
sys/netinet/wqinput.c: revision 1.4
sys/rump/net/lib/libnetinet/netinet_component.c: revision 1.11
sys/netinet/ip_input.c: revision 1.376
sys/netinet6/ip6_input.c: revision 1.193
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because there is a violation of the rule of lock ordering;
softnet_lock is held while holding IFNET_LOCK, which violates the rule.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to protect the network stack surely.
This is required, for example, for PR kern/51356.

Fix PR kern/53043
 1.245.2.7 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.245.2.6 03-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #514):
sys/net/route.c: 1.205
sys/net/rtsock.c: 1.237-1.238
sys/netinet/in.c: 1.215
sys/netinet/tcp_subr.c: 1.272
sys/netinet/tcp_timer.c: 1.93
sys/netinet/tcp_timer.h: 1.29
sys/netinet/tcp_var.h: 1.182
sys/netinet6/in6.c: 1.258
Remove extra pserialize_perform from in_purgeaddr
It's already performed in ifa_remove. Note so there (in in6_unlink_ifa too).
Release rt_so_mtx on updating a rtentry to avoid a deadlock with route_intr
The deadlock happened only if NET_MPSAFE on.
Run tcp_slowtimo in workqueue if NET_MPSAFE
If NET_MPSAFE is enabled, we have to avoid taking softnet_lock in softint as
much as possible to prevent any softint handlers including callout handlers
such as tcp_slowtimo from sticking on softnet_lock because it results in
undesired delays of executing subsequent softint handlers.
NFCI for !NET_MPSAFE
Fix a return value of rt_update_prepare
Callers expect it to be an errno.
Fix another deadlock
When waiting for a route update to finish, a waiter has to release its reference
to the route to avoid a deadlock, because an updater waits for references
to a target route (except for the reference held by the updater itself) to be released.
 1.245.2.5 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #463):
sys/netinet/in.c: revision 1.212
sys/netinet/ip_output.c: revision 1.288
sys/netinet6/in6.c: revision 1.256
sys/netinet6/in6_pcb.c: revision 1.163
sys/sys/lwp.h: revision 1.176
Add missing curlwp_bindx
--
Add missing curlwp_bindx
--
Check LP_BOUND is surely set in curlwp_bindx
This may find an extra call of curlwp_bindx.
--
Fix usage of curlwp_bind in ip_output
curlwp_bindx must be called in LIFO order, i.e., we can't call curlwp_bind
and curlwp_bindx like this:
bound1 = curlwp_bind();
bound2 = curlwp_bind();
curlwp_bindx(bound1);
curlwp_bindx(bound2);
ip_output did so if NET_MPSAFE. Fix it.
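
The LIFO requirement can be illustrated with a small user-space model. This is only a sketch, not kernel code: model_bind, model_bindx, model_flags and MODEL_BOUND are hypothetical stand-ins for curlwp_bind, curlwp_bindx, the LWP flag word and LP_BOUND, and the model assumes that bind returns the previous bound state and that bindx restores it.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_BOUND 0x1		/* hypothetical stand-in for LP_BOUND */

static int model_flags;		/* stand-in for the per-LWP flag word */

/* Set the bound flag and return the previous state of that flag. */
static int
model_bind(void)
{
	int bound = model_flags & MODEL_BOUND;

	model_flags |= MODEL_BOUND;
	return bound;
}

/*
 * Restore the state saved by the matching model_bind().
 * Returns false if the flag is not set on entry, which is the
 * situation a "flag must be set" assertion would catch.
 */
static bool
model_bindx(int bound)
{
	if ((model_flags & MODEL_BOUND) == 0)
		return false;
	if (bound == 0)
		model_flags &= ~MODEL_BOUND;
	return true;
}

/* Nested use in LIFO order: the inner pair is released first. */
static bool
nested_lifo(void)
{
	int bound1, bound2;
	bool ok;

	model_flags = 0;
	bound1 = model_bind();
	bound2 = model_bind();
	ok = model_bindx(bound2);		/* inner first */
	ok = model_bindx(bound1) && ok;		/* then outer */
	return ok && model_flags == 0;
}

/* The order from the log message: the outer pair is released first. */
static bool
nested_fifo(void)
{
	int bound1, bound2;
	bool ok;

	model_flags = 0;
	bound1 = model_bind();
	bound2 = model_bind();
	ok = model_bindx(bound1);		/* drops the flag too early */
	ok = model_bindx(bound2) && ok;		/* flag already clear: fails */
	return ok;
}
```

In this model nested_lifo() succeeds and leaves the flag clear, while nested_fifo() fails on the second release because the first one has already dropped the flag.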
--
Fix wrong usage of psref_held
We can't use it for checking if a caller does NOT hold a given target.
If you want to do it you should have psref_not_held or something.
 1.245.2.4 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at the
same time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
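
A minimal sketch of the idea, under stated assumptions: the macro and function names below (MODEL_LOCK_UNLESS_MPSAFE, model_kernel_lock, MODEL_NET_MPSAFE and so on) are hypothetical and only model how a wrapper macro can hide the build-time switch; they are not the macros added by this change.

```c
#include <assert.h>

static int model_lock_depth;	/* counts nested holds of the model lock */

static void model_kernel_lock(void)   { model_lock_depth++; }
static void model_kernel_unlock(void) { model_lock_depth--; }

/*
 * Hypothetical wrapper macros: expand to nothing when MODEL_NET_MPSAFE
 * is defined at build time, and to the lock calls otherwise.
 */
#ifdef MODEL_NET_MPSAFE
#define MODEL_LOCK_UNLESS_MPSAFE()	do { } while (0)
#define MODEL_UNLOCK_UNLESS_MPSAFE()	do { } while (0)
#else
#define MODEL_LOCK_UNLESS_MPSAFE()	model_kernel_lock()
#define MODEL_UNLOCK_UNLESS_MPSAFE()	model_kernel_unlock()
#endif

/* A caller no longer needs its own #ifndef block around the lock. */
static int
depth_inside_critical_section(void)
{
	int depth;

	MODEL_LOCK_UNLESS_MPSAFE();
	depth = model_lock_depth;	/* 1 without MODEL_NET_MPSAFE, else 0 */
	MODEL_UNLOCK_UNLESS_MPSAFE();
	return depth;
}
```

With the switch hidden in one place, a search for the remaining direct lock calls shows which ones are held regardless of the option.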
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point nothing else modifies the list, so IFADDR_READER_FOREACH
is unnecessary. Using IFADDR_READER_FOREACH is harmless in general, but
if we try to detect contract violations of pserialize, using it there violates
the contract. Avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone gains little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.245.2.3 30-Nov-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #407):
sys/compat/linux32/common/linux32_socket.c: revision 1.28
sys/net/if.c: revision 1.400
sys/netipsec/key.c: revision 1.243
sys/compat/linux/common/linux_socket.c: revision 1.139
sys/netinet/ip_carp.c: revision 1.93
sys/netinet6/in6.c: revision 1.252
sys/netinet6/in6.c: revision 1.253
sys/netinet6/in6.c: revision 1.254
sys/net/if_spppsubr.c: revision 1.173
sys/net/if_spppsubr.c: revision 1.174
sys/compat/common/uipc_syscalls_40.c: revision 1.14
Protect IFADDR_READER_FOREACH and obtained ifa with psz/psref
Fix usage of FOREACH macro
key_sad.lock is held there so SAVLIST_WRITER_FOREACH is enough.
Protect IFADDR_READER_FOREACH and obtained ifa with psz/psref
Protect IFADDR_READER_FOREACH and obtained ifa with psz/psref (more)
Fix usages of psz/psref in ifconf variants and make them consistent
Remove unnecessary goto because there is no cleanup code to share (NFC)
Tweak a condition; we don't need to care ifacount to be negative
Fix a race condition of in6_ifinit
in6_ifinit checks the number of IPv6 addresses on a given interface and
if it's zero (i.e., an IPv6 address being assigned to the interface
is the first one), call if_addr_init. However, the actual assignment of
the address (ifa_insert) is out of in6_ifinit. The check and the
assignment must be done atomically.
Fix it by holding in6_ifaddr_lock during in6_ifinit and ifa_insert.
And also add missing pserialize to IFADDR_READER_FOREACH.
 1.245.2.2 17-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #353):
sys/net/if_llatbl.c: 1.22
sys/net/if_llatbl.h: 1.13
sys/netinet/if_arp.c: 1.254
sys/netinet/in.c: 1.208-1.209
sys/netinet6/in6.c: 1.249-1.250
sys/netinet6/nd6.c: 1.237
Remove redundant KASSERTMSG
The function is static, has just one caller and the caller does the same check.
--
Fix a deadlock between a route update and lltable
It happens because rtalloc1 is called from lltable with holding
IF_AFDATA_WLOCK.
If a route update is in action, rtalloc1 would wait for its completion with
holding IF_AFDATA_WLOCK. At the same moment, a softint (e.g., arpintr) may try
to take IF_AFDATA_WLOCK and get stuck on it. Unfortunately the stuck softint
prevents the route update from progressing because the route update calls
psref_target_destroy that needs the softint to complete.
A resource allocation graph of the scenario looks like this:
route update =(psref_target_destroy)=> softint => IF_AFDATA_WLOCK
=(rt_update_wait)=> route update
Fix the deadlock by pulling rtalloc1 out of the lltable codes inside
IF_AFDATA_WLOCK.
Note that the deadlock happens only if NET_MPSAFE is enabled.
 1.245.2.1 07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproduces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Enable to remove multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries which have an identical
destination on different interfaces. This is normal and can be
reproduced easily by ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.260.2.7 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.260.2.6 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.260.2.5 28-Jul-2018  pgoyette Sync with HEAD
 1.260.2.4 25-Jun-2018  pgoyette Sync with HEAD
 1.260.2.3 02-May-2018  pgoyette Synch with HEAD
 1.260.2.2 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.260.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.268.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.268.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.268.2.1 10-Jun-2019  christos Sync with HEAD
 1.275.2.3 10-Mar-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1812):

sys/netinet6/in6.c: revision 1.292

netinet6: Avoid NPD on `ifconfig ifN inet6 ... pltime 0 vltime 0'.
PR kern/53922
 1.275.2.2 04-Aug-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #1707):

sys/netinet6/in6.c: revision 1.289
sys/netinet6/ip6_output.c: revision 1.234

in6: clear ND6_IFF_IFDISABLED to allow DAD again on link-up

in6: don't send any IPv6 packets over a disabled interface
 1.275.2.1 08-Oct-2020  martin Pull up following revision(s) (requested by roy in ticket #1104):

sys/netinet/in.c: revision 1.241
sys/netinet6/in6.c: revision 1.282

inet: Treat LINK_STATE_UNKNOWN as LINK_STATE_UP when changing

It's something we have always done.
it's really rare for anything to transition to UNKNOWN from either
UP or DOWN, but technically it is possible.
 1.276.2.1 25-Jan-2020  ad Sync with head.
 1.288.2.4 01-Oct-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1164):

sys/net/link_proto.c: revision 1.41
sys/netinet6/in6.c: revision 1.293
sys/net/if.h: revision 1.307
sys/netinet/ip_icmp.c: revision 1.180
sys/dev/vmt/vmt_subr.c: revision 1.11
sys/netinet6/in6_var.h: revision 1.105
sys/netinet6/in6_var.h: revision 1.106
sys/net/if.c: revision 1.532
sys/net/if.c: revision 1.533
sys/netinet6/mld6.c: revision 1.102
sys/netinet/in_var.h: revision 1.104
sys/net/if_spppsubr.c: revision 1.270
sys/net/if_spppsubr.c: revision 1.271
sys/netinet6/nd6.c: revision 1.284

if: introduce if_first_addr() (and psref variant)

It returns a first address (ifa) of a given family on a given interface.

It will replace a bunch of open codes and make their intent clear.
Apply if_first_addr() and if_first_addr_psref()
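
The lookup described above can be sketched in user space as a walk over an interface's address list. This is an illustrative model only: struct model_ifaddr, model_first_addr and the MODEL_AF_* constants are hypothetical, and the locking and psref handling that the kernel functions need are left out.

```c
#include <assert.h>
#include <stddef.h>

enum { MODEL_AF_INET = 1, MODEL_AF_INET6 = 2 };	/* hypothetical families */

/* Hypothetical, simplified address record on an interface's list. */
struct model_ifaddr {
	int family;
	struct model_ifaddr *next;
};

/* Return the first address of the given family, or NULL if none. */
static const struct model_ifaddr *
model_first_addr(const struct model_ifaddr *head, int family)
{
	for (; head != NULL; head = head->next) {
		if (head->family == family)
			return head;
	}
	return NULL;
}
```

Callers that previously open-coded this loop can state their intent with one call and a NULL check.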

in6: introduce in6ifa_first_lladdr() (and psref variant)

It returns a first link-local address (ifa) on a given interface.
Apply in6ifa_first_lladdr() and in6ifa_first_lladdr_psref()
 1.288.2.3 10-Mar-2024  martin Pull up following revision(s) (requested by riastradh in ticket #619):

sys/netinet6/in6.c: revision 1.292

netinet6: Avoid NPD on `ifconfig ifN inet6 ... pltime 0 vltime 0'.
PR kern/53922
 1.288.2.2 10-Dec-2023  martin Pull up following revision(s) (requested by pgoyette in ticket #487):

sys/compat/common/compat_90_mod.c: revision 1.5
sys/compat/common/compat_90_mod.c: revision 1.6
sys/netinet6/in6.c: revision 1.290
sys/netinet6/in6.c: revision 1.291
sys/compat/common/files.common: revision 1.11
sys/netinet6/icmp6.c: revision 1.255
sys/compat/common/net_inet6_nd_90.c: revision 1.1
sys/compat/common/net_inet6_nd_90.c: revision 1.2
sys/modules/compat_90/Makefile: revision 1.2
sys/modules/compat_90/Makefile: revision 1.3
sys/netinet6/nd6.c: revision 1.281
sys/compat/common/compat_mod.h: revision 1.10
sys/kern/compat_stub.c: revision 1.23
sys/sys/compat_stub.h: revision 1.27

Identify the need to rework the COMPAT_* code to be more
module-aware.
This is an XXX comment block only, NFCI.

Modularize the COMPAT_90 code that resulted from the removal of
netinet6/nd6 from the kernel. Now, the minimal compat code can
be successfully loaded and unloaded along with the rest of the
COMPAT_90 code.

Allow kernels builds which don't define INET6 to compile compat bits
too.

Default the build of compat_90 module to include IPv6, as is done
for other INET6-sensitive modules (see if_lagg).
 1.288.2.1 04-Aug-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #310):

sys/netinet6/in6.c: revision 1.289
sys/netinet6/ip6_output.c: revision 1.234

in6: clear ND6_IFF_IFDISABLED to allow DAD again on link-up

in6: don't send any IPv6 packets over a disabled interface
 1.292.2.1 02-Aug-2025  perseant Sync with HEAD
 1.101 31-Jul-2021  andvar fix typos in comments
 1.100 08-Sep-2020  christos branches: 1.100.6;
Add IP_BINDANY, IPV6_BINDANY which can be used to bind to any address in
order to implement transparent proxies.
 1.99 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.98 01-Nov-2019  knakahara Make global and per-interface ipsecif(4) pmtu tunable like gif(4).

And make hop limit tunable same as gif(4).

See http://mail-index.netbsd.org/source-changes/2019/10/30/msg110426.html
 1.97 30-Oct-2019  knakahara Add sysctl nodes to control fragmentation with IPv[46] over IPv6 gif(4).

New sysctl node "net.inet6.ip6.gifpmtu" means
- 0 (default)
Fragment by IPV6_MMTU. All packets reach the destination certainly,
however the long packet performance is poor.
This is the same behavior as before.
- 1
Fragment by outer interface's MTU. The long packet performance would
be good, however the packets may be dropped in some network paths
whose path MTU is less than the interface's MTU.
- others
undefined yet

New sysctl node "net.interfaces.gif*.pmtu" means
- -1 (default)
Use system default value (net.inet6.ip6.gifpmtu).
- 0
Fragment by IPV6_MMTU for this gif(4) tunnel.
- 1
Fragment by outer interface's MTU for this gif(4) tunnel.
- others
undefined yet

See RFC4459 for more information and other solutions.
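
How the two knobs combine can be sketched as a small selection function. This is a model under stated assumptions, not the kernel code: model_gif_frag_size and MODEL_IPV6_MMTU are hypothetical names, 1280 is the IPv6 minimum link MTU, and values the log message calls "undefined yet" are reported as -1 here.

```c
#include <assert.h>

#define MODEL_IPV6_MMTU 1280	/* IPv6 minimum link MTU */

/*
 * Return the size to fragment by, or -1 for values the log message
 * leaves undefined.
 *   global_pmtu: model of net.inet6.ip6.gifpmtu (0 or 1)
 *   if_pmtu:     model of net.interfaces.gif*.pmtu (-1, 0 or 1)
 *   outer_mtu:   MTU of the outer interface
 */
static int
model_gif_frag_size(int global_pmtu, int if_pmtu, int outer_mtu)
{
	/* -1 on the interface means "use the system default". */
	int policy = (if_pmtu == -1) ? global_pmtu : if_pmtu;

	switch (policy) {
	case 0:
		return MODEL_IPV6_MMTU;
	case 1:
		return outer_mtu;
	default:
		return -1;
	}
}
```

For example, with the global knob at 1 and a per-interface value of 0, that one tunnel still fragments by the minimum MTU while other tunnels use the outer interface's MTU.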
 1.96 05-Sep-2019  kamil Revert regression introduced in in6.h r. 1.95
 1.95 28-May-2019  kamil branches: 1.95.2;
Decorate struct in6_addr with the __packed attribute

This avoids undefined behavior when accessing misaligned pointers.

Detected by kUBSan.

Patch by Akul Pillai.
 1.94 10-Dec-2018  christos need <sys/endian.h> (or arpa/inet.h) for ntohl() used in macros.
 1.93 22-Aug-2018  msaitoh - Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.92 10-Aug-2018  maxv Rename

ip6_undefer_csum -> in6_undefer_cksum
in6_delayed_cksum -> in6_undefer_cksum_tcpudp

The two previous names were inconsistent and misleading.

Put the two functions into in6_offload.c. Add comments to explain what
we're doing.

Same as IPv4.
 1.91 19-Apr-2018  christos branches: 1.91.2;
s/static inline/static __inline/g for consistency.
 1.90 09-Feb-2018  maxv branches: 1.90.2;
Remove dead code.
 1.89 30-Jan-2018  maxv Style, localify, remove dead code, and fix typos. No functional change.
 1.88 10-Jan-2018  knakahara add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.87 15-Feb-2016  rtr branches: 1.87.10;
Reduce code duplication.

Split creation of IPv4-Mapped IPv6 addresses into its own function
and use it.

No functional change intended. As posted to tech-net@
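
For reference, building an IPv4-mapped IPv6 address (::ffff:a.b.c.d) can be sketched as follows. The helper name model_in6_v4map is hypothetical and is not the function introduced by this change; only standard socket headers are assumed.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Fill *dst with the IPv4-mapped IPv6 address ::ffff:a.b.c.d for *src. */
static void
model_in6_v4map(const struct in_addr *src, struct in6_addr *dst)
{
	memset(dst, 0, sizeof(*dst));		/* bytes 0..9 stay zero */
	dst->s6_addr[10] = 0xff;		/* bytes 10..11 are 0xffff */
	dst->s6_addr[11] = 0xff;
	/* s_addr is already in network byte order; copy it as is. */
	memcpy(&dst->s6_addr[12], &src->s_addr, sizeof(src->s_addr));
}
```

Keeping this in one helper means every caller produces the same layout, which the standard IN6_IS_ADDR_V4MAPPED() macro can then recognize.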
 1.86 12-Dec-2015  christos Hook up the addrctl stuff that's already there.
 1.85 07-Aug-2015  ozaki-r Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.
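
The conversion between the two time bases can be sketched like this. The function names are hypothetical, and the current uptime and wall-clock values are passed in as parameters so the arithmetic is easy to follow; the kernel reads them from time_uptime and time_second.

```c
#include <assert.h>
#include <time.h>

/*
 * Convert an expiry kept on the monotonic uptime clock to wall-clock
 * time before handing it to userland.
 */
static time_t
model_uptime_to_wall(time_t expire_uptime, time_t now_uptime, time_t now_wall)
{
	return expire_uptime - now_uptime + now_wall;
}

/* Convert a wall-clock expiry from userland to the uptime clock. */
static time_t
model_wall_to_uptime(time_t expire_wall, time_t now_uptime, time_t now_wall)
{
	return expire_wall - now_wall + now_uptime;
}
```

An entry that expires 60 seconds from now keeps that 60 second distance in either representation, so a later jump of the wall clock does not change when the kernel expires it.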

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
 1.84 22-Apr-2015  roy Move INET6 specific in6_if_{up,down}() and in6_if_link_{up,down}()
into agnostic domain functions.
 1.83 20-Feb-2015  rjs Declare input argument to in6_sin_2_v4mapsin6 to be const, allows an
address from the route cache to be used as the input.

ok christos@.
 1.82 20-Jan-2015  roy Add net.inet6.ip6.prefer_tempaddr sysctl knob so that we can prefer
IPv6 temporary addresses as the source address.

Fixes PR kern/47100 based on a patch by Dieter Roelants.
 1.81 02-Dec-2014  christos use the new printing code.
 1.80 02-Dec-2014  christos add routines to print in6_addr and sockaddr_in6 (in6_print, sin6_print)
 1.79 12-Oct-2014  christos branches: 1.79.2;
document that we depend on the option numbers matching.
 1.78 05-Jun-2014  rmind branches: 1.78.2;
- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.77 05-Jun-2014  roy Add IPV6CTL_AUTO_LINKLOCAL and ND6_IFF_AUTO_LINKLOCAL toggles which
control the automatic creation of IPv6 link-local addresses when an
interface is brought up.

Taken from FreeBSD.
 1.76 30-May-2014  christos Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.75 19-Oct-2013  christos branches: 1.75.2;
define constants for scopeid function flags.
 1.74 19-Oct-2013  christos add scopeid functions
 1.73 20-Jun-2013  roy branches: 1.73.2;
Move the detaching and making tentative addresses out of in6_if_up
and into in6_if_link_up.

This fixes a possible panic where link is up but not the interface.
Note that a better solution would be to listen to the routing socket
in the kernel, but I don't know how to do that.

Reachable Router tests for IFF_UP as well.
 1.72 11-Jun-2013  roy When an interface link state changes to down, mark all attached IPv6
addresses as detached.
Likewise, when the link state changes to up, mark all detached IPv6
as tentative and start DAD on them.

Advertised router reachability now checks that link state is not down.
This means that when an interface link state changes, the default IPv6
router may change as well.
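
The address state changes described in this entry can be modeled roughly as below. The flag and function names are hypothetical stand-ins, and the model only counts the addresses on which DAD would be restarted instead of running it.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_TENTATIVE	0x1u	/* DAD has not completed yet */
#define MODEL_DETACHED	0x2u	/* link went down under the address */

struct model_addr {
	unsigned int flags;
};

/* Link down: mark every address on the interface as detached. */
static void
model_link_down(struct model_addr *addrs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		addrs[i].flags |= MODEL_DETACHED;
}

/*
 * Link up: turn each detached address into a tentative one.
 * Returns the number of addresses on which DAD would be started.
 */
static size_t
model_link_up(struct model_addr *addrs, size_t n)
{
	size_t restarted = 0;

	for (size_t i = 0; i < n; i++) {
		if ((addrs[i].flags & MODEL_DETACHED) == 0)
			continue;	/* was never detached; leave as is */
		addrs[i].flags &= ~MODEL_DETACHED;
		addrs[i].flags |= MODEL_TENTATIVE;
		restarted++;
	}
	return restarted;
}
```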
 1.71 27-Apr-2013  joerg Systematically include sys/featuretest.h when _NETBSD_SOURCE is used.
Some are redundant, but make verification with grep much easier.
 1.70 22-Jun-2012  christos branches: 1.70.2;
PR/46602: Move the rfc6056 port randomization to the IP layer.
 1.69 24-May-2011  spz branches: 1.69.4;
RA flood mitigation via a limit on accepted routes:
- introduce a limit for the routes accepted via IPv6 Router Advertisement:
a common 2 interface client will have 6, the default limit is 100 and
can be adjusted via sysctl
- report the current number of routes installed via RA via sysctl
- count discarded route additions. Note that one RA message is two routes.
This is at present only across all interfaces even though per-interface
would be more useful, since the per-interface structure complies to RFC2466
- bump kernel version due to the previous change
- adjust netstat to use the new value (with netstat -p icmp6)
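
The accounting behind such a limit can be sketched as a pair of counters. The structure and function names are hypothetical; only the default limit of 100 comes from the log message above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical global accounting for routes learned via RA. */
struct model_ra_state {
	unsigned int installed;	/* routes currently installed via RA */
	unsigned int limit;	/* maximum accepted (log message default: 100) */
	unsigned int dropped;	/* route additions discarded at the limit */
};

/* Try to account for one new RA route; false means it was discarded. */
static bool
model_ra_route_add(struct model_ra_state *s)
{
	if (s->installed >= s->limit) {
		s->dropped++;
		return false;
	}
	s->installed++;
	return true;
}
```

Because the counters are global in this model, a flood on one interface can also exhaust the budget for the others, which is the limitation the log message points out.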
 1.68 11-Sep-2009  dyoung branches: 1.68.4; 1.68.6;
Make ifconfig(8) set and display preference numbers for IPv6
addresses. Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.

In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr. Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.

Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.

In support of the changes above,

1 Add a method to struct domain for "externalizing" a sockaddr, and
provide an implementation for IPv6. Expect more work in this area: it
may be more proper to say that the IPv6 implementation "internalizes"
a sockaddr. Add sockaddr_externalize().

2 Add a subroutine, sofamily(), that returns a struct socket's address
family or AF_UNSPEC.

3 Make a lot of IPv4-specific code generic, and move it from
sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
ifconfig(8).
 1.67 19-Aug-2009  seanb - Newer gcc was throwing a 'dereferencing type-punned pointer will
break strict-aliasing rules' warning against IN6_IS_ADDR_* macros
at -O2 -Wall.
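
One common way to avoid this class of warning is to compare the address bytes instead of casting the address to a wider integer type. The sketch below shows that general technique with a hypothetical helper; it is not necessarily how this revision changed the macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <netinet/in.h>

/* True if *a is the unspecified address (::); compares bytes, no casts. */
static bool
model_in6_is_unspecified(const struct in6_addr *a)
{
	static const struct in6_addr zero = IN6ADDR_ANY_INIT;

	return memcmp(a, &zero, sizeof(zero)) == 0;
}
```

Since memcmp() works on the object representation, no pointer of an incompatible type is ever dereferenced.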
 1.66 25-Dec-2007  perry branches: 1.66.2; 1.66.10;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.65 01-Nov-2007  dyoung branches: 1.65.2; 1.65.4; 1.65.8;
De-__P().
 1.64 24-Oct-2007  dyoung Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.
 1.63 07-Oct-2007  joerg branches: 1.63.2;
NetBSD doesn't have to care about missing bcmp on OpenBSD/SPARC,
just use memcmp in both kernel and userland.
 1.62 30-Aug-2007  dyoung branches: 1.62.2;
Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.61 28-Jun-2007  christos branches: 1.61.2; 1.61.6; 1.61.8;
Add functions to do mapped address conversions from FreeBSD.
 1.60 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
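
The allocate/copy/duplicate/free contract described in item 1 can be sketched in user space as follows. This is only an illustration under stated assumptions: the xsockaddr type, the XAF_* values, and the per-family lengths are hypothetical stand-ins, malloc-style allocation replaces the kernel pool, and the 'flags' argument is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a BSD-style struct sockaddr. */
struct xsockaddr {
	uint8_t sa_len;		/* total length of this address */
	uint8_t sa_family;	/* address family */
	char    sa_data[26];	/* large enough for the families below */
};

enum { XAF_INET = 2, XAF_INET6 = 24 };	/* illustrative values only */

/* Per-family sockaddr length; 0 means "unknown family". */
static uint8_t
xsockaddr_len(uint8_t af)
{
	switch (af) {
	case XAF_INET:  return 16;
	case XAF_INET6: return 28;
	default:        return 0;
	}
}

/* Like sockaddr_alloc(): NULL on failure, else family and length are set. */
struct xsockaddr *
xsockaddr_alloc(uint8_t af)
{
	uint8_t len = xsockaddr_len(af);
	struct xsockaddr *sa;

	if (len == 0 || (sa = calloc(1, sizeof(*sa))) == NULL)
		return NULL;
	sa->sa_len = len;
	sa->sa_family = af;
	return sa;
}

/* Like sockaddr_copy(): families must match; analogous to strcpy(). */
struct xsockaddr *
xsockaddr_copy(struct xsockaddr *dst, const struct xsockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like sockaddr_dup(): allocate, then copy; analogous to strdup(). */
struct xsockaddr *
xsockaddr_dup(const struct xsockaddr *src)
{
	struct xsockaddr *sa = xsockaddr_alloc(src->sa_family);

	return sa == NULL ? NULL : xsockaddr_copy(sa, src);
}

/* Like sockaddr_free(): release the storage. */
void
xsockaddr_free(struct xsockaddr *sa)
{
	free(sa);
}
```

A caller would check the result of xsockaddr_alloc() or xsockaddr_dup() for NULL before use, in the same way the log says rtcache_getdst() results are checked.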
 1.59 17-Feb-2007  dyoung branches: 1.59.4; 1.59.6;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.58 15-Feb-2007  seanb branches: 1.58.2;
Typo in comment.
 1.57 31-Oct-2006  cbiere Commented out IPv6 socket options which are no longer supported.
 1.56 07-Jun-2006  kardel branches: 1.56.6; 1.56.8;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.55 07-May-2006  rpaulo branches: 1.55.2;
Use C99 uintXX_t types so that applications don't need to include
sys/types.h directly (as in the past).
 1.54 05-May-2006  rpaulo Add support for RFC 3542 Adv. Socket API for IPv6 (which obsoletes 2292).
* RFC 3542 isn't binary compatible with RFC 2292.
* RFC 2292 support is on by default but can be disabled.
* update ping6, telnet and traceroute6 to the new API.

From the KAME project (www.kame.net).
Reviewed by core.
 1.53 29-Mar-2006  dyoung Add predicate IN6_IS_SCOPE_EMBEDDABLE(__a), which is true if and
only if the address __a is the type in which the IPv6 stack embeds
scope information.
 1.52 16-Feb-2006  perry branches: 1.52.2; 1.52.4; 1.52.6;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.51 21-Jan-2006  rpaulo branches: 1.51.2; 1.51.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.50 24-Dec-2005  perry branches: 1.50.2;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.49 20-Dec-2005  christos Forward declarations for structs.
 1.48 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.47 28-Aug-2005  rpaulo Implement net.inet6.ip6.stats sysctl.

Reviewed by Elad Efrat.
 1.46 10-Aug-2005  yamt ipv6 tx checksum offloading. reviewed by Jason Thorpe.
 1.45 11-Jun-2004  itojun branches: 1.45.12;
implement IPV6_USE_MIN_MTU sockopt. needed by bind9 + EDNS0 + big receive buffer.
 1.44 12-Nov-2003  itojun branches: 1.44.2;
implement net.inet6.ifq
 1.43 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.42 28-Apr-2003  bjh21 branches: 1.42.2;
Add a new feature-test macro, _NETBSD_SOURCE. If this is defined
by the application, all NetBSD interfaces are made visible, even
if some other feature-test macro (like _POSIX_C_SOURCE) is defined.
<sys/featuretest.h> defined _NETBSD_SOURCE if none of _ANSI_SOURCE,
_POSIX_C_SOURCE and _XOPEN_SOURCE is defined, so as to preserve
existing behaviour.

This has two major advantages:
+ Programs that require non-POSIX facilities but define _POSIX_C_SOURCE
can trivially be overruled by putting -D_NETBSD_SOURCE in their CFLAGS.
+ It makes most of the #ifs simpler, in that they're all now ORs of the
various macros, rather than having checks for (!defined(_ANSI_SOURCE) ||
!defined(_POSIX_C_SOURCE) || !defined(_XOPEN_SOURCE)) all over the place.

I've tried not to change the semantics of the headers in any case where
_NETBSD_SOURCE wasn't defined, but there were some places where the
current semantics were clearly mad, and retaining them was harder than
correcting them. In particular, I've mostly normalised things so that
_ANSI_SOURCE gets you the smallest set of stuff, then _POSIX_C_SOURCE,
_XOPEN_SOURCE and _NETBSD_SOURCE in that order.

Tested by building for vax, encouraged by thorpej, and uncontested in
tech-userlevel for a week.
 1.41 08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.40 28-May-2002  itojun limit number of IPv6 fragments (not the fragment queue size) to
fight against lots-of-frags DoS attacks. sync w/kame
 1.39 14-May-2002  itojun branches: 1.39.2; 1.39.4;
rename: net.inet6.ip6.bindv6only -> net.inet6.ip6.v6only
sync w/kame.
 1.38 13-May-2002  kleink * Use uint{8,32}_t from <netinet/in.h> where applicable; use private
fixed-width integer types otherwise.
* Protect RFC 2292 prototypes, which are not XNS5.2/POSIX-2001; also, define
size_t for inet6_rthdr_space().
 1.37 13-May-2002  kleink IPV6PORT_* aren't in the reserved namespace either.
 1.36 13-May-2002  kleink Check _POSIX_C_SOURCE as well.
 1.35 13-May-2002  kleink Update two comments.
 1.34 12-May-2002  kleink Provide local definitions of in_{addr,port}_t in <netinet/in.h> and use
them where deemed appropriate by XNS5.2/POSIX-2001.
 1.33 21-Dec-2001  itojun whitespace/cosmetic sync w/kame
 1.32 17-Nov-2001  perry (minor) delint
 1.31 24-Oct-2001  itojun more whitespace sync with kame
 1.30 18-Oct-2001  itojun branches: 1.30.2;
reduce diffs with kame (mostly cosmetic).
move IPV6_CHECKSUM processing to sys/netinet6/raw_ip6.c.
constify a couple of places.
 1.29 16-Oct-2001  itojun reduce diff with kame. whitespace only
 1.28 15-Oct-2001  itojun implement IPV6_V6ONLY socket option from draft-ietf-ipngwg-rfc2553bis-03.txt.
IPV6_BINDV6ONLY (netbsd only) is deprecated, but still works just like before.
 1.27 24-Jul-2001  itojun fix comment on setsockopt arg size. KAME PR 369
 1.26 02-Jun-2001  thorpej branches: 1.26.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
 1.25 30-Mar-2001  itojun fix constness of IN6_{IS,ARE}_xx with RFC2553. sync with kame.
 1.24 02-Mar-2001  itojun branches: 1.24.2;
have comment that refers to kame COVERAGE document. sync with kame
 1.23 02-Mar-2001  itojun the date string in KAME version is getting very meaningless, remove.
 1.22 11-Feb-2001  itojun pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).
 1.21 17-Oct-2000  itojun use __P() in prototype for non-ansi compilers.
From: Michael Shalayeff <mickey@lucifier.remote.dti.net>
(we don't ansify it for kame code sharing)
 1.20 27-Aug-2000  itojun add a warning on IPv6 setsockopt number space (*BSD shares the number space
so consult KAME for number allocation)
 1.19 26-Aug-2000  itojun implement net.inet6.ip6.{anon,low}port{min,max} sysctl variable.
 1.18 16-Jul-2000  itojun do not pull sys/queue.h from netinet6/in6.h. PR10597.
some sync with kame.
 1.17 06-Jul-2000  christos elide lint cast type conversion warnings.
 1.16 26-Jun-2000  kleink XNS5.2: define sa_family_t and use it where specified by the standard.
 1.15 08-Jun-2000  danw branches: 1.15.2;
Use _POSIX_SOURCE-safe type names
 1.14 24-May-2000  itojun branches: 1.14.2;
enforce parameter type check for IN6_ARE_ADDR_EQUAL(). (sync with kame)
 1.13 28-Feb-2000  itojun remove some of cross-BSD portability #ifdef.
remove xxCTL_VARS, which is BSDI specific.
 1.12 19-Feb-2000  itojun s/u_char/u_int8_t/ for sin6_{family,len}
 1.11 17-Feb-2000  darrenr Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.
 1.10 09-Feb-2000  itojun to improve RFC2553/2292 compliance, and promote use of
RFC2553/2292-compliant header file path, now the following headers are
forbidden:
netinet6/ip6.h
netinet6/icmp6.h
netinet6/in6.h

if you want netinet6/{ip6,icmp6}.h, use netinet/{ip6,icmp6}.h.

if you want netinet6/in6.h, you just need to include netinet/in.h.
it pulls it in.
(we may need to integrate them into netinet/in.h, but for cross-BSD code
sharing i'd like to keep it like this for now)
 1.9 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.8 06-Jan-2000  itojun make IPV6_BINDV6ONLY setsockopt available. it controls behavior of
AF_INET6 wildcard listening socket. heavily documented in ip6(4).
net.inet6.ip6.bindv6only defines default value. default is 1.

"options INET6_BINDV6ONLY" removes any code fragment that supports
IPV6_BINDV6ONLY == 0 case (not defopt'ed as use of this is rare).
 1.7 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.6 06-Jul-1999  itojun branches: 1.6.2; 1.6.8;
sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups
 1.5 03-Jul-1999  thorpej RCS ID police.
 1.4 03-Jul-1999  kleink Take a stab at namespace protection. For now, only the obvious problems are
addressed, the culprit being the lack of a namespace definition for an IPv6-
extended <netinet/in.h> in XNS5.2 D2.0; I'll try to work something out and
submit it to the review WG.
 1.3 02-Jul-1999  itojun move ipsec sysctl index to IPPROTO_AH (instead of IPPROTO_ESP),
so that you can perform sysctl operation when ESP is not compiled in.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file in6.h was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file in6.h was added on branch chs-ubc2 on 1999-07-01 23:48:27 +0000
 1.6.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.6.2.4 21-Apr-2001  bouyer Sync with HEAD
 1.6.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.6.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.6.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.14.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.15.2.4 18-Oct-2000  tv Pullup 1.21 [itojun]:
use __P() in prototype for non-ansi compilers.
From: Michael Shalayeff <mickey@lucifier.remote.dti.net>
(we don't ansify it for kame code sharing)
 1.15.2.3 27-Aug-2000  itojun pullup 1.19 -> 1.20 (approved by releng-1-5)

> add a warning on IPv6 setsockopt number space (*BSD shares the number space
> so consult KAME for number allocation)
 1.15.2.2 27-Aug-2000  itojun pullup (approved by releng-1-5)

> implement net.inet6.ip6.{anon,low}port{min,max} sysctl variable.

> cvs rdiff -r1.67 -r1.68 basesrc/lib/libc/gen/sysctl.3
> cvs rdiff -r1.53 -r1.54 basesrc/sbin/sysctl/sysctl.8
> cvs rdiff -r1.18 -r1.19 syssrc/sys/netinet6/in6.h
> cvs rdiff -r1.29 -r1.30 syssrc/sys/netinet6/in6_pcb.c
> cvs rdiff -r1.3 -r1.4 syssrc/sys/netinet6/in6_src.c
> cvs rdiff -r1.25 -r1.26 syssrc/sys/netinet6/ip6_input.c
> cvs rdiff -r1.14 -r1.15 syssrc/sys/netinet6/ip6_var.h
 1.15.2.1 16-Jul-2000  itojun pullup 1.17 -> 1.18 (approved by releng-1-5)
do not pull sys/queue.h from netinet6/in6.h. PR10597.
some sync with kame.
 1.24.2.7 20-Jun-2002  nathanw Catch up to -current.
 1.24.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.24.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.24.2.4 22-Oct-2001  nathanw Catch up to -current.
 1.24.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.24.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.24.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.26.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.26.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.26.2.1 03-Aug-2001  lukem update to -current
 1.30.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.39.4.1 14-Jun-2004  jmc Pullup rev 1.45 (requested by itojun in ticket #1709)

Implement IPV6_USE_MIN_MTU sockopt.
 1.39.2.2 20-Jun-2002  gehenna catch up with -current.
 1.39.2.1 30-May-2002  gehenna Catch up with -current.
 1.42.2.5 11-Dec-2005  christos Sync with head.
 1.42.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.42.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.42.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.42.2.1 03-Aug-2004  skrll Sync with HEAD
 1.44.2.1 14-Jun-2004  tron Pull up revision 1.45 (requested by itojun in ticket #468):
implement IPV6_USE_MIN_MTU sockopt. needed by bind9 + EDNS0 + big receive buffer.
 1.45.12.7 21-Jan-2008  yamt sync with head
 1.45.12.6 15-Nov-2007  yamt sync with head.
 1.45.12.5 27-Oct-2007  yamt sync with head.
 1.45.12.4 03-Sep-2007  yamt sync with head.
 1.45.12.3 26-Feb-2007  yamt sync with head.
 1.45.12.2 30-Dec-2006  yamt sync with head.
 1.45.12.1 21-Jun-2006  yamt sync with head.
 1.50.2.2 18-Feb-2006  yamt sync with head.
 1.50.2.1 01-Feb-2006  yamt sync with head.
 1.51.4.3 01-Jun-2006  kardel Sync with head.
 1.51.4.2 22-Apr-2006  simonb Sync with head.
 1.51.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.51.2.1 09-Sep-2006  rpaulo sync with head
 1.52.6.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.52.6.1 31-Mar-2006  tron Merge 2006-03-31 NetBSD-current into the "peter-altq" branch.
 1.52.4.2 11-May-2006  elad sync with head
 1.52.4.1 19-Apr-2006  elad sync with head.
 1.52.2.3 26-Jun-2006  yamt sync with head.
 1.52.2.2 24-May-2006  yamt sync with head.
 1.52.2.1 01-Apr-2006  yamt sync with head.
 1.55.2.1 19-Jun-2006  chap Sync with head.
 1.56.8.1 10-Dec-2006  yamt sync with head.
 1.56.6.1 18-Nov-2006  ad Sync with head.
 1.58.2.2 07-May-2007  yamt sync with head.
 1.58.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.59.6.1 11-Jul-2007  mjf Sync with head.
 1.59.4.3 09-Oct-2007  ad Sync with head.
 1.59.4.2 15-Jul-2007  ad Sync with head.
 1.59.4.1 08-Jun-2007  ad Sync with head.
 1.61.8.2 09-Jan-2008  matt sync with HEAD
 1.61.8.1 06-Nov-2007  matt sync with HEAD
 1.61.6.3 04-Nov-2007  jmcneill Sync with HEAD.
 1.61.6.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.61.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.61.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.62.2.1 14-Oct-2007  yamt sync with head.
 1.63.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.65.8.1 02-Jan-2008  bouyer Sync with HEAD
 1.65.4.1 26-Dec-2007  ad Sync with head.
 1.65.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.66.10.1 16-Sep-2009  yamt sync with head
 1.66.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.68.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.68.4.1 31-May-2011  rmind sync with head
 1.69.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.69.4.1 30-Oct-2012  yamt sync with head
 1.70.2.3 03-Dec-2017  jdolecek update from HEAD
 1.70.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.70.2.1 23-Jun-2013  tls resync from head
 1.73.2.1 18-May-2014  rmind sync with head
 1.75.2.1 10-Aug-2014  tls Rebase.
 1.78.2.1 23-Jan-2015  martin Pull up following revision(s) (requested by pettai in ticket #441):
sys/netinet6/ip6_var.h: revision 1.64
sys/netinet6/in6.h: revision 1.82
sys/netinet6/in6_src.c: revision 1.56
sys/netinet6/mld6.c: revision 1.62
sys/netinet6/ip6_input.c: revision 1.150
sys/netinet6/ip6_output.c: revision 1.161
Add net.inet6.ip6.prefer_tempaddr sysctl knob so that we can prefer
IPv6 temporary addresses as the source address.
Fixes PR kern/47100 based on a patch by Dieter Roelants.
 1.79.2.5 19-Mar-2016  skrll Sync with HEAD
 1.79.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.79.2.3 22-Sep-2015  skrll Sync with HEAD
 1.79.2.2 06-Jun-2015  skrll Sync with HEAD
 1.79.2.1 06-Apr-2015  skrll Sync with HEAD
 1.87.10.1 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.90.2.3 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.90.2.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.90.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.91.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.91.2.1 10-Jun-2019  christos Sync with HEAD
 1.95.2.1 06-Sep-2019  martin Pull up following revision(s) (requested by kamil in ticket #183):

sys/netinet6/in6.h: revision 1.96

Revert regression introduced in in6.h r. 1.95
 1.95 28-May-2019  kamil Decorate struct in6_addr with the __packed attribute
This avoids undefined behavior when accessing misaligned pointers.
Detected by kUBSan.
Patch by Akul Pillai.
 1.100.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.28 25-Apr-2011  yamt fix assertions
 1.27 10-Mar-2008  yamt branches: 1.27.26; 1.27.32;
in6_cksum: use IN6_IS_SCOPE_EMBEDDABLE.
 1.26 10-Mar-2008  yamt in6_cksum: avoid using -> operator and use (char *) arithmetics instead.
reviewed by Joerg Sonnenberger. he pointed out that the original code
was written that way so that the compiler will explicitly not assume that
the alignment of the data is correct. although i don't know if it really
matters or not, being safer is not a problem.
 1.25 09-Mar-2008  yamt in6_cksum: constify
 1.24 12-Feb-2008  joerg branches: 1.24.2; 1.24.6;
Explicitly predict panic conditions as false.
 1.23 12-Feb-2008  joerg Provide a simplified inplace version of in6_cksum.
Tested by is@ on amd64.
 1.22 25-Jan-2008  joerg Refactor in_cksum/in4_cksum/in6_cksum implementations:
- All three functions are included in the kernel by default.
They call a backend function cpu_in_cksum after possibly
computing the checksum of the pseudo header.
- cpu_in_cksum is the core to implement the one's-complement sum.
The default implementation is moderately fast on most platforms
and provides a 32bit accumulator with 16bit addends for L32 platforms
and a 64bit accumulator with 32bit addends for L64 platforms.
It handles edge cases like very large mbuf chains (could happen with
native IPv6 in the future) and provides a good base for new native
implementations.
- Modify i386 and amd64 assembly to use the new interface.

This disables the MD implementations on !x86 until the conversion is
done. For Alpha, the portable version is faster.
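
As a rough illustration of the 32-bit-accumulator, 16-bit-addend variant mentioned above, the following sketch sums a flat byte buffer. It is not the kernel's cpu_in_cksum: sketch_cksum() is a hypothetical name, it takes a contiguous buffer rather than an mbuf chain, and the 'initial' parameter stands in for a precomputed pseudo-header sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement sum over a byte buffer using 16-bit addends and a
 * 32-bit accumulator. 'initial' carries in a partial sum, e.g. from a
 * pseudo-header. Returns the complemented, folded 16-bit checksum.
 */
uint16_t
sketch_cksum(const uint8_t *data, size_t len, uint32_t initial)
{
	uint32_t sum = initial;

	assert(data != NULL || len == 0);
	while (len >= 2) {
		/* add the next 16-bit word in network byte order */
		sum += (uint32_t)data[0] << 8 | data[1];
		data += 2;
		len -= 2;
		/* fold early so very long inputs cannot overflow */
		if (sum & 0x80000000u)
			sum = (sum & 0xffff) + (sum >> 16);
	}
	if (len == 1)
		sum += (uint32_t)data[0] << 8;	/* pad the odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* end-around carry */
	return (uint16_t)~sum;
}
```

Running it over a header whose checksum field is already filled in yields 0, which is the usual way a receiver verifies the sum.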
 1.21 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.20 23-May-2007  christos branches: 1.20.8; 1.20.14; 1.20.16; 1.20.20;
Ansify + add a few comments, from Karl Sjödahl
 1.19 27-Jan-2006  rpaulo branches: 1.19.28; 1.19.30;
PR 32653: mrt@notwork.org: remove 'sum += w[0]' left in previous revision.
 1.18 21-Jan-2006  rpaulo Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.17 11-Dec-2005  christos branches: 1.17.2;
merge ktrace-lwp.
 1.16 07-Aug-2003  agc branches: 1.16.16;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.15 18-Jul-2003  itojun remove #if 0 portion
 1.14 27-Sep-2002  provos branches: 1.14.6;
remove trailing \n in panic(). approved perry.
 1.13 09-Jun-2002  itojun whitespace cleanup
 1.12 13-Nov-2001  lukem branches: 1.12.8;
add RCSIDs
 1.11 30-May-2001  thorpej branches: 1.11.2;
Skip the pseudo-header if nxt == 0. This is already documented
in in6_cksum(9) and is also the behavior of the i386 optimized
version.
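
The "skip the pseudo-header if nxt == 0" rule can be sketched over a flat buffer. This is an illustration under stated assumptions, not in6_cksum() itself: cksum6_sketch() and sum_words() are hypothetical names, the real routine walks an mbuf chain, and the pseudo-header layout (source, destination, 32-bit upper-layer length, three zero bytes, next header) is the standard IPv6 one.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes at p to sum as big-endian 16-bit words; an odd tail byte is zero-padded. */
static uint64_t
sum_words(const uint8_t *p, size_t len, uint64_t sum)
{
	while (len > 1) {
		sum += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)p[0] << 8;
	return sum;
}

/*
 * Hypothetical flat-buffer checksum. When nxt is 0 the pseudo-header
 * is skipped and only the payload is summed.
 */
uint16_t
cksum6_sketch(const uint8_t src[16], const uint8_t dst[16], uint8_t nxt,
    const uint8_t *payload, uint32_t len)
{
	uint64_t sum = 0;

	if (nxt != 0) {
		/* Upper-layer length, three zero bytes, next header. */
		const uint8_t tail[8] = {
			(uint8_t)(len >> 24), (uint8_t)(len >> 16),
			(uint8_t)(len >> 8), (uint8_t)len,
			0, 0, 0, nxt
		};
		sum = sum_words(src, 16, sum);
		sum = sum_words(dst, 16, sum);
		sum = sum_words(tail, sizeof(tail), sum);
	}
	sum = sum_words(payload, len, sum);

	/* Fold the carries back in and return the one's complement. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Calling it with nxt == 0 gives a plain Internet checksum of the payload; any other nxt value folds the addresses, length and next-header byte into the sum first.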
 1.10 10-Feb-2001  itojun branches: 1.10.2;
to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.9 09-Sep-2000  itojun move file static variable into auto variable, for better thread safety.
(not really required for big lock MP). sync with kame
 1.8 09-Sep-2000  itojun add attribute(packed).
From: Alfred Perlstein <bright@wintelcom.net>
 1.7 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.6 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.5 11-Jul-1999  itojun branches: 1.5.2; 1.5.8;
fix compilation/runtime problem on alpha.

PR: 7952, 7953
From: Dave Huang <khym@bga.com>
 1.4 06-Jul-1999  itojun checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file in6_cksum.c was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file in6_cksum.c was added on branch chs-ubc2 on 1999-07-01 23:48:27 +0000
 1.5.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.5.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.5.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.10.2.4 18-Oct-2002  nathanw Catch up to -current.
 1.10.2.3 20-Jun-2002  nathanw Catch up to -current.
 1.10.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.10.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.11.2.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.11.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.11.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.12.8.1 20-Jun-2002  gehenna catch up with -current.
 1.14.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.14.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.14.6.1 03-Aug-2004  skrll Sync with HEAD
 1.16.16.6 17-Mar-2008  yamt sync with head.
 1.16.16.5 27-Feb-2008  yamt sync with head.
 1.16.16.4 04-Feb-2008  yamt sync with head.
 1.16.16.3 21-Jan-2008  yamt sync with head
 1.16.16.2 03-Sep-2007  yamt sync with head.
 1.16.16.1 21-Jun-2006  yamt sync with head.
 1.17.2.1 01-Feb-2006  yamt sync with head.
 1.19.30.1 11-Jul-2007  mjf Sync with head.
 1.19.28.1 08-Jun-2007  ad Sync with head.
 1.20.20.1 02-Jan-2008  bouyer Sync with HEAD
 1.20.16.1 26-Dec-2007  ad Sync with head.
 1.20.14.1 18-Feb-2008  mjf Sync with HEAD.
 1.20.8.2 23-Mar-2008  matt sync with HEAD
 1.20.8.1 09-Jan-2008  matt sync with HEAD
 1.24.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.24.2.1 24-Mar-2008  keiichi sync with head.
 1.27.32.1 06-Jun-2011  jruoho Sync with HEAD.
 1.27.26.1 31-May-2011  rmind sync with head
 1.96 07-Dec-2022  knakahara gif(4), ipsec(4) and l2tp(4) use encap_attach_addr().
 1.95 30-Oct-2019  knakahara Add sysctl nodes to control fragmentation with IPv[46] over IPv6 gif(4).

New sysctl node "net.inet6.ip6.gifpmtu" means
- 0 (default)
Fragment by IPV6_MMTU. All packets reach the destination certainly,
however the long packet performance is poor.
This is the same behavior as before.
- 1
Fragment by outer interface's MTU. The long packet performance would
be good, however the packets may be dropped in some network paths
whose path MTU is less than the interface's MTU.
- others
undefined yet

New sysctl node "net.interfaces.gif*.pmtu" means
- -1 (default)
Use system default value (net.inet6.ip6.gifpmtu).
- 0
Fragment by IPV6_MMTU for this gif(4) tunnel.
- 1
Fragment by outer interface's MTU for this gif(4) tunnel.
- others
undefined yet

See RFC4459 for more information and other solutions.
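
How the two sysctl nodes combine can be sketched as a small function. This is an assumption-laden illustration rather than the gif(4) code: gif_frag_mtu() and IPV6_MMTU_SKETCH are hypothetical names, 1280 is the IPv6 minimum link MTU that IPV6_MMTU stands for, and returning -1 for the "others" case is only a placeholder, since those values are undefined.

```c
/* IPv6 minimum link MTU; stands in for the kernel's IPV6_MMTU. */
#define IPV6_MMTU_SKETCH 1280

/*
 * Hypothetical helper: pick the MTU to fragment to.
 *   if_pmtu   value of net.interfaces.gif*.pmtu (-1, 0 or 1)
 *   sys_pmtu  value of net.inet6.ip6.gifpmtu (0 or 1)
 *   outer_mtu MTU of the outer interface
 * Returns -1 for values the commit message leaves undefined.
 */
int
gif_frag_mtu(int if_pmtu, int sys_pmtu, int outer_mtu)
{
	/* -1 on the interface means "use the system default". */
	int policy = (if_pmtu == -1) ? sys_pmtu : if_pmtu;

	switch (policy) {
	case 0:
		return IPV6_MMTU_SKETCH;
	case 1:
		return outer_mtu;
	default:
		return -1;
	}
}
```

A per-interface value of 0 or 1 always wins over the system-wide node; only -1 defers to it.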
 1.94 19-Sep-2019  knakahara Avoid having a rtcache directly in a percpu storage for tunnel protocols.

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.93 01-May-2018  maxv branches: 1.93.2; 1.93.6;
Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.92 27-Apr-2018  knakahara Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They keep the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address which lockdebug
saved. That can cause a false "already initialized" detection.
 1.91 14-Mar-2018  knakahara Fix error checking in in6_gif_ctlinput().

if_gif.c:r1.133 introduces gif_update_variant(), which ensures that
IFF_RUNNING is set in ifp->if_flags when gif_softc->gif_var->gv_{psrc,pdst}
are not NULL. So in6_gif_ctlinput() does not need to check IFF_RUNNING.
In contrast, it does need to check gv_{psrc,pdst} for NULL.
 1.90 10-Jan-2018  knakahara branches: 1.90.2;
apply in{,6}_tunnel_validate() to gif(4).
 1.89 27-Nov-2017  knakahara IFF_RUNNING checking in Rx and Tx processing is unnecessary now.

Because the configs of gif (members of gif_var) are protected by psref(9).
 1.88 27-Nov-2017  knakahara preserve gif(4) configs by psref(9) like vlan(4) and l2tp(4).

After Tx side does not use softint, gif(4) can use psref(9) for config
preservation like vlan(4) and l2tp(4).

update locking notes later.
 1.87 15-Nov-2017  knakahara Add argument to encapsw->pr_input() instead of m_tag.
 1.86 21-Sep-2017  knakahara add lock for percpu route like l2tp(4).
 1.85 16-Jan-2017  christos branches: 1.85.6;
ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.84 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.83 06-Jan-2017  knakahara branches: 1.83.2;
remove unnecessary conversion.

gif_softc->gif_pdst is already valid sockaddr.
 1.82 14-Dec-2016  knakahara fix race of gif_softc->gif_ro when we send multiple flows over gif on NET_MPSAFE enabled kernel.

make gif_softc->gif_ro percpu as well as ipforward_rt to resolve this race.
and add future TODO comment for etherip(4).
 1.81 12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.80 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.79 15-Jul-2016  ozaki-r Use sin6tosa and sin6tocsa macros

No functional change.
 1.78 06-Jul-2016  ozaki-r branches: 1.78.2;
Apply m_get_rcvif_psref (kill m_get_rcvif_NOMPSAFE)
 1.77 04-Jul-2016  knakahara fix: gif(4) receive side race

A panic occurs in rn_match() called by encap[46]_lookup(). The reason is that
gif(4) does not suspend receive packet processing in spite of suspending
transmit packet processing while anyone is doing gif(4) ioctl.
 1.76 04-Jul-2016  knakahara let gif(4) promise softint(9) contract (1/2) : gif(4) side

To prevent calling softint_schedule() after called softint_disestablish(),
the following modifications are added
+ ioctl (writing configuration) side
- off IFF_RUNNING flag before changing configuration
- wait softint handler completion before changing configuration
+ packet processing (reading configuration) side
- if IFF_RUNNING flag is on, do nothing
+ in whole
- add gif_list_lock_{enter,exit} to prevent the same configuration from
being set on other gif(4) interfaces
 1.75 28-Jun-2016  ozaki-r Add missing NULL checks for m_get_rcvif_psref
 1.74 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and exists for a
transition moratorium, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.73 29-Feb-2016  knakahara remove unnecessary declarations and fix KNF

Thanks to riastradh@
 1.72 26-Feb-2016  knakahara To eliminate gif_softc_list linear search, add extra argument to encapsw.pr_ctlinput().
 1.71 26-Jan-2016  knakahara implement encapsw instead of protosw and uniform prototype.

suggested and advised by riastradh@n.o, thanks.

BTW, It seems in_stf_input() had bugs...
 1.70 23-Jan-2016  riastradh Those were local changes not meant to be part of the revert. SORRY!
 1.69 23-Jan-2016  christos make this compile again
 1.68 22-Jan-2016  riastradh Back out previous change to introduce struct encapsw.

This change was intended, but Nakahara-san had already made a better
one locally! So I'll let him commit that one, and I'll try not to
step on anyone's toes again.
 1.67 22-Jan-2016  riastradh Don't abuse struct protosw for ip_encap -- introduce struct encapsw.

Mostly mechanical change to replace it, culling some now-needless
boilerplate around all the users.

This does not substantively change the ip_encap API or eliminate
abuse of sketchy pointer casts -- that will come later, and will be
easier now that it is not tangled up with struct protosw.
 1.66 20-Jan-2016  riastradh Eliminate struct protosw::pr_output.

You can't use this unless you know what it is a priori: the formal
prototype is variadic, and the different instances (e.g., ip_output,
route_output) have different real prototypes.

Convert the only user of it, raw_send in net/raw_cb.c, to take an
explicit callback argument. Convert the only instances of it,
route_output and key_output, to such explicit callbacks for raw_send.
Use assertions to make sure the conversion to explicit callbacks is
warranted.

Discussed on tech-net with no objections:
https://mail-index.netbsd.org/tech-net/2016/01/16/msg005484.html
 1.65 18-Jan-2016  knakahara Refactor protosw codes in gif(4). No functional change.

- remove unnecessary include
- reduce scopes
 1.64 25-Dec-2015  knakahara use satosin{,6} macros instead of casts.
 1.63 11-Dec-2015  knakahara PR kern/50522: gif(4) ioctl causes panic while someone is using the gif(4) interface.

It is required to wait other CPU's softint completion before disestablishing
the softint handler.
 1.62 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.61 24-Apr-2015  ozaki-r Add missing rtcache_free

It's the same as other similar code paths in in_gif and ip6_etherip.
 1.60 18-May-2014  rmind branches: 1.60.4;
Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.59 01-Mar-2013  joerg branches: 1.59.6; 1.59.10;
Retire OSI network stack. OK core@
 1.58 14-Mar-2009  dsl branches: 1.58.12; 1.58.22;
Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.57 07-Nov-2008  dyoung branches: 1.57.4;
*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.56 24-Apr-2008  ad branches: 1.56.2; 1.56.8; 1.56.10;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.55 15-Apr-2008  thorpej branches: 1.55.2;
Make ip6 and icmp6 stats per-cpu.
 1.54 08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
 1.53 20-Dec-2007  dyoung branches: 1.53.6;
Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.52 23-May-2007  christos branches: 1.52.8; 1.52.16; 1.52.20;
Ansify + add a few comments, from Karl Sjödahl
 1.51 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.50 04-Mar-2007  christos branches: 1.50.2; 1.50.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.49 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.48 17-Feb-2007  dyoung branches: 1.48.2;
Don't open-code LIST_FOREACH().
 1.47 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.46 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.45 07-Jun-2006  kardel branches: 1.45.6; 1.45.8;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.44 11-Dec-2005  christos branches: 1.44.4; 1.44.6; 1.44.8; 1.44.14;
merge ktrace-lwp.
 1.43 26-Jun-2005  mlelstv branches: 1.43.2;
expire cached route. Fixes PR 22792.
 1.42 02-Jun-2005  tron Change the first argument of the encapsulation check function from
"const struct mbuf *" to "struct mbuf *". Without this change the
actual implementation cannot even use m_copydata() on the mbuf chain
which is broken.
 1.41 02-Jun-2005  tron Remove type casts and lint directives which are now longer necessary
because the first argument of m_copydata() is "const struct mbuf *" now.
 1.40 29-May-2005  christos - avoid shadowed variables
- sprinkle const.
 1.39 26-Feb-2005  perry branches: 1.39.2;
nuke trailing whitespace
 1.38 22-Apr-2004  matt branches: 1.38.4; 1.38.6;
Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).
 1.37 30-Oct-2003  simonb branches: 1.37.4;
Remove some assigned-to but otherwise unused variables.
 1.36 05-Sep-2003  itojun u_short -> u_int16_t. sync w/ kame.
don't set ip6_plen where unneeded (i.e. before calling ip6_output)
 1.35 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.34 22-Aug-2003  jonathan Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.
 1.33 25-Nov-2002  thorpej branches: 1.33.6;
Avoid strict-alias warnings.
 1.32 11-Nov-2002  itojun make USE_ENCAPCHECK (in netinet*/*gif.c) to global option, GIF_ENCAPCHECK.
#ifdef out unneeded code when possible.
From: Krister Walfridsson <cato@df.lth.se>
 1.31 05-Nov-2002  itojun improve gif lookup performance, when there are many of those,
by using radix tree for lookups. tested by yshimizu@iij.
 1.30 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.29 09-Jun-2002  itojun whitespace cleanup
 1.28 08-Jun-2002  itojun whitespace cleanup
 1.27 21-Dec-2001  itojun branches: 1.27.8;
use radix table for inbound tunnel lookup (would increase performance
for machines with a lot of tunnels).
update route cache for IPvX-over-IPv6 tunnel on path MTU discovery.
sync with kame
 1.26 21-Dec-2001  itojun move in6_gif_hlim decl to in6_gif.c. sync with kame
 1.25 21-Dec-2001  itojun move protosw fragment for gif/stf to their own source code.
reduce #ifdef in stf code. sync with kame
 1.24 20-Dec-2001  itojun centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame
 1.23 13-Nov-2001  lukem add RCSIDs
 1.22 16-Aug-2001  itojun gif interface now uses generic software interrupt
(on archs that support it). also, make gif ALTQ-capable on outgoing.
sync with kame, comments from thorpej.
 1.21 29-Jul-2001  itojun sync gif interface code with latest kame.
IFF_RUNNING is clarified. attach/detach logic is cleaner.
the old code mistakenly set IFF_UP by itself, now the behavior is gone.
 1.20 14-May-2001  itojun branches: 1.20.2;
drop multi destination mode (IFF_LINK0).
 1.19 10-May-2001  itojun correct ecn consideration on tunnel encap/decap. sync with kame.
 1.18 20-Feb-2001  itojun branches: 1.18.2;
add AF_ISO case to output. from chopps.
 1.17 20-Feb-2001  itojun ISO over IPv4/v6 by EON encapsulation. from chopps, sync with kame.
 1.16 11-Feb-2001  itojun remove #ifdef __FreeBSD__.
 1.15 22-Jan-2001  itojun make it possible to turn off ingress filter on gif/stf tunnel egress,
by using IFF_LINK2. (part of) PR 11163 from Ken Raeburn.
 1.14 19-Apr-2000  itojun branches: 1.14.4;
introduce sys/netinet/ip_encap.c, to dispatch inbound packets
to protocol handlers, based on src/dst (for ip proto #4/41).
see comment in ip_encap.c for details of the problem we have.
there are too many protocol specs for ip proto #4/41.
backward compatibility with MROUTING case is now provided in ip_encap.c.

fix ipip to work with gif (using ip_encap.c). sorry for breakage.

gif now uses ip_encap.c.

introduce stf pseudo interface (implements 6to4, another IPv6-over-IPv4 code
with ip proto #41).
 1.13 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.12 07-Feb-2000  itojun s/DIAGNOSTIC/DEBUG/
 1.11 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.10 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.9 15-Dec-1999  itojun do not overwrite traffic class field when we write IPv6 version field.
 1.8 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.7 20-Aug-1999  itojun branches: 1.7.2; 1.7.8;
do not capture packets by gif, when gif interface is down.
 1.6 31-Jul-1999  itojun sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.5 30-Jul-1999  itojun remove reference to in6_systm.h (file itself will be removed afterwards)
 1.4 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file in6_gif.c was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file in6_gif.c was added on branch chs-ubc2 on 1999-07-01 23:48:27 +0000
 1.7.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.7.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.7.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.7.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.14.4.1 01-May-2001  he Pull up revision 1.15 (requested by itojun):
Make it possible to turn off ingress filter on gif/stf tunnel
egress by using IFF_LINK2. Fixes (part of) PR#11163.
 1.18.2.8 11-Dec-2002  thorpej Sync with HEAD.
 1.18.2.7 11-Nov-2002  nathanw Catch up to -current
 1.18.2.6 17-Sep-2002  nathanw Catch up to -current.
 1.18.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.18.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.18.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.18.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.18.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.20.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.20.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.20.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.20.2.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.20.2.1 03-Aug-2001  lukem update to -current
 1.27.8.1 20-Jun-2002  gehenna catch up with -current.
 1.33.6.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.33.6.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.33.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.33.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.33.6.1 03-Aug-2004  skrll Sync with HEAD
 1.37.4.1 09-Jan-2006  tron Pull up following revision(s) (requested by mlelstv in ticket #10214):
sys/netinet6/in6_gif.c: revision 1.43
sys/netinet/in_gif.c: revision 1.45
sys/net/if_gif.h: revision 1.11
expire cached route. Fixes PR 22792.
 1.38.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.38.4.1 29-Apr-2005  kent sync with -current
 1.39.2.1 08-Jan-2006  riz Pull up following revision(s) (requested by mlelstv in ticket #1092):
sys/netinet6/in6_gif.c: revision 1.43
sys/netinet/in_gif.c: revision 1.45
sys/net/if_gif.h: revision 1.11
expire cached route. Fixes PR 22792.
 1.43.2.5 21-Jan-2008  yamt sync with head
 1.43.2.4 03-Sep-2007  yamt sync with head.
 1.43.2.3 26-Feb-2007  yamt sync with head.
 1.43.2.2 30-Dec-2006  yamt sync with head.
 1.43.2.1 21-Jun-2006  yamt sync with head.
 1.44.14.1 19-Jun-2006  chap Sync with head.
 1.44.8.1 26-Jun-2006  yamt sync with head.
 1.44.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.44.4.1 09-Sep-2006  rpaulo sync with head
 1.45.8.2 18-Dec-2006  yamt sync with head.
 1.45.8.1 10-Dec-2006  yamt sync with head.
 1.45.6.1 12-Jan-2007  ad Sync with head.
 1.48.2.4 07-May-2007  yamt sync with head.
 1.48.2.3 12-Mar-2007  rmind Sync with HEAD.
 1.48.2.2 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.48.2.1 17-Feb-2007  yamt file in6_gif.c was added on branch yamt-idlelwp on 2007-02-27 16:54:59 +0000
 1.50.4.1 11-Jul-2007  mjf Sync with head.
 1.50.2.1 08-Jun-2007  ad Sync with head.
 1.52.20.1 02-Jan-2008  bouyer Sync with HEAD
 1.52.16.1 26-Dec-2007  ad Sync with head.
 1.52.8.1 09-Jan-2008  matt sync with HEAD
 1.53.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.53.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.55.2.1 18-May-2008  yamt sync with head.
 1.56.10.2 28-Apr-2009  skrll Sync with HEAD.
 1.56.10.1 19-Jan-2009  skrll Sync with HEAD.
 1.56.8.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.56.2.1 04-May-2009  yamt sync with head.
 1.57.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.58.22.3 03-Dec-2017  jdolecek update from HEAD
 1.58.22.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.58.22.1 23-Jun-2013  tls resync from head
 1.58.12.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.59.10.1 10-Aug-2014  tls Rebase.
 1.59.6.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.60.4.7 05-Feb-2017  skrll Sync with HEAD
 1.60.4.6 05-Oct-2016  skrll Sync with HEAD
 1.60.4.5 09-Jul-2016  skrll Sync with HEAD
 1.60.4.4 19-Mar-2016  skrll Sync with HEAD
 1.60.4.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.60.4.2 22-Sep-2015  skrll Sync with HEAD
 1.60.4.1 06-Jun-2015  skrll Sync with HEAD
 1.78.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.78.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.78.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.83.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.85.6.7 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.85.6.6 17-May-2018  martin Pull up following revision(s) (requested by knakahara in ticket #829):

sys/net/if_l2tp.c: revision 1.24
sys/net/if_ipsec.c: revision 1.13
sys/net/if_gif.h: revision 1.31
sys/netipsec/ipsecif.c: revision 1.8
sys/net/if_gif.c: revision 1.140
sys/netinet6/in6_l2tp.c: revision 1.15
sys/net/if_ipsec.h: revision 1.3
sys/netinet6/in6_gif.c: revision 1.92
sys/net/if_l2tp.h: revision 1.5
sys/netinet/in_l2tp.c: revision 1.13
sys/netinet/in_gif.c: revision 1.93

Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They use mutex
itself in percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in percpu area becomes different from the address which lockdebug
saved. That can cause "already initialized" false detection.
 1.85.6.5 15-Mar-2018  bouyer Pull up following revision(s) (requested by knakahara in ticket #632):
sys/netinet6/in6_gif.c: revision 1.91
Fix error checking in in6_gif_ctlinput().
if_gif.c:r1.133 introduces gif_update_variant() which ensures ifp->if_flags
has IFF_RUNNING set when gif_softc->gif_var->gv_{psrc,pdst} are not null.
So, in6_gif_ctlinput() does not need the IFF_RUNNING check. Instead,
it needs a NULL check on gv_{psrc,pdst}.
 1.85.6.4 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.85.6.3 02-Jan-2018  snj Pull up following revision(s) (requested by knakahara in ticket #462):
sys/net/if_gif.c: revision 1.133, 1.134, 1.137
sys/net/if_gif.h: revision 1.28-1.29
sys/netinet/in_gif.c: revision 1.90-1.91
sys/netinet/in_gif.h: revision 1.18
sys/netinet6/in6_gif.c: revision 1.88-1.89
sys/netinet6/in6_gif.h: revision 1.17
preserve gif(4) configs by psref(9) like vlan(4) and l2tp(4).
After Tx side does not use softint, gif(4) can use psref(9) for config
preservation like vlan(4) and l2tp(4).
update locking notes later.
--
update gif(4) locking notes.
--
IFF_RUNNING checking in Rx and Tx processing is unnecessary now.
Because the configs of gif (members of gif_var) are protected by psref(9).
--
remove duplicated null check
 1.85.6.2 10-Dec-2017  snj Pull up following revision(s) (requested by knakahara in ticket #419):
sys/net/if_stf.c: revision 1.103
sys/net/if_stf.h: revision 1.8
sys/netinet/in_gif.c: revision 1.89
sys/netinet/in_gif.h: revision 1.17
sys/netinet/in_l2tp.c: revision 1.4
sys/netinet/ip_encap.c: revision 1.66
sys/netinet/ip_encap.h: revision 1.23
sys/netinet/ip_mroute.c: revision 1.148
sys/netinet6/in6_gif.c: revision 1.87
sys/netinet6/in6_gif.h: revision 1.16
sys/netinet6/in6_l2tp.c: revision 1.7
sys/netipsec/xform.h: revision 1.13
sys/netipsec/xform_ipip.c: revision 1.55
Add argument to encapsw->pr_input() instead of m_tag.
 1.85.6.1 24-Oct-2017  snj Pull up following revision(s) (requested by knakahara in ticket #303):
sys/net/if_gif.c: 1.129-1.130
sys/net/if_gif.h: 1.26-1.27
sys/netinet/in_gif.c: 1.88
sys/netinet6/in6_gif.c: 1.86
add lock for percpu route like l2tp(4).
--
add lock for sclist to exclude ifconfig gifX add/delete and ifconfig gifX tunnel
--
update locking notes.
 1.90.2.2 02-May-2018  pgoyette Synch with HEAD
 1.90.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.93.6.1 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@
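
The indirection described above can be illustrated with a small user-space
sketch. This is not the kernel code: percpu_sim, rtcache_sim and every
function below are hypothetical stand-ins invented for illustration, and the
"enlarge" step only mimics the allocate, copy and destroy behaviour that the
message attributes to percpu(9).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cached route. */
struct rtcache_sim {
	int dst;
};

/* Hypothetical stand-in for one CPU's percpu storage area. */
struct percpu_sim {
	unsigned char *buf;
	size_t size;
};

static int
percpu_sim_init(struct percpu_sim *pc, size_t size)
{
	pc->buf = calloc(1, size);
	pc->size = size;
	return pc->buf != NULL ? 0 : -1;
}

/*
 * Grow the area: allocate a larger one, copy the contents, destroy the
 * old one. Any pointer INTO the old area dangles afterwards.
 * newsize must be at least the current size.
 */
static int
percpu_sim_enlarge(struct percpu_sim *pc, size_t newsize)
{
	unsigned char *nbuf = calloc(1, newsize);

	if (nbuf == NULL)
		return -1;
	memcpy(nbuf, pc->buf, pc->size);
	free(pc->buf);
	pc->buf = nbuf;
	pc->size = newsize;
	return 0;
}

/* Store only a pointer to the rtcache in the slot at offset 0. */
static void
percpu_sim_set_rtcache(struct percpu_sim *pc, struct rtcache_sim *rc)
{
	memcpy(pc->buf, &rc, sizeof(rc));
}

/* Load the rtcache pointer back out of the slot. */
static struct rtcache_sim *
percpu_sim_get_rtcache(const struct percpu_sim *pc)
{
	struct rtcache_sim *rc;

	memcpy(&rc, pc->buf, sizeof(rc));
	return rc;
}
```

Because the slot holds only a pointer, the separately allocated rtcache_sim
object keeps its address when the area is replaced, so a thread that loaded
the pointer before sleeping can still dereference it afterwards.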

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing a percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine(Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing a percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.93.2.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.18 30-Oct-2019  knakahara Add sysctl nodes to control fragmentation with IPv[46] over IPv6 gif(4).

New sysctl node "net.inet6.ip6.gifpmtu" means
- 0 (default)
Fragment by IPV6_MMTU. All packets reach the destination certainly,
however the long packet performance is poor.
This is same behavior as before.
- 1
Fragment by outer interface's MTU. The long packet performance would
be good, however the packets may be dropped in some network paths
whose path MTU is less than the interface's MTU.
- others
undefined yet

New sysctl node "net.interfaces.gif*.pmtu" means
- -1 (default)
Use system default value (net.inet6.ip6.gifpmtu).
- 0
Fragment by IPV6_MMTU for this gif(4) tunnel.
- 1
Fragment by outer interface's MTU for this gif(4) tunnel.
- others
undefined yet

See RFC4459 for more information and other solutions.
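
The way the two sysctl values combine can be sketched as follows. This is
only an illustration of the rules listed above, not the code from the
commit: the function name gif_frag_mtu_sketch and its parameters are
hypothetical, and returning 0 for the "undefined yet" values is an
assumption made for the sketch. IPV6_MMTU is the IPv6 minimum link MTU,
1280 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define IPV6_MMTU 1280	/* minimum IPv6 link MTU in bytes */

/*
 * Hypothetical helper: pick the size to fragment to for one gif(4) tunnel.
 *   if_pmtu     - per-interface setting (net.interfaces.gif*.pmtu)
 *   sys_gifpmtu - system default (net.inet6.ip6.gifpmtu)
 *   outer_mtu   - MTU of the outer (physical) interface
 * Returns 0 for values the log describes as "undefined yet"
 * (an assumption of this sketch).
 */
static uint32_t
gif_frag_mtu_sketch(int if_pmtu, int sys_gifpmtu, uint32_t outer_mtu)
{
	/* -1 on the interface means "use the system default". */
	int mode = (if_pmtu == -1) ? sys_gifpmtu : if_pmtu;

	switch (mode) {
	case 0:
		return IPV6_MMTU;	/* always deliverable, slower for long packets */
	case 1:
		return outer_mtu;	/* faster, may be dropped on smaller path MTUs */
	default:
		return 0;		/* undefined yet */
	}
}
```

With both settings left at their defaults (-1 and 0), the sketch yields
1280, matching the "same behavior as before" case.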
 1.17 27-Nov-2017  knakahara branches: 1.17.4;
preserve gif(4) configs by psref(9) like vlan(4) and l2tp(4).

After Tx side does not use softint, gif(4) can use psref(9) for config
preservation like vlan(4) and l2tp(4).

update locking notes later.
 1.16 15-Nov-2017  knakahara Add argument to encapsw->pr_input() instead of m_tag.
 1.15 04-Jul-2016  knakahara branches: 1.15.10;
fix: gif(4) receive side race

A panic occurs in rn_match() called by encap[46]_lookup(). The reason is that
gif(4) does not suspend receive packet processing in spite of suspending
transmit packet processing while anyone is doing gif(4) ioctl.
 1.14 26-Feb-2016  knakahara To eliminate gif_softc_list linear search, add extra argument to encapsw.pr_ctlinput().
 1.13 24-Apr-2008  ad branches: 1.13.46; 1.13.66;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.12 17-Feb-2007  dyoung branches: 1.12.38; 1.12.40;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.11 10-Dec-2005  elad branches: 1.11.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.10 02-Jun-2005  tron branches: 1.10.2;
Change the first argument of the encapsulation check function from
"const struct mbuf *" to "struct mbuf *". Without this change the
actual implementation cannot even use m_copydata() on the mbuf chain
which is broken.
 1.9 11-Nov-2002  itojun branches: 1.9.6;
make USE_ENCAPCHECK (in netinet*/*gif.c) to global option, GIF_ENCAPCHECK.
#ifdef out unneeded code when possible.
From: Krister Walfridsson <cato@df.lth.se>
 1.8 21-Dec-2001  itojun use radix table for inbound tunnel lookup (would increase performance
for machines with a lot of tunnels).
update route cache for IPvX-over-IPv6 tunnel on path MTU discovery.
sync with kame
 1.7 21-Dec-2001  itojun move in6_gif_hlim decl to in6_gif.c. sync with kame
 1.6 16-Aug-2001  itojun gif interface now uses generic software interrupt
(on archs that support it). also, make gif ALTQ-capable on outgoing.
sync with kame, comments from thorpej.
 1.5 29-Jul-2001  itojun sync gif interface code with latest kame.
IFF_RUNNING is clarified. attach/detach logic is cleaner.
the old code mistakenly set IFF_UP by itself, now the behavior is gone.
 1.4 19-Apr-2000  itojun branches: 1.4.6; 1.4.8;
introduce sys/netinet/ip_encap.c, to dispatch inbound packets
to protocol handlers, based on src/dst (for ip proto #4/41).
see comment in ip_encap.c for details of the problem we have.
there are too many protocol specs for ip proto #4/41.
backward compatibility with MROUTING case is now provided in ip_encap.c.

fix ipip to work with gif (using ip_encap.c). sorry for breakage.

gif now uses ip_encap.c.

introduce stf pseudo interface (implements 6to4, another IPv6-over-IPv4 code
with ip proto #41).
 1.3 03-Jul-1999  thorpej branches: 1.3.2;
RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file in6_gif.h was initially added on branch kame.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file in6_gif.h was added on branch chs-ubc2 on 1999-07-01 23:48:27 +0000
 1.3.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.4.8.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.4.8.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.4.8.1 03-Aug-2001  lukem update to -current
 1.4.6.3 11-Dec-2002  thorpej Sync with HEAD.
 1.4.6.2 08-Jan-2002  nathanw Catch up to -current.
 1.4.6.1 24-Aug-2001  nathanw Catch up with -current.
 1.9.6.2 11-Dec-2005  christos Sync with head.
 1.9.6.1 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.10.2.2 26-Feb-2007  yamt sync with head.
 1.10.2.1 21-Jun-2006  yamt sync with head.
 1.11.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.12.40.1 18-May-2008  yamt sync with head.
 1.12.38.1 02-Jun-2008  mjf Sync with HEAD.
 1.13.66.2 09-Jul-2016  skrll Sync with HEAD
 1.13.66.1 19-Mar-2016  skrll Sync with HEAD
 1.13.46.1 03-Dec-2017  jdolecek update from HEAD
 1.15.10.2 02-Jan-2018  snj Pull up following revision(s) (requested by knakahara in ticket #462):
sys/net/if_gif.c: revision 1.133, 1.134, 1.137
sys/net/if_gif.h: revision 1.28-1.29
sys/netinet/in_gif.c: revision 1.90-1.91
sys/netinet/in_gif.h: revision 1.18
sys/netinet6/in6_gif.c: revision 1.88-1.89
sys/netinet6/in6_gif.h: revision 1.17
preserve gif(4) configs by psref(9) like vlan(4) and l2tp(4).
After Tx side does not use softint, gif(4) can use psref(9) for config
preservation like vlan(4) and l2tp(4).
update locking notes later.
--
update gif(4) locking notes.
--
IFF_RUNNING checking in Rx and Tx processing is unnecessary now.
Because the configs of gif (members of gif_var) are protected by psref(9).
--
remove duplicated null check
 1.15.10.1 10-Dec-2017  snj Pull up following revision(s) (requested by knakahara in ticket #419):
sys/net/if_stf.c: revision 1.103
sys/net/if_stf.h: revision 1.8
sys/netinet/in_gif.c: revision 1.89
sys/netinet/in_gif.h: revision 1.17
sys/netinet/in_l2tp.c: revision 1.4
sys/netinet/ip_encap.c: revision 1.66
sys/netinet/ip_encap.h: revision 1.23
sys/netinet/ip_mroute.c: revision 1.148
sys/netinet6/in6_gif.c: revision 1.87
sys/netinet6/in6_gif.h: revision 1.16
sys/netinet6/in6_l2tp.c: revision 1.7
sys/netipsec/xform.h: revision 1.13
sys/netipsec/xform_ipip.c: revision 1.55
Add argument to encapsw->pr_input() instead of m_tag.
 1.17.4.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.122 11-Apr-2024  knakahara Fix invalid IPv6 route when an ipsecif(4) tunnel is deleted. Pointed out by ohishi@IIJ.

The pointed bug is fixed by modification in nd6_need_cache().
Others are similar bugs.

XXX pullup-9, 10
 1.121 22-Dec-2022  msaitoh Fix typo in comment (s/mut be/msut be/). No functional change.
 1.120 17-May-2021  yamaguchi branches: 1.120.12;
Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.119 12-Jun-2020  roy branches: 1.119.6; 1.119.8;
Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.118 20-Jan-2020  thorpej Remove FDDI support.
 1.117 18-Oct-2019  ozaki-r branches: 1.117.2;
in6: reset the temporary address timer on a change of the interval period
 1.116 16-Oct-2019  ozaki-r Reorganize in6_tmpaddrtimer stuffs

- Move the related functions to where in6_tmpaddrtimer_ch exists
- Hide global variable in6_tmpaddrtimer_ch
- Rename ip6_init2 to in6_tmpaddrtimer_init
- Reduce callers of callout_reset
- Use callout_schedule
 1.115 01-May-2018  maxv branches: 1.115.2; 1.115.6;
Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.114 24-Jan-2018  ozaki-r branches: 1.114.2;
Fix constraint violation of pserialize in in6_ifattach

in6_ifattach_loopback can sleep so we cannot use pserialize for it. Fortunately
in6_ifattach is always called with IFNET_LOCK so pserialize isn't needed there
actually.
 1.113 10-Nov-2017  ozaki-r Use psref instead of pserialize because that code is sleepable
 1.112 23-Feb-2017  ozaki-r branches: 1.112.6;
Remove mkludge stuffs

For unknown reasons, IPv6 multicast addresses are linked to a first
IPv6 address assigned to an interface. Due to the design, when removing
a first address having multicast addresses, we need to save them to
somewhere and later restore them once a new IPv6 address is activated.
mkludge stuffs support the operations.

This change links multicast addresses to an interface directly and
throws the kludge away.

Note that as usual some obsolete member variables remain for kvm(3)
users. And also sysctl net.inet6.multicast_kludge remains to avoid
breaking old ifmcstat.

TODO: currently ifnet has a list of in6_multi but obviously the list
should be protocol independent. Provide a common structure (if_multi
or something) to handle in6_multi and in_multi together as well as
ifaddr does for in_ifaddr and in6_ifaddr.
 1.111 16-Feb-2017  knakahara add l2tp(4) L2TPv3 interface.

originally implemented by IIJ SEIL team.
 1.110 24-Jan-2017  ozaki-r Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work
 1.109 04-Jan-2017  christos branches: 1.109.2;
- kill NULL argument from in6_update_ifa
- amend in6_update_ifa1 to return the ia, so that we can use it in pfil hooks
to avoid NULL pointer crash.
 1.108 19-Dec-2016  ozaki-r Get rid of extra nd6_purge from in6_ifdetach

There were two nd6_purge calls in in6_ifdetach for some reason, but at least now
we don't need the extra nd6_purge. Remove it and instead add assertions that
check that everything has indeed been purged.
 1.107 30-Nov-2016  ozaki-r Fix panic on destroying an interface with IPv6 addresses obtained with RA

nd6_purge depends on that IPv6 addresses are purged. If addresses remain,
pfxlist_onlink_check called from nd6_purge dereferences a dangling pointer
(ia->ia6_ndpr) that is freed before calling pfxlist_onlink_check. Fix it by
removing addresses before calling nd6_purge, which is the original behavior
that was changed by in6.c,v 1.203 and in6_ifattach.c,v 1.99.

Note that it seems the issue occurs because of a hack that forcibly destroys
prefix list entries of a given interface in nd6_purge. We should tackle the
hack in the future.

Fix PR kern/51467
 1.106 18-Oct-2016  ozaki-r Add missing pserialize_read_exit
 1.105 16-Aug-2016  roy Separate ioctl address prefix management from RA prefix management
as we have no API for controlling the latter.

This fixes a long-standing problem where addresses added with non-/128
prefixes and non-infinite address lifetimes would register a prefix route
which would expire. Subsequent calls to set new lifetimes for the same address
would not affect the prefix route management, so once expired, the
prefix route would be impossible to add back as the kernel would remove it.
 1.104 01-Aug-2016  ozaki-r Fix kernel builds (gcc 4.8)
 1.103 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.102 20-Jul-2016  ozaki-r Apply pserialize to some iterations of IP address lists
 1.101 07-Jul-2016  ozaki-r branches: 1.101.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.100 04-Jul-2016  ozaki-r Use pslist(9) for the global in6_ifaddr list

psz and psref will be applied in another commit.

No functional change intended.
 1.99 04-Jul-2016  ozaki-r Remove redundant codes purging IPv6 addresses

Proposed on tech-net and tech-kern.
 1.98 12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.97 27-Apr-2016  ozaki-r Get rid of unused argument from get_rand_ifid
 1.96 01-Apr-2016  ozaki-r Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.
 1.95 23-Feb-2015  martin Rearrange interface detachment slightly: before we free the INET6 specific
per-interface data, make sure to call nd6_purge() with it to remove
routing entries pointing to the going interface.
When we should happen to call this function again later, with the data
already gone, just return.
Fixes PR kern/49682, ok: christos.
 1.94 14-Nov-2014  maxv branches: 1.94.2;
Do not uselessly include <sys/malloc.h>.
 1.93 09-Sep-2014  rmind Eliminate IFAREF() and IFAFREE() macros in favour of functions.
 1.92 05-Sep-2014  matt Don't use C++ keyword as variable.
Use different prefix for nd6_prefixctl members than for nd6_prefix members.
 1.91 05-Jun-2014  roy branches: 1.91.2;
Add IPV6CTL_AUTO_LINKLOCAL and ND6_IFF_AUTO_LINKLOCAL toggles which
control the automatic creation of IPv6 link-local addresses when an
interface is brought up.

Taken from FreeBSD.
 1.90 17-May-2014  rmind - Move IFNET_*() macros under #ifdef _KERNEL.
- Replace TAILQ_FOREACH on ifnet with IFNET_FOREACH().
 1.89 25-Oct-2013  martin branches: 1.89.2;
Mark a variable as used only in diagnostic kernels
 1.88 18-Oct-2013  mrg convert a DIAGNOSTIC / panic into a KASSERTMSG().
 1.87 31-Dec-2011  christos branches: 1.87.6; 1.87.10;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0
 1.86 19-Nov-2011  tls branches: 1.86.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.85 19-Sep-2009  christos branches: 1.85.12;
backout the changes that establish a workqueue to synchronize the addresses
for agr and gre because they cause a race condition by calling ioctl() during
interface initialization. To make this work correctly we would need to
synchronize all interface init routines.
 1.84 13-Aug-2009  cegger buildfix: if_indexlim is of type size_t
 1.83 13-Aug-2009  dyoung Postpone to a workqueue adding link-local and loopback IPv6 addresses
to an interface. This keeps the kernel from entering ifp->if_ioctl
recursively, which can deadlock if if_ioctl takes locks. This will
fix deadlocks & LOCKDEBUG errors in agr(4) (kern/39940) and in
gre(4).
 1.82 30-Jul-2009  dyoung Fix typo in comment, s/SIOCSIFADDR/SIOCINITIFADDR/.
 1.81 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}
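
As a concrete illustration of the switch shape above, here is a minimal,
self-contained sketch. It is not taken from any driver in the tree: the
EX_* flag values, the ex_action enum, and ex_flags_action() are all
hypothetical names made up for this example, and real code would use
IFF_UP and IFF_RUNNING from <net/if.h>.

```c
#include <assert.h>

/* Illustrative values only; real code uses IFF_UP / IFF_RUNNING. */
#define EX_IFF_UP      0x0001u
#define EX_IFF_RUNNING 0x0040u

/* Hypothetical actions a driver might pick for each combination. */
enum ex_action { EX_NOTHING, EX_STOP, EX_INIT, EX_RECONFIGURE };

enum ex_action
ex_flags_action(unsigned int flags)
{
	/* Mask first, so unrelated flag bits cannot select a case. */
	switch (flags & (EX_IFF_UP | EX_IFF_RUNNING)) {
	case 0:
		return EX_NOTHING;      /* marked down and not running */
	case EX_IFF_RUNNING:
		return EX_STOP;         /* marked down but still running */
	case EX_IFF_UP:
		return EX_INIT;         /* marked up but not yet running */
	case EX_IFF_UP | EX_IFF_RUNNING:
		return EX_RECONFIGURE;  /* up and running */
	default:
		/* The two-bit mask allows only the four values above. */
		assert(!"unreachable");
		return EX_NOTHING;
	}
}
```

Compared with nested if-else clauses, each of the four permutations gets
exactly one visible case label, so a missing combination is easy to spot.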

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
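
The rule above can be sketched as a small predicate. This is an
illustration under stated assumptions, not the code in ether_ioctl():
ex_lladdr_settable() and EX_ETHER_ADDR_LEN are hypothetical names. It
relies only on the IEEE 802 convention that the low-order bit of the
first octet marks a group (multicast or broadcast) address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_ETHER_ADDR_LEN 6	/* hypothetical name for this sketch */

/* Return true if addr may be used as a unicast link-layer address. */
bool
ex_lladdr_settable(const uint8_t addr[EX_ETHER_ADDR_LEN])
{
	static const uint8_t zero[EX_ETHER_ADDR_LEN] = { 0 };

	assert(addr != NULL);

	/* Group bit set: multicast, which also covers ff:ff:ff:ff:ff:ff. */
	if (addr[0] & 0x01)
		return false;

	/* Reject the all-zero address 00:00:00:00:00:00. */
	if (memcmp(addr, zero, sizeof(zero)) == 0)
		return false;

	return true;
}
```

With this check, a locally administered unicast address such as
02:de:ad:be:ef:02 is accepted, while the all-zero, broadcast, and
multicast addresses are refused.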

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.80 24-Apr-2008  ad branches: 1.80.2; 1.80.8; 1.80.10;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.79 06-Dec-2007  dyoung branches: 1.79.12; 1.79.14;
Use ifa_insert(), ifa_remove().
 1.78 05-Dec-2007  dyoung Extract common code, creating a subroutine if_purgeaddrs(ifp,
family, purgeaddr) which applies function `purgeaddr' to each
address on `ifp' belonging to `family'.
 1.77 05-Dec-2007  dyoung Use IFADDR_FIRST(), IFADDR_NEXT().
 1.76 04-Dec-2007  dyoung Use IFNET_FOREACH() and IFADDR_FOREACH().
 1.75 10-Nov-2007  dyoung branches: 1.75.2;
Use sockaddr_in6_init().
 1.74 01-Nov-2007  dyoung branches: 1.74.2;
De-__P().
 1.73 10-Aug-2007  dyoung branches: 1.73.2; 1.73.6;
Constify.
 1.72 09-Jul-2007  ad branches: 1.72.2; 1.72.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.71 23-May-2007  christos Ansify + add a few comments, from Karl Sjödahl
 1.70 15-Mar-2007  dyoung Bark if we cannot assign a link-local address. While I am here,
fix the grammar in a comment.
 1.69 22-Feb-2007  dyoung branches: 1.69.4; 1.69.6; 1.69.8;
Cosmetic: use TAILQ_FOREACH(). Remove extraneous () from return
statements.
 1.68 20-Nov-2006  dyoung branches: 1.68.4;
Use the TAILQ_/LIST_ macros instead of open-coding them.
 1.67 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.66 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.65 18-May-2006  liamjfoy branches: 1.65.8; 1.65.10;
Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.64 05-Mar-2006  rpaulo branches: 1.64.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.63 21-Jan-2006  rpaulo branches: 1.63.2; 1.63.4; 1.63.6;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.
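
The ::1 case described in the list above can be sketched in userland C.
This covers only that single case, not the full scope comparison the
kernel performs, and the names ex_src_dst_scope_ok() and
ex_scope_ok_str() are hypothetical. It uses only the standard
IN6_IS_ADDR_LOOPBACK() macro and inet_pton().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Reject a packet whose source is the loopback address unless the
 * destination is loopback as well: ::1 has no meaning off the node.
 */
bool
ex_src_dst_scope_ok(const struct in6_addr *src, const struct in6_addr *dst)
{
	assert(src != NULL && dst != NULL);

	if (IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst))
		return false;
	return true;
}

/* Text-form wrapper: 1 = allowed, 0 = rejected, -1 = parse error. */
int
ex_scope_ok_str(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return -1;
	return ex_src_dst_scope_ok(&s, &d) ? 1 : 0;
}
```

For example, ex_scope_ok_str("::1", "2001:db8::1") is rejected, while a
loopback-to-loopback or global-to-global pair is allowed.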

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.62 11-Dec-2005  christos branches: 1.62.2;
merge ktrace-lwp.
 1.61 20-Apr-2004  itojun branches: 1.61.12;
remove duplicated #include. PR 25234
 1.60 11-Feb-2004  itojun we have IFT_BRIDGE already, no need for #ifdef
 1.59 11-Feb-2004  christos We don't have IFT_{PFLOG,PFSYNC} (yet).
 1.58 11-Feb-2004  itojun missing bzero
 1.57 06-Sep-2003  itojun committed by mistake, sorry
 1.56 06-Sep-2003  itojun correct comment
 1.55 08-Jul-2003  itojun on interface detach, clear multicast forwarding table. from kame
 1.54 02-Nov-2002  perry branches: 1.54.6;
/*CONTCOND*/ while (0)'ed macros
 1.53 15-Sep-2002  itojun remove extra blank line
 1.52 11-Sep-2002  itojun reduce diff w/kame
 1.51 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.50 11-Sep-2002  itojun correct signedness mixup in pointer passing. sync w/kame
 1.49 11-Jun-2002  itojun silence some of log(), as the codepath will be visited for IPv6-non-capable
interfaces too and can be annoying. net.inet6.icmp6.nd6_debug will
re-enable them.
 1.48 08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.47 07-Jun-2002  itojun minor KNF to sync w/kame
 1.46 29-May-2002  itojun attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
 1.45 29-May-2002  itojun move per-interface ip6/icmp6 stat to ifnet->if_afdata. sync w/kame
 1.44 23-May-2002  itojun no longer need IFT_PROPVIRTUAL "bridge[0-9]+" check.
 1.43 23-May-2002  itojun simplify conditions to do DAD. sync w/kame
 1.42 23-May-2002  itojun do not have link-local address for IFT_BRIDGE
 1.41 21-Dec-2001  itojun branches: 1.41.8; 1.41.10;
whitespace/cosmetic sync w/kame
 1.40 18-Dec-2001  itojun reduce white space/cosmetic diffs w/kame.
 1.39 13-Nov-2001  lukem add RCSIDs
 1.38 23-Aug-2001  itojun do not try to bring IPv6 up on bridge*.
 1.37 18-Jul-2001  itojun do not malloc() during interrupt context for IPv6 multicast kludge table.
malloc() during interface initialization. sync with kame
 1.36 24-May-2001  itojun branches: 1.36.2;
print more diag message on in6_addmulti() failures.
 1.35 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.34 07-Feb-2001  itojun branches: 1.34.2;
during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).
 1.33 18-Jan-2001  itojun on interface removal (ifconfig destroy) do not remove default route by mistake
 1.32 05-Nov-2000  onoe First Prototype implementation of network interface part for IEEE1394 (if_fw).

Current status:
Only OHCI chip is supported (fwohci).
ping (IPv4) works with Sony's implementation (SmartConnect) on Win98.
sometimes works but not stable.
Not implemented yet:
IRM (Isochronous Resource Manager) functionality.
Link layer fragmentation.
Topology map.
More to do:
clean ups
MCAP
character device part
dhcp

There is no entry in GENERIC config file yet.
Follow sys/dev/ieee1394/IMPLEMENTATION to enable if_fw.
 1.31 01-Oct-2000  itojun add missing \n. sync with kame.
 1.30 05-May-2000  itojun branches: 1.30.4;
correct in6_ifdetach() (previous code touched dangling pointers).
actually the corrected portion was never visited.
 1.29 27-Apr-2000  itojun correct in6_ifdetach(). free oia, not ia.
From: Lennart Augustsson <augustss@augustsson.net>
 1.28 16-Apr-2000  itojun perform neighbor unreachability detection on p2p links (spec requires
it for bidir p2p links).
improve -i in ndp(8) to allow tweaking per-interface ND flag on.
fix ndp(8) infinite loop on certain routing table setup.
 1.27 16-Apr-2000  itojun better sync with latest kame (cosmetic only).
 1.26 13-Apr-2000  itojun fix fatal bug in EUI64 generation (0xff -> 0xfe typo)
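
For context on why a single wrong byte here is fatal: the standard
mapping from a 48-bit IEEE 802 address to an IPv6 interface identifier
(RFC 2373, appendix A) inserts the two bytes ff:fe in the middle and
inverts the "u" bit. The sketch below shows that mapping only; it is not
the in6_ifattach.c code, and mac48_to_ifid() and ifid_matches() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build a modified EUI-64 interface identifier from a 48-bit address. */
void
mac48_to_ifid(const uint8_t mac[6], uint8_t ifid[8])
{
	assert(mac != NULL && ifid != NULL);

	ifid[0] = mac[0] ^ 0x02;  /* invert the universal/local ("u") bit */
	ifid[1] = mac[1];
	ifid[2] = mac[2];
	ifid[3] = 0xff;           /* filler is ff:fe, in that order */
	ifid[4] = 0xfe;
	ifid[5] = mac[3];
	ifid[6] = mac[4];
	ifid[7] = mac[5];
}

/* Convenience check: returns 1 if the derived identifier equals want. */
int
ifid_matches(const uint8_t mac[6], const uint8_t want[8])
{
	uint8_t got[8];

	mac48_to_ifid(mac, got);
	return memcmp(got, want, sizeof(got)) == 0;
}
```

Under this mapping, 00:11:22:33:44:55 becomes the identifier
02:11:22:ff:fe:33:44:55. An identifier built with the wrong filler byte
would not match what other nodes derive for the same hardware address.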
 1.25 12-Apr-2000  itojun revisit in6_ifattach().
- be persistent on initializing interfaces, even if there's manually-
assigned linklocal, multicast/whatever initialization is necessary.
- do not cache mac addr in the kernel. grab mac addr from existing cards
(this is important when you swap ethernet cards back and forth)
now ppp6 works just fine!

call in6_ifattach() on ATM PVC interface to assign link-local, using
hardware MAC address as seed.

(the change is in sync with kame tree).
 1.24 10-Apr-2000  itojun cosmetic (space before EOL), to ease diff against kame
 1.23 24-Mar-2000  itojun move ia6->ia6_dad_ch to dp->dad_timer_ch, to ease KAME code sharing.
now in6_var.h does not need to pull sys/callout.h in.
 1.22 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
 1.21 02-Mar-2000  itojun configure ::1 to loopback dstaddr.
honor ifa_refcnt when configuring loopback.
 1.20 02-Mar-2000  itojun don't configure ifa_dstaddr for non-pointopoint interface,
so that we won't be returning them from routing socket manipulation.
 1.19 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.18 04-Feb-2000  itojun avoid calling in6_control(SIOCDIFADDR_IN6) from interrupt context.
it is not supposed to work.
logging fix: add "\n" to some of log() in in6_prefix.c.

improve in6_ifdetach(). now almost all structure depend on ifnet
will be cleared up.
possible loose ends:
- cached route_in6 in static variables needs to be cleared as well
- there are ifaddr manipulation without reference counting,
which should be fixed
we still see panics after card removal, though... not sure what is left.

(sync with kame)
 1.17 02-Feb-2000  itojun implement in6_purgemkludge(). in6_ifdetach() calls it to avoid dangling
kludge entries. the situation would occur if you take the following steps:
- join multicast groups (default ones like linklocal all-node is fine)
- remove all IPv6 addresses manually
- remove pcmcia card

to thorpej: pls call in6_ifdetach() when PRU_PURGEIF is raised (just before
removing ifnet). it should do the right thing (unable to perform real test
though)
 1.16 02-Feb-2000  itojun remove route to link-local allnodes multicast address (ff02:x::/32),
when the last IPv6 address on an interface is removed.
in6_ifattach() configures it and in6_ifdetach() removes it.

XXX last part of in6_purgeaddr looks very ugly, but there's no event for
"interface detach" (events are for "address detach").
 1.15 01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.14 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.13 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.12 26-Sep-1999  is branches: 1.12.2; 1.12.8;
Don't even pretend we can create a nonglobal EUI64 out of an ARCNET link
level address. Instead, create the link-local address directly.
 1.11 25-Sep-1999  is invert u bit to convert EUI64 to RFC2373 interface ID for ARCnet
 1.10 20-Sep-1999  itojun tiny fix to ARCnet IPv6 support.
- in in6_ifattach_getifid(), we can grab interface id source iff the source
is universally (worldwide) unique. ARCnet hardware address is of 8bit and
does not satisfy the condition.
(in6_ifattach_getifid() is for getting interface id usable for pseudo
interfaces like gif*)
- xx_to_eui64() should return EUI64 format, not IPv6 interface id format.
this may seem awkward so I wish to clean these things up.
- in nd6.c, change if clause into case clause to allow future addition
of IFT_xxx easier.
 1.9 19-Sep-1999  is fix mergo
 1.8 19-Sep-1999  is Zeroth version of IPv6 support for ARCnet. Correct MTU handling still needs
to be done.
 1.7 13-Sep-1999  itojun - Call in{,6}_pcbdetach if ipsec initialization fails during PRU_ATTACH.
This situation happens on severe memory shortage. We may need more
improvements here and there.
- Grab IEEE802 address from IFT_ETHER card, even if the card is
inserted after bootup time. Is there any other card that can be
inserted afterwards? pcmcia fddi card? :-P
- RFC2373 u bit handling suggests that we SHOULD NOT copy interface id from
ethernet card to pseudo interface, when ethernet card has IEEE802/EUI64
with u bit != 0 (this means that IEEE802/EUI64 is not universally unique).
Do not use such address as, for example, interface id for gif interface.
(I have such an ethernet card myself)
This may change interface id for your gif interface. be careful upgrading
rc files.

(sync with recent KAME)
 1.6 08-Sep-1999  itojun fix u bit in interface identifier for ether and p2p-802 interface.
 1.5 05-Sep-1999  itojun - invert u bit on interface id for pseudo interfaces, as suggested in RFC2373.
- do not perform IPv6 initialization for faith* interface, as they become
mistakenly IFF_UP. we are wondering if we should nuke in6_ifattach_p2p().
(sync with recent kame)
 1.4 10-Jul-1999  thorpej Clean up some printfs(), and mark a few for possible later nuking,
since they appear to be for debugging purposes only.
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file in6_ifattach.c was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file in6_ifattach.c was added on branch chs-ubc2 on 1999-07-01 23:48:27 +0000
 1.12.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.12.2.4 21-Apr-2001  bouyer Sync with HEAD
 1.12.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.12.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.12.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.30.4.2 05-Aug-2003  msaitoh Pull up revision 1.55 via patch (requested by itojun in ticket #57):
on interface detach, clear multicast forwarding table.
 1.30.4.1 26-Feb-2001  he Pull up revision 1.33 (requested by itojun):
Do not remove default route by mistake on interface removal.
 1.34.2.8 11-Nov-2002  nathanw Catch up to -current
 1.34.2.7 17-Sep-2002  nathanw Catch up to -current.
 1.34.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.34.2.5 08-Jan-2002  nathanw Catch up to -current.
 1.34.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.34.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.34.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.34.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.36.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.36.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.36.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.36.2.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.36.2.1 03-Aug-2001  lukem update to -current
 1.41.10.1 01-Sep-2003  tron Pull up revision 1.55 via patch (requested by itojun in ticket #1375):
on interface detach, clear multicast forwarding table. from kame
 1.41.8.2 20-Jun-2002  gehenna catch up with -current.
 1.41.8.1 30-May-2002  gehenna Catch up with -current.
 1.54.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.54.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.54.6.1 03-Aug-2004  skrll Sync with HEAD
 1.61.12.6 07-Dec-2007  yamt sync with head
 1.61.12.5 15-Nov-2007  yamt sync with head.
 1.61.12.4 03-Sep-2007  yamt sync with head.
 1.61.12.3 26-Feb-2007  yamt sync with head.
 1.61.12.2 30-Dec-2006  yamt sync with head.
 1.61.12.1 21-Jun-2006  yamt sync with head.
 1.62.2.1 01-Feb-2006  yamt sync with head.
 1.63.6.2 24-May-2006  yamt sync with head.
 1.63.6.1 13-Mar-2006  yamt sync with head.
 1.63.4.2 01-Jun-2006  kardel Sync with head.
 1.63.4.1 22-Apr-2006  simonb Sync with head.
 1.63.2.1 09-Sep-2006  rpaulo sync with head
 1.64.4.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.65.10.2 10-Dec-2006  yamt sync with head.
 1.65.10.1 22-Oct-2006  yamt sync with head
 1.65.8.2 12-Jan-2007  ad Sync with head.
 1.65.8.1 18-Nov-2006  ad Sync with head.
 1.68.4.2 24-Mar-2007  yamt sync with head.
 1.68.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.69.8.1 18-Mar-2007  reinoud First attempt to bring branch in sync with HEAD
 1.69.6.1 11-Jul-2007  mjf Sync with head.
 1.69.4.4 20-Aug-2007  ad Sync with HEAD.
 1.69.4.3 01-Jul-2007  ad Adapt to callout API change.
 1.69.4.2 08-Jun-2007  ad Sync with head.
 1.69.4.1 10-Apr-2007  ad Sync with head.
 1.72.6.4 09-Dec-2007  jmcneill Sync with HEAD.
 1.72.6.3 11-Nov-2007  joerg Sync with HEAD.
 1.72.6.2 04-Nov-2007  jmcneill Sync with HEAD.
 1.72.6.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.72.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.73.6.1 13-Nov-2007  bouyer Sync with HEAD
 1.73.2.2 09-Jan-2008  matt sync with HEAD
 1.73.2.1 06-Nov-2007  matt sync with HEAD
 1.74.2.2 08-Dec-2007  mjf Sync with HEAD.
 1.74.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.75.2.1 08-Dec-2007  ad Sync with head.
 1.79.14.1 18-May-2008  yamt sync with head.
 1.79.12.2 17-Jan-2009  mjf Sync with HEAD.
 1.79.12.1 02-Jun-2008  mjf Sync with HEAD.
 1.80.10.1 19-Jan-2009  skrll Sync with HEAD.
 1.80.8.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.80.2.3 11-Mar-2010  yamt sync with head
 1.80.2.2 19-Aug-2009  yamt sync with head.
 1.80.2.1 04-May-2009  yamt sync with head.
 1.85.12.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.85.12.1 17-Apr-2012  yamt sync with head
 1.86.2.1 18-Feb-2012  mrg merge to -current.
 1.87.10.2 18-May-2014  rmind sync with head
 1.87.10.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.87.6.2 03-Dec-2017  jdolecek update from HEAD
 1.87.6.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.89.2.1 10-Aug-2014  tls Rebase.
 1.91.2.2 06-Apr-2015  snj Pull up following revision(s) (requested by martin in ticket #655):
sys/netinet6/in6.c: revision 1.182 via patch
sys/netinet6/in6_ifattach.c: revision 1.95 via patch
sys/netinet6/nd6.c: revision 1.158 via patch
sys/netinet6/nd6.h: revision 1.62 via patch
sys/netinet6/nd6_nbr.c: revision 1.104 via patch
sys/netinet6/nd6_rtr.c: revision 1.96 via patch
Rearrange interface detachment slightly: before we free the INET6 specific
per-interface data, make sure to call nd6_purge() with it to remove
routing entries pointing to the interface that is going away.
If we happen to call this function again later, with the data
already gone, just return.
Fixes PR kern/49682, ok: christos.
 1.91.2.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.94.2.8 28-Aug-2017  skrll Sync with HEAD
 1.94.2.7 05-Feb-2017  skrll Sync with HEAD
 1.94.2.6 05-Dec-2016  skrll Sync with HEAD
 1.94.2.5 05-Oct-2016  skrll Sync with HEAD
 1.94.2.4 09-Jul-2016  skrll Sync with HEAD
 1.94.2.3 29-May-2016  skrll Sync with HEAD
 1.94.2.2 22-Apr-2016  skrll Sync with HEAD
 1.94.2.1 06-Apr-2015  skrll Sync with HEAD
 1.101.2.5 20-Mar-2017  pgoyette Sync with HEAD
 1.101.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.101.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.101.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.101.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.109.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.112.6.1 17-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #354):
sys/netinet6/in6_ifattach.c: revision 1.113
sys/netinet6/nd6.c: revision 1.238
Use psref instead of pserialize because that code is sleepable
 1.114.2.1 02-May-2018  pgoyette Synch with HEAD
 1.115.6.1 23-Oct-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #368):

sys/netinet6/in6_ifattach.h: revision 1.14
sys/netinet6/ip6_input.c: revision 1.212
sys/netinet6/ip6_input.c: revision 1.213
sys/netinet6/ip6_input.c: revision 1.214
sys/netinet6/in6_var.h: revision 1.101
sys/netinet6/in6_var.h: revision 1.102
sys/netinet6/in6_ifattach.c: revision 1.116
sys/netinet6/in6_ifattach.c: revision 1.117
tests/net/ndp/t_ra.sh: revision 1.33

Reorganize in6_tmpaddrtimer stuffs
- Move the related functions to where in6_tmpaddrtimer_ch exists
- Hide global variable in6_tmpaddrtimer_ch
- Rename ip6_init2 to in6_tmpaddrtimer_init
- Reduce callers of callout_reset
- Use callout_schedule

Validate ip6_temp_preferred_lifetime (net.inet6.ip6.temppltime) on a change
ip6_temp_preferred_lifetime is used to calculate an interval period to
regenerate temporary addresses by
TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE - DESYNC_FACTOR
as per RFC 3041 3.5. So it must be greater than (REGEN_ADVANCE +
DESYNC_FACTOR), otherwise it will be negative and go wrong, for example
KASSERT(to_ticks >= 0) in callout_schedule_locked fails.

tests: add tests for the validation of net.inet6.ip6.temppltime

in6: reset the temporary address timer on a change of the interval period
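
The validation rule described above can be sketched as follows. This is a minimal userland illustration, not the kernel change itself: the function names validate_temp_pltime() and regen_interval() are hypothetical, and the SKETCH_REGEN_ADVANCE and SKETCH_DESYNC_FACTOR values are assumed for the example and may differ from the kernel's. It shows why the preferred lifetime must be strictly greater than REGEN_ADVANCE + DESYNC_FACTOR: otherwise the regeneration interval is zero or negative.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed values for illustration only; the kernel's constants may differ. */
#define SKETCH_REGEN_ADVANCE	5u	/* seconds */
#define SKETCH_DESYNC_FACTOR	600u	/* seconds */

/* Accept a new preferred lifetime only if the regeneration interval stays positive. */
int
validate_temp_pltime(uint32_t pltime)
{
	if (pltime <= SKETCH_REGEN_ADVANCE + SKETCH_DESYNC_FACTOR)
		return EINVAL;
	return 0;
}

/* TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE - DESYNC_FACTOR, in seconds. */
uint32_t
regen_interval(uint32_t pltime)
{
	/* The subtraction is only safe for values that passed validation. */
	assert(validate_temp_pltime(pltime) == 0);
	return pltime - SKETCH_REGEN_ADVANCE - SKETCH_DESYNC_FACTOR;
}
```

Rejecting the value at the point where it is changed keeps the timer code free of range checks, which is the approach the log message describes for the sysctl handler.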
 1.115.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.115.2.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.117.2.1 25-Jan-2020  ad Sync with head.
 1.119.8.1 31-May-2021  cjep sync with head
 1.119.6.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.120.12.1 18-Apr-2024  martin Pull up following revision(s) (requested by knakahara in ticket #659):

sys/netinet6/in6_ifattach.c: revision 1.122
sys/netinet/sctp_asconf.c: revision 1.14
sys/netinet6/nd6.c: revision 1.282

Fix invalid IPv6 route when an ipsecif(4) tunnel is deleted. Pointed out by ohishi@IIJ.
The reported bug is fixed by a modification in nd6_need_cache().
The others are similar bugs.
 1.14 16-Oct-2019  ozaki-r Reorganize in6_tmpaddrtimer stuffs

- Move the related functions to where in6_tmpaddrtimer_ch exists
- Hide global variable in6_tmpaddrtimer_ch
- Rename ip6_init2 to in6_tmpaddrtimer_init
- Reduce callers of callout_reset
- Use callout_schedule
 1.13 19-Sep-2009  christos branches: 1.13.64; 1.13.68;
backout the changes that establish a workqueue to synchronize the addresses
for agr and gre because they cause a race condition by calling ioctl() during
interface initialization. To make this work correctly we would need to
synchronize all interface init routines.
 1.12 13-Aug-2009  dyoung Postpone to a workqueue adding link-local and loopback IPv6 addresses
to an interface. This keeps the kernel from entering ifp->if_ioctl
recursively, which can deadlock if if_ioctl takes locks. This will
fix deadlocks & LOCKDEBUG errors in agr(4) (kern/39940) and in
gre(4).
 1.11 01-Nov-2007  dyoung branches: 1.11.20;
De-__P().
 1.10 05-Mar-2006  rpaulo branches: 1.10.36; 1.10.38; 1.10.42;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.9 10-Dec-2005  elad branches: 1.9.4; 1.9.6; 1.9.8;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.8 08-Jun-2002  itojun branches: 1.8.6; 1.8.22;
sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.7 16-Oct-2001  itojun branches: 1.7.10;
remove unused #define. sync whitespace/comment with kame.
 1.6 12-Apr-2000  itojun branches: 1.6.6; 1.6.8;
revisit in6_ifattach().
- be persistent on initializing interfaces, even if there's manually-
assigned linklocal, multicast/whatever initialization is necessary.
- do not cache mac addr in the kernel. grab mac addr from existing cards
(this is important when you swap ethernet cards back and forth)
now ppp6 works just fine!

call in6_ifattach() on ATM PVC interface to assign link-local, using
hardware MAC address as seed.

(the change is in sync with kame tree).
 1.5 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.4 19-Sep-1999  is branches: 1.4.2; 1.4.8;
Zeroth version of IPv6 support for ARCnet. Correct MTU handling still needs
to be done.
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file in6_ifattach.h was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file in6_ifattach.h was added on branch chs-ubc2 on 1999-07-01 23:48:27 +0000
 1.4.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.4.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.6.8.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.6.8.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.6.6.2 20-Jun-2002  nathanw Catch up to -current.
 1.6.6.1 22-Oct-2001  nathanw Catch up to -current.
 1.7.10.1 20-Jun-2002  gehenna catch up with -current.
 1.8.22.2 15-Nov-2007  yamt sync with head.
 1.8.22.1 21-Jun-2006  yamt sync with head.
 1.8.6.1 11-Dec-2005  christos Sync with head.
 1.9.8.1 13-Mar-2006  yamt sync with head.
 1.9.6.1 22-Apr-2006  simonb Sync with head.
 1.9.4.1 09-Sep-2006  rpaulo sync with head
 1.10.42.1 13-Nov-2007  bouyer Sync with HEAD
 1.10.38.1 06-Nov-2007  matt sync with HEAD
 1.10.36.1 04-Nov-2007  jmcneill Sync with HEAD.
 1.11.20.2 11-Mar-2010  yamt sync with head
 1.11.20.1 19-Aug-2009  yamt sync with head.
 1.13.68.1 23-Oct-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #368):

sys/netinet6/in6_ifattach.h: revision 1.14
sys/netinet6/ip6_input.c: revision 1.212
sys/netinet6/ip6_input.c: revision 1.213
sys/netinet6/ip6_input.c: revision 1.214
sys/netinet6/in6_var.h: revision 1.101
sys/netinet6/in6_var.h: revision 1.102
sys/netinet6/in6_ifattach.c: revision 1.116
sys/netinet6/in6_ifattach.c: revision 1.117
tests/net/ndp/t_ra.sh: revision 1.33

Reorganize in6_tmpaddrtimer stuffs
- Move the related functions to where in6_tmpaddrtimer_ch exists
- Hide global variable in6_tmpaddrtimer_ch
- Rename ip6_init2 to in6_tmpaddrtimer_init
- Reduce callers of callout_reset
- Use callout_schedule

Validate ip6_temp_preferred_lifetime (net.inet6.ip6.temppltime) on a change
ip6_temp_preferred_lifetime is used to calculate an interval period to
regenerate temporary addresses by
TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE - DESYNC_FACTOR
as per RFC 3041 3.5. So it must be greater than (REGEN_ADVANCE +
DESYNC_FACTOR), otherwise it will be negative and go wrong, for example
KASSERT(to_ticks >= 0) in callout_schedule_locked fails.

tests: add tests for the validation of net.inet6.ip6.temppltime

in6: reset the temporary address timer on a change of the interval period
 1.13.64.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.23 01-Sep-2023  andvar fix typos in comments, mainly s/innner/inner/.
 1.22 07-Dec-2022  knakahara gif(4), ipsec(4) and l2tp(4) use encap_attach_addr().
 1.21 19-Feb-2021  christos - Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
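
The difference between masking with sizeof(t) and with the type's alignment can be sketched as follows. This is an illustration under stated assumptions, not the macros from the NetBSD headers: the SKETCH_ names, the SKETCH_STRICT_ALIGNMENT switch and struct three_shorts are hypothetical, and C11 _Alignof stands in for __alignof.

```c
#include <assert.h>
#include <stdint.h>

/* Alignment test based on the ABI alignment of the type, not its size. */
#define SKETCH_ALIGNED_POINTER(p, t) \
	((((uintptr_t)(p)) & (_Alignof(t) - 1)) == 0)

/* Hypothetical switch: define as 0 for CPUs that tolerate unaligned access. */
#ifndef SKETCH_STRICT_ALIGNMENT
#define SKETCH_STRICT_ALIGNMENT 1
#endif

#if SKETCH_STRICT_ALIGNMENT
#define SKETCH_ACCESSIBLE_POINTER(p, t) SKETCH_ALIGNED_POINTER(p, t)
#else
#define SKETCH_ACCESSIBLE_POINTER(p, t) 1
#endif

/*
 * A non-primitive type whose size is not a power of two. A mask built from
 * sizeof(t) - 1 is meaningless for such a type, while _Alignof(t) - 1 is
 * still a valid mask.
 */
struct three_shorts {
	uint16_t v[3];
};

static_assert((sizeof(struct three_shorts) &
    (sizeof(struct three_shorts) - 1)) != 0,
    "example type should have a size that is not a power of two");
```

With these definitions a pointer at an even offset passes the test for struct three_shorts, whereas a sizeof-based mask of 5 would reject some even offsets for no good reason.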
 1.20 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in half of the macros.
- fix an issue in the ipv6 l2tp code where it was aligning for ipv4 and
pulling up for ipv6.
 1.19 29-Jan-2020  thorpej branches: 1.19.6;
Adopt <net/if_stats.h>.
 1.18 19-Sep-2019  knakahara branches: 1.18.2;
Avoid having a rtcache directly in a percpu storage for tunnel protocols.

percpu(9) keeps a block of memory for each CPU and hands it out piecemeal
to users. If the storage runs short, percpu(9) enlarges it by allocating new,
larger memory areas, replacing the old ones with them and destroying the old ones.
A percpu storage referenced by a pointer obtained via percpu_getref can
therefore be destroyed by this mechanism once the running thread sleeps, even
if percpu_putref has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
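
The indirection described above can be illustrated with a small userland simulation. This is only a sketch of the idea, not percpu(9) or the tunnel code: struct route_cache, struct percpu_store, store_enlarge() and store_destroy() are hypothetical names, and a heap array that is reallocated on growth stands in for the per-CPU storage. Because each slot holds only a pointer, the object it points to keeps its address when the storage is replaced.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a route cache; the real structure is much larger. */
struct route_cache {
	int hits;
};

/* Simulated per-CPU storage: an array that is replaced when enlarged. */
struct percpu_store {
	struct route_cache **slot;	/* each slot holds only a pointer */
	size_t nslots;
};

/* Release every cache and the slot array itself. */
void
store_destroy(struct percpu_store *s)
{
	for (size_t i = 0; i < s->nslots; i++)
		free(s->slot[i]);
	free(s->slot);
	s->slot = NULL;
	s->nslots = 0;
}

/*
 * Grow the storage to nslots entries by allocating a new array, copying the
 * old contents and destroying the old array. Returns 0 on success or -1 on
 * allocation failure, in which case the store is left unchanged.
 */
int
store_enlarge(struct percpu_store *s, size_t nslots)
{
	struct route_cache **fresh;
	size_t i;

	assert(nslots > 0 && nslots >= s->nslots);
	fresh = calloc(nslots, sizeof(*fresh));
	if (fresh == NULL)
		return -1;
	for (i = s->nslots; i < nslots; i++) {
		fresh[i] = calloc(1, sizeof(**fresh));
		if (fresh[i] == NULL) {
			while (i-- > s->nslots)
				free(fresh[i]);
			free(fresh);
			return -1;
		}
	}
	if (s->nslots > 0)
		memcpy(fresh, s->slot, s->nslots * sizeof(*fresh));
	free(s->slot);	/* the old storage is destroyed here */
	s->slot = fresh;
	s->nslots = nslots;
	return 0;
}
```

A thread that fetched the struct route_cache pointer before store_enlarge() ran can keep using it afterwards. Had the cache been embedded directly in the slot array, the same pointer would refer to freed memory.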
 1.17 03-Sep-2018  knakahara branches: 1.17.4;
fix: l2tp(4) cannot receive packets after a session reset without a tunnel reset. Pointed out by k-goda@IIJ

When the following operations are done after a session is established, l2tp0
cannot receive packets until deletetunnel && tunnel "src" "dst" is done.
 1.16 21-Jun-2018  knakahara branches: 1.16.2;
sbappendaddr() requires a lock to be held. Currently, softnet_lock is appropriate.

When rip_input() is called as inetsw[].pr_input, rip_input() is always called
with softnet_lock held: in the !defined(NET_MPSAFE) case it is acquired in
ipintr(), otherwise (defined(NET_MPSAFE)) it is acquired in the
PR_WRAP_INPUT macro.
However, some functions call rip_input() directly without holding softnet_lock.
That causes an assertion failure in sbappendaddr().
rip6_input() and icmp6_rip6_input() also require softnet_lock for the same
reason.
 1.15 27-Apr-2018  knakahara Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They keep the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address which lockdebug
saved. That can cause a false "already initialized" detection.
 1.14 26-Jan-2018  maxv branches: 1.14.2;
Several fixes in L2TP:

* l2tp_input(): use m_copydata, and ensure there is enough space in the
chain. Otherwise overflow.

* l2tp_tcpmss_clamp(): ensure there is enough space in the chain.

* in_l2tp_output(): don't check 'sc' against NULL, it can't be NULL.

* in_l2tp_input(): no need to call m_pullup since we use m_copydata.
Just check the space in the chain.

* in_l2tp_input(): if there is a cookie, make sure the chain has enough
space.

* in6_l2tp_input(): same changes as in_l2tp_input().

Ok knakahara@
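
The "check the space in the chain, then copy out" pattern from the entry above can be sketched in userland C. This is not the mbuf API: struct seg, chain_len() and chain_copydata() are hypothetical stand-ins for an mbuf chain and m_copydata(), written only to show the bounds check that has to come before the copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal stand-in for an mbuf chain: data may be split across segments. */
struct seg {
	const uint8_t *data;
	size_t len;
	const struct seg *next;
};

/* Total number of bytes held by the chain. */
size_t
chain_len(const struct seg *s)
{
	size_t total = 0;

	for (; s != NULL; s = s->next)
		total += s->len;
	return total;
}

/*
 * Copy len bytes starting at offset off into dst. Returns 0 on success, or
 * -1 without touching dst if the chain does not hold off + len bytes.
 */
int
chain_copydata(const struct seg *s, size_t off, size_t len, void *dst)
{
	uint8_t *out = dst;
	size_t total = chain_len(s);

	assert(dst != NULL);
	if (off > total || len > total - off)
		return -1;
	/* Skip whole segments that lie before the offset. */
	while (s != NULL && off >= s->len) {
		off -= s->len;
		s = s->next;
	}
	/* The length check above guarantees s stays valid while len > 0. */
	while (len > 0) {
		size_t n = s->len - off;

		if (n > len)
			n = len;
		if (n > 0)
			memcpy(out, s->data + off, n);
		out += n;
		len -= n;
		off = 0;
		s = s->next;
	}
	return 0;
}
```

Copying into a local buffer this way works even when the requested bytes straddle two segments, so the caller does not need the data to be contiguous first.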
 1.13 25-Jan-2018  maxv Style, reduce the indentation level when possible, and add a missing NULL
check after M_PREPEND.
 1.12 18-Dec-2017  knakahara fix mbuf leaks. pointed out and suggested by kre@n.o, thanks.
 1.11 18-Dec-2017  knakahara backout wrong fix again, sorry.
 1.10 15-Dec-2017  knakahara Fix pullup'ed mbuf leaks. The match function just requires enough mbuf length.

XXX need pullup-8
 1.9 15-Dec-2017  knakahara backout wrong fix as it causes atf net/ipsec/t_ipsec_l2tp failures.
 1.8 11-Dec-2017  knakahara fix pullup'ed mbuf leaks. pointed out by maxv@n.o, thanks.

XXX need pullup-8
 1.7 15-Nov-2017  knakahara branches: 1.7.2;
Add argument to encapsw->pr_input() instead of m_tag.
 1.6 11-Jul-2017  knakahara branches: 1.6.4;
l2tp(4): fix mbuf leak when tunnel nested over the limit

XXX need pullup -8 branch
 1.5 04-Apr-2017  knakahara branches: 1.5.4; 1.5.8;
fix module build
 1.4 04-Apr-2017  sevan Revert change to allow builds to continue until the missing vlan.h file is committed.
https://mail-index.netbsd.org/source-changes/2017/04/04/msg083283.html
 1.3 04-Apr-2017  knakahara remove unnecessary if_vlanvar.h. add missing include "vlan.h".

pointed out by s-yamaguchi@IIJ, thanks.
 1.2 30-Mar-2017  knakahara remove duplicated validation. That is already done in l2tp_lookup_session_ref().

pointed out by s-yamaguchi@IIJ, thanks.
 1.1 16-Feb-2017  knakahara branches: 1.1.2;
add missing files.
 1.1.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.1.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.1.2.1 16-Feb-2017  pgoyette file in6_l2tp.c was added on branch pgoyette-localcount on 2017-03-20 06:57:51 +0000
 1.5.8.8 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.5.8.7 10-Sep-2018  martin Pull up following revision(s) (requested by knakahara in ticket #1018):

sys/netinet6/in6_l2tp.c: revision 1.17
sys/netinet/in_l2tp.c: revision 1.16

fix: l2tp(4) cannot receive packets after a session reset without a tunnel reset. Pointed out by k-goda@IIJ

When the following operations are done after a session is established, l2tp0
cannot receive packets until deletetunnel && tunnel "src" "dst" is done.
 1.5.8.6 13-Jul-2018  martin Pull up following revision(s) via patch (requested by knakahara in ticket #905):

sys/netinet/ip_mroute.c: revision 1.160
sys/netinet6/in6_l2tp.c: revision 1.16
sys/net/if.h: revision 1.263
sys/netinet/in_l2tp.c: revision 1.15
sys/netinet/ip_icmp.c: revision 1.172
sys/netinet/igmp.c: revision 1.68
sys/netinet/ip_encap.c: revision 1.69
sys/netinet6/ip6_mroute.c: revision 1.129

sbappendaddr() requires a lock to be held. Currently, softnet_lock is appropriate.

When rip_input() is called as inetsw[].pr_input, rip_input() is always called
with softnet_lock held: in the !defined(NET_MPSAFE) case it is acquired in
ipintr(), otherwise (defined(NET_MPSAFE)) it is acquired in the
PR_WRAP_INPUT macro.

However, some functions call rip_input() directly without holding softnet_lock.
That causes an assertion failure in sbappendaddr().
rip6_input() and icmp6_rip6_input() also require softnet_lock for the same
reason.
 1.5.8.5 17-May-2018  martin Pull up following revision(s) (requested by knakahara in ticket #829):

sys/net/if_l2tp.c: revision 1.24
sys/net/if_ipsec.c: revision 1.13
sys/net/if_gif.h: revision 1.31
sys/netipsec/ipsecif.c: revision 1.8
sys/net/if_gif.c: revision 1.140
sys/netinet6/in6_l2tp.c: revision 1.15
sys/net/if_ipsec.h: revision 1.3
sys/netinet6/in6_gif.c: revision 1.92
sys/net/if_l2tp.h: revision 1.5
sys/netinet/in_l2tp.c: revision 1.13
sys/netinet/in_gif.c: revision 1.93

Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They keep the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address which lockdebug
saved. That can cause a false "already initialized" detection.
 1.5.8.4 08-Mar-2018  martin Pull up following revision(s) (requested by knakahara in ticket #614):
sys/net/if_l2tp.c: revision 1.20
sys/netinet6/in6_l2tp.c: revision 1.13
sys/netinet6/in6_l2tp.c: revision 1.14
sys/net/if_l2tp.h: revision 1.3
sys/net/if_l2tp.c: revision 1.13
sys/netinet/in_l2tp.c: revision 1.10
sys/net/if_l2tp.c: revision 1.18
sys/netinet/in_l2tp.c: revision 1.11
sys/net/if_l2tp.c: revision 1.19
sys/netinet/in_l2tp.c: revision 1.12

If if_attach() failed in the attach function, return. Add comments about if_initialize().
suggested by ozaki-r@n.o.

Fix null deref, m could be NULL if M_PREPEND fails.

style

Style, reduce the indentation level when possible, and add a missing NULL
check after M_PREPEND.

Several fixes in L2TP:
* l2tp_input(): use m_copydata, and ensure there is enough space in the
chain. Otherwise overflow.
* l2tp_tcpmss_clamp(): ensure there is enough space in the chain.
* in_l2tp_output(): don't check 'sc' against NULL, it can't be NULL.
* in_l2tp_input(): no need to call m_pullup since we use m_copydata.
Just check the space in the chain.
* in_l2tp_input(): if there is a cookie, make sure the chain has enough
space.
* in6_l2tp_input(): same changes as in_l2tp_input().
Ok knakahara@
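
The "ensure there is enough space in the chain" checks above can be pictured with a small userland sketch. It is not the kernel code: struct seg, chain_len() and chain_copydata() are hypothetical stand-ins for an mbuf chain and m_copydata(), and the only point is that the length is validated before any byte is copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one mbuf in a chain. */
struct seg {
	struct seg *next;
	const unsigned char *data;
	size_t len;
};

/* Total number of bytes held by the chain. */
static size_t
chain_len(const struct seg *s)
{
	size_t total = 0;

	for (; s != NULL; s = s->next)
		total += s->len;
	return total;
}

/*
 * Copy len bytes starting at offset off into dst, crossing segment
 * boundaries as needed. Returns 0 on success, or -1 without touching
 * dst when the chain does not hold off + len bytes.
 */
static int
chain_copydata(const struct seg *s, size_t off, size_t len, void *dst)
{
	unsigned char *out = dst;
	size_t total = chain_len(s);

	if (off > total || len > total - off)
		return -1;

	/* Skip whole segments that lie before the offset. */
	while (s != NULL && off >= s->len) {
		off -= s->len;
		s = s->next;
	}
	/* The check above guarantees s stays non-NULL while len > 0. */
	while (len > 0) {
		size_t n = s->len - off;

		if (n > len)
			n = len;
		memcpy(out, s->data + off, n);
		out += n;
		len -= n;
		off = 0;
		s = s->next;
	}
	return 0;
}
```

A caller that reads a fixed-size header this way never depends on the header being contiguous in the first segment, which is the reason a pullup step becomes unnecessary in such a design.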

Use MH_ALIGN instead, ok knakahara@.
 1.5.8.3 02-Jan-2018  snj Pull up following revision(s) (requested by knakahara in ticket #461):
sys/netinet/in_l2tp.c: revision 1.9
sys/netinet6/in6_l2tp.c: revision 1.12
fix mbuf leaks. pointed out and suggested by kre@n.o, thanks.
 1.5.8.2 10-Dec-2017  snj Pull up following revision(s) (requested by knakahara in ticket #419):
sys/net/if_stf.c: revision 1.103
sys/net/if_stf.h: revision 1.8
sys/netinet/in_gif.c: revision 1.89
sys/netinet/in_gif.h: revision 1.17
sys/netinet/in_l2tp.c: revision 1.4
sys/netinet/ip_encap.c: revision 1.66
sys/netinet/ip_encap.h: revision 1.23
sys/netinet/ip_mroute.c: revision 1.148
sys/netinet6/in6_gif.c: revision 1.87
sys/netinet6/in6_gif.h: revision 1.16
sys/netinet6/in6_l2tp.c: revision 1.7
sys/netipsec/xform.h: revision 1.13
sys/netipsec/xform_ipip.c: revision 1.55
Add argument to encapsw->pr_input() instead of m_tag.
 1.5.8.1 12-Jul-2017  martin Pull up following revision(s) (requested by knakahara in ticket #121):
sys/netinet6/in6_l2tp.c: revision 1.6
sys/netinet/in_l2tp.c: revision 1.3
l2tp(4): fix mbuf leak when tunnel nested over the limit
XXX need pullup -8 branch
 1.5.4.2 21-Apr-2017  bouyer Sync with HEAD
 1.5.4.1 04-Apr-2017  bouyer file in6_l2tp.c was added on branch bouyer-socketcan on 2017-04-21 16:54:06 +0000
 1.6.4.2 28-Aug-2017  skrll Sync with HEAD
 1.6.4.1 11-Jul-2017  skrll file in6_l2tp.c was added on branch nick-nhusb on 2017-08-28 17:53:12 +0000
 1.7.2.2 03-Dec-2017  jdolecek update from HEAD
 1.7.2.1 15-Nov-2017  jdolecek file in6_l2tp.c was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.14.2.3 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.14.2.2 25-Jun-2018  pgoyette Sync with HEAD
 1.14.2.1 02-May-2018  pgoyette Synch with HEAD
 1.16.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.16.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.16.2.1 10-Jun-2019  christos Sync with HEAD
 1.17.4.1 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@
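
The pointer indirection described above can be sketched in userland as follows. This is only an illustration under stated assumptions: struct cache, struct slots and slots_enlarge() are hypothetical names, and a plain array stands in for percpu(9) storage. Enlarging the array moves the slots, but an object reached through a slot keeps its address.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a route cache; not the kernel's struct route. */
struct cache {
	int hits;
};

/* Hypothetical stand-in for percpu storage: a growable array of pointers. */
struct slots {
	struct cache **slot;
	size_t n;
};

/*
 * Grow the slot array to newn entries: allocate a larger area, copy the
 * old contents, then destroy the old area. Only the pointers move; the
 * objects they point to stay where they are.
 * Returns 0 on success, -1 on allocation failure (storage unchanged).
 */
static int
slots_enlarge(struct slots *s, size_t newn)
{
	struct cache **fresh;
	size_t i;

	assert(newn > 0 && newn >= s->n);
	fresh = calloc(newn, sizeof(*fresh));
	if (fresh == NULL)
		return -1;
	if (s->n > 0)
		memcpy(fresh, s->slot, s->n * sizeof(*fresh));
	for (i = s->n; i < newn; i++) {
		fresh[i] = calloc(1, sizeof(*fresh[i]));
		if (fresh[i] == NULL) {
			/* Undo the objects allocated by this call only. */
			while (i > s->n)
				free(fresh[--i]);
			free(fresh);
			return -1;
		}
	}
	free(s->slot);
	s->slot = fresh;
	s->n = newn;
	return 0;
}
```

Had struct cache been stored by value in the array, a pointer taken before the enlargement would refer to freed memory afterwards, which is the hazard the commit message describes.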

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems the l2tp(4) call path is too long for the instruction cache. So, dividing
the l2tp(4) Tx context improves CPU efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine (Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.18.2.1 29-Feb-2020  ad Sync with head.
 1.19.6.1 03-Apr-2021  thorpej Sync with HEAD.
 1.1 16-Feb-2017  knakahara branches: 1.1.2; 1.1.6; 1.1.14; 1.1.18;
add missing files.
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 16-Feb-2017  jdolecek file in6_l2tp.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.1.14.2 28-Aug-2017  skrll Sync with HEAD
 1.1.14.1 16-Feb-2017  skrll file in6_l2tp.h was added on branch nick-nhusb on 2017-08-28 17:53:12 +0000
 1.1.6.2 21-Apr-2017  bouyer Sync with HEAD
 1.1.6.1 16-Feb-2017  bouyer file in6_l2tp.h was added on branch bouyer-socketcan on 2017-04-21 16:54:06 +0000
 1.1.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.1.2.1 16-Feb-2017  pgoyette file in6_l2tp.h was added on branch pgoyette-localcount on 2017-03-20 06:57:51 +0000
 1.13 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.12 12-Dec-2018  rin branches: 1.12.36;
PR kern/53562

Add ether_sw_offload_[tr]x: handle TX/RX offload options in software.
Since this violates separation b/w L2 and L3/L4, new files are added
rather than having the routines in sys/net/if_ethersubr.c.

OK msaitoh thorpej
 1.11 19-Sep-2018  rin Fix in_undefer_cksum() and in6_undefer_cksum().

The 4th argument for in[46]_cksum() should be length of L4 header +
L4 payload. The previous revisions are wrong

- for IPv4 when hdrlen != 0
- for IPv6 always

These functions are used only in net/if_loop.c and
arch/powerpc/booke/dev/pq3etsec.c under some special circumstances.
This is probably why the bugs had not been found until today.

OK maxv
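
As a rough illustration of the length argument discussed above, here is a userland sketch of the one's-complement Internet checksum (RFC 1071) together with a helper that derives the L4 length. The names inet_cksum() and l4_length() are hypothetical, the pseudo-header that the real in[46]_cksum() routines also cover is left out, and buffers are assumed to be smaller than 64 KiB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement Internet checksum (RFC 1071) over len bytes.
 * Assumes len < 64 KiB so the 32-bit accumulator cannot overflow.
 */
static uint16_t
inet_cksum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len == 1)		/* odd trailing byte is padded with zero */
		sum += (uint32_t)p[0] << 8;
	while (sum >> 16)	/* fold the carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Length to hand to the checksum routine: L4 header plus L4 payload,
 * i.e. everything in the packet after the IP header(s).
 */
static size_t
l4_length(size_t pktlen, size_t iphdrlen)
{
	assert(iphdrlen <= pktlen);
	return pktlen - iphdrlen;
}
```

For example, a 60-byte IPv6 packet with a bare 40-byte header gives l4_length(60, 40) == 20; passing the payload length alone or the whole packet length would checksum the wrong span.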
 1.10 10-Aug-2018  maxv Remove the callback and localify. Same as IPv4.
 1.9 10-Aug-2018  maxv Rename

ip6_undefer_csum -> in6_undefer_cksum
in6_delayed_cksum -> in6_undefer_cksum_tcpudp

The two previous names were inconsistent and misleading.

Put the two functions into in6_offload.c. Add comments to explain what
we're doing.

Same as IPv4.
 1.8 01-Jun-2018  maxv branches: 1.8.2;
Rename

M_CSUM_DATA_IPv6_HL -> M_CSUM_DATA_IPv6_IPHL
M_CSUM_DATA_IPv6_HL_SET -> M_CSUM_DATA_IPv6_SET

Reduces the diff against IPv4. Also, clarify the definitions.
 1.7 14-Feb-2017  ozaki-r branches: 1.7.12;
Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced
 1.6 25-Apr-2011  yamt branches: 1.6.14; 1.6.32; 1.6.36; 1.6.40;
ip6_undefer_csum:
- don't forget ntohs
- KNF
 1.5 11-Dec-2010  matt branches: 1.5.2;
Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.
 1.4 02-May-2007  dyoung branches: 1.4.56; 1.4.60;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
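
The alloc/copy/dup/free semantics from item 1 can be sketched in userland like this. Everything here is an assumption for illustration: the xsockaddr names are hypothetical, calloc() replaces the per-family pool, the 'flags' argument is dropped, and assert() stands in for KASSERT.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical BSD-style address header: length first, then family. */
struct xsockaddr {
	uint8_t xsa_len;	/* size of the concrete address */
	uint8_t xsa_family;
};

/* Hypothetical concrete address for one family. */
struct xsockaddr_in {
	struct xsockaddr hdr;	/* must be the first member */
	uint16_t port;
	uint32_t addr;
};

enum { XAF_INET = 2 };

/* Size of the concrete address for a family, or 0 if unknown. */
static size_t
xsockaddr_size(uint8_t af)
{
	return af == XAF_INET ? sizeof(struct xsockaddr_in) : 0;
}

/* Return a zeroed address of the right size with len and family set. */
static struct xsockaddr *
xsockaddr_alloc(uint8_t af)
{
	size_t sz = xsockaddr_size(af);
	struct xsockaddr *sa;

	if (sz == 0 || (sa = calloc(1, sz)) == NULL)
		return NULL;
	sa->xsa_len = (uint8_t)sz;
	sa->xsa_family = af;
	return sa;
}

/* Analogous to strcpy(): families of dst and src must match. */
static struct xsockaddr *
xsockaddr_copy(struct xsockaddr *dst, const struct xsockaddr *src)
{
	assert(dst->xsa_family == src->xsa_family);
	memcpy(dst, src, src->xsa_len);
	return dst;
}

/* Analogous to strdup(): allocate, then copy. */
static struct xsockaddr *
xsockaddr_dup(const struct xsockaddr *src)
{
	struct xsockaddr *sa = xsockaddr_alloc(src->xsa_family);

	return sa == NULL ? NULL : xsockaddr_copy(sa, src);
}

static void
xsockaddr_free(struct xsockaddr *sa)
{
	free(sa);
}
```

Because allocation can fail, every caller of the dup and alloc routines has to handle NULL, which mirrors the NULL checks mentioned for rtcache_getdst() in item 3.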
 1.3 25-Apr-2007  dyoung Back out last. To compile, it depends on changes that I am not
ready to commit, yet.
 1.2 25-Apr-2007  dyoung Constify.
 1.1 25-Nov-2006  yamt branches: 1.1.4; 1.1.6; 1.1.8; 1.1.10; 1.1.14; 1.1.16;
move tso-by-software code to their own files. no functional changes.
 1.1.16.1 11-Jul-2007  mjf Sync with head.
 1.1.14.1 08-Jun-2007  ad Sync with head.
 1.1.10.1 07-May-2007  yamt sync with head.
 1.1.8.2 12-Jan-2007  ad Sync with head.
 1.1.8.1 25-Nov-2006  ad file in6_offload.c was added on branch newlock2 on 2007-01-12 01:04:15 +0000
 1.1.6.3 03-Sep-2007  yamt sync with head.
 1.1.6.2 30-Dec-2006  yamt sync with head.
 1.1.6.1 25-Nov-2006  yamt file in6_offload.c was added on branch yamt-lazymbuf on 2006-12-30 20:50:38 +0000
 1.1.4.2 10-Dec-2006  yamt sync with head.
 1.1.4.1 25-Nov-2006  yamt file in6_offload.c was added on branch yamt-splraiseipl on 2006-12-10 07:19:15 +0000
 1.4.60.1 07-Jan-2011  matt If using hardware checksum offload and the packet can't be h/w checksummed
(for whatever reason, some hardware is stupid) allow the driver to calculate
the checksum instead.
 1.4.56.2 31-May-2011  rmind sync with head
 1.4.56.1 05-Mar-2011  rmind sync with head
 1.5.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.6.40.1 21-Apr-2017  bouyer Sync with HEAD
 1.6.36.1 20-Mar-2017  pgoyette Sync with HEAD
 1.6.32.1 28-Aug-2017  skrll Sync with HEAD
 1.6.14.1 03-Dec-2017  jdolecek update from HEAD
 1.7.12.4 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.7.12.3 30-Sep-2018  pgoyette Sync with HEAD
 1.7.12.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.7.12.1 25-Jun-2018  pgoyette Sync with HEAD
 1.8.2.1 10-Jun-2019  christos Sync with HEAD
 1.12.36.1 02-Aug-2025  perseant Sync with HEAD
 1.11 05-Aug-2019  christos add forward decl
 1.10 12-Dec-2018  rin PR kern/53562

Add ether_sw_offload_[tr]x: handle TX/RX offload options in software.
Since this violates separation b/w L2 and L3/L4, new files are added
rather than having the routines in sys/net/if_ethersubr.c.

OK msaitoh thorpej
 1.9 10-Aug-2018  maxv Remove the callback and localify. Same as IPv4.
 1.8 10-Aug-2018  maxv Rename

ip6_undefer_csum -> in6_undefer_cksum
in6_delayed_cksum -> in6_undefer_cksum_tcpudp

The two previous names were inconsistent and misleading.

Put the two functions into in6_offload.c. Add comments to explain what
we're doing.

Same as IPv4.
 1.7 25-Apr-2011  yamt branches: 1.7.54; 1.7.56;
undefer csum in looutput.
looutput is used by various code (ether_output, mcast) to loopback packets.
 1.6 11-Dec-2010  matt branches: 1.6.2;
Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.
 1.5 02-May-2007  dyoung branches: 1.5.56; 1.5.60;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.4 25-Apr-2007  dyoung Back out last. To compile, it depends on changes that I am not
ready to commit, yet.
 1.3 25-Apr-2007  dyoung Constify.
 1.2 25-Nov-2006  yamt branches: 1.2.4; 1.2.6; 1.2.8; 1.2.10; 1.2.14; 1.2.16;
move tso-by-software code to their own files. no functional changes.
 1.1 23-Nov-2006  yamt implement ipv6 TSO.
partly from Matthias Scheler. tested by him.
 1.2.16.1 11-Jul-2007  mjf Sync with head.
 1.2.14.1 08-Jun-2007  ad Sync with head.
 1.2.10.1 07-May-2007  yamt sync with head.
 1.2.8.2 12-Jan-2007  ad Sync with head.
 1.2.8.1 25-Nov-2006  ad file in6_offload.h was added on branch newlock2 on 2007-01-12 01:04:15 +0000
 1.2.6.3 03-Sep-2007  yamt sync with head.
 1.2.6.2 30-Dec-2006  yamt sync with head.
 1.2.6.1 25-Nov-2006  yamt file in6_offload.h was added on branch yamt-lazymbuf on 2006-12-30 20:50:38 +0000
 1.2.4.2 10-Dec-2006  yamt sync with head.
 1.2.4.1 25-Nov-2006  yamt file in6_offload.h was added on branch yamt-splraiseipl on 2006-12-10 07:19:15 +0000
 1.5.60.1 07-Jan-2011  matt If using hardware checksum offload and the packet can't be h/w checksummed
(for whatever reason, some hardware is stupid) allow the driver to calculate
the checksum instead.
 1.5.56.2 31-May-2011  rmind sync with head
 1.5.56.1 05-Mar-2011  rmind sync with head
 1.6.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.7.56.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.7.56.1 10-Jun-2019  christos Sync with HEAD
 1.7.54.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.7.54.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.177 04-Nov-2022  ozaki-r inpcb: get rid of parentheses for return value
 1.176 04-Nov-2022  ozaki-r inpcb: use in_port_t for port numbers
 1.175 04-Nov-2022  ozaki-r inpcb: rename functions to in6pcb_*
 1.174 04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.173 28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
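
The embedding-plus-accessor arrangement described above might look roughly like the following. The layouts are hypothetical and heavily reduced (struct pcb, pcb4 and pcb6 are not the kernel's definitions); the sketch only shows how a macro in the style of in4p_laddr(inp) hides the cast from callers that hold a pointer to the common part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Common part; stands in for struct inpcb. */
struct pcb {
	int af;
	uint16_t lport;
};

/* IPv4 flavour; stands in for struct in4pcb. */
struct pcb4 {
	struct pcb common;	/* must be first so the casts below are valid */
	uint32_t laddr;
};

/* IPv6 flavour; stands in for struct in6pcb. */
struct pcb6 {
	struct pcb common;	/* must be first so the casts below are valid */
	uint8_t laddr[16];
};

static_assert(offsetof(struct pcb4, common) == 0, "common part must lead");
static_assert(offsetof(struct pcb6, common) == 0, "common part must lead");

/* Accessors: callers pass a struct pcb * and never see the cast. */
#define pcb4_laddr(p)	(((struct pcb4 *)(p))->laddr)
#define pcb6_laddr(p)	(((struct pcb6 *)(p))->laddr)
```

With this layout the IPv4 structure no longer pays for the 16-byte IPv6 address, which is the size saving the commit is after.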
 1.172 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.171 14-Oct-2022  ryo Avoid error of "-Wreturn-local-addr", and simplify the logic.

However, -Wreturn-local-addr is still disabled by default by GCC_NO_RETURN_LOCAL_ADDR
in bsd.own.mk because it causes errors in other parts.
 1.170 29-Aug-2022  knakahara Add sysctl entry to control to send routing message for RTM_DYNAMIC.

Some routing daemons require such routing messages to keep coherency.

To let the kernel send such messages, set net.inet.icmp.dynamic_rt_msg=1
for IPv4 and net.inet6.icmp6.dynamic_rt_msg=1 for IPv6.
The default (=0) is the same as before, that is, such routing messages are not sent.
 1.169 29-Jul-2022  knakahara Remove obsoleted comments.

These comments were added with IFNET_LOCK by in_pcb.c:r1.180 and
in6_pcb.c:r1.162. The IFNET_LOCK code was then removed in
in_pcb.c:r1.183 and in6_pcb.c:r1.166, but the comments
remained.
 1.168 09-Jun-2022  knakahara refactor: use TAILQ_FOREACH instead of TAILQ_FOREACH_SAFE about inpt_queue.

They don't use "ninph" pointer and don't remove elements.
 1.167 08-Sep-2020  christos Add IP_BINDANY, IPV6_BINDANY which can be used to bind to any address in
order to implement transparent proxies.
 1.166 15-May-2019  ozaki-r Get rid of IFNET_LOCK for if_mcast_op to avoid a deadlock

The IFNET_LOCK was added to avoid data races on if_flags for IFF_ALLMULTI.
Unfortunately it caused a deadlock instead. A known scenario causing a
deadlock is when the following two operations occur concurrently: (a) removal of
an IP address assigned to an interface and (b) manipulation of multicast
groups on the interface. The resource dependency graph is like this:
softnet_lock => IFNET_LOCK => psref_target_destroy => softint => softnet_lock

Thanks to the previous commit that avoids data races on if_flags for
IFF_ALLMULTI by another approach, we can remove IFNET_LOCK and defuse the
deadlock.

PR kern/54189
 1.165 27-Feb-2018  maxv branches: 1.165.4;
Dedup: merge

ipsec4_get_policy and ipsec6_get_policy
ipsec4_delete_pcbpolicy and ipsec6_delete_pcbpolicy

The already-existing ipsec_get_policy() function is inlined in the new
one.
 1.164 08-Feb-2018  dholland Typos.
 1.163 22-Dec-2017  ozaki-r Add missing curlwp_bindx
 1.162 15-Dec-2017  ozaki-r Ensure to call if_mcast_op with holding IFNET_LOCK

Note that CARP doesn't deal with IFNET_LOCK yet.
 1.161 25-Apr-2017  ozaki-r branches: 1.161.4;
Check if solock of PCB is held when SP caches in the PCB are accessed

To this end, a back pointer from inpcbpolicy to inpcb_hdr is added.
 1.160 20-Apr-2017  ozaki-r Simplify logic of udp4_sendup and udp6_sendup

They are always passed a socket with the same protocol family
as their own: AF_INET for udp4_sendup and AF_INET6 for udp6_sendup.
 1.159 02-Mar-2017  ozaki-r Make sure im6o_memberships is protected by in6p's lock (solock)
 1.158 02-Mar-2017  ozaki-r Use LIST_* macros

No functional change.
 1.157 13-Feb-2017  ozaki-r Replace splnet with splsoftnet
 1.156 23-Jan-2017  ozaki-r Get rid of splnet for pool(9)

We don't need it anymore.
 1.155 13-Dec-2016  ozaki-r branches: 1.155.2;
Remove unnecessary inclusions of nd6.h
 1.154 12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.153 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of the changes for the MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.152 31-Oct-2016  christos restore previous logic.
 1.151 31-Oct-2016  ozaki-r Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and have in6_selectsrc copy the result to it inside pserialize
critical sections.
 1.150 29-Sep-2016  roy Now that we disallow sending or receiving from invalid addresses,
allow binding to tentative addresses.
 1.149 26-Aug-2016  roy Allow explicit binding to detached addresses.
Fixes PR kern/51435.
 1.148 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.147 15-Jul-2016  ozaki-r Use sin6tosa and sin6tocsa macros

No functional change.
 1.146 15-Jul-2016  ozaki-r Use ifatoia6 macro

No functional change.
 1.145 21-Jun-2016  ozaki-r branches: 1.145.2;
Make sure the ifp returned from in6_select* functions is psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.
 1.144 21-Jun-2016  ozaki-r Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime is different from that of the ifnet, so an ifnet object can
disappear at any time while we reach it via them. Thus we need to look up an ifnet
object by if_index every time for safety.
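
The index-instead-of-pointer idea can be sketched as below. The names (xifnet, xif_table, xif_byindex(), xmoptions) are hypothetical, the sketch is single-threaded, and it leaves out the reference (psref) that kernel code would also need to hold while using the interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ifnet. */
struct xifnet {
	unsigned index;
	const char *name;
};

#define XIF_MAX 8

/* Index table; a slot is NULL once the interface has been destroyed. */
static struct xifnet *xif_table[XIF_MAX];

/* Look the interface up on every use; NULL means it is gone. */
static struct xifnet *
xif_byindex(unsigned idx)
{
	return idx < XIF_MAX ? xif_table[idx] : NULL;
}

/* Multicast options keep the index, not a pointer that could dangle. */
struct xmoptions {
	unsigned mo_ifindex;
};

/* Return the interface name, or NULL if the interface has disappeared. */
static const char *
xmo_ifname(const struct xmoptions *mo)
{
	struct xifnet *ifp = xif_byindex(mo->mo_ifindex);

	return ifp != NULL ? ifp->name : NULL;
}
```

The cost is one lookup per use; the benefit is that a destroyed interface shows up as a NULL result instead of a stale pointer.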
 1.143 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.142 24-May-2015  rtr remove transitional functions in{,6}_pcbconnect_m() that were used in
converting protocol user requests to accept sockaddr instead of mbufs.

remove tcp_input copy in to mbuf from sockaddr and just copy to sockaddr
to make it possible for the transitional functions to go away.

no version bump since these functions only existed for a short time and
were commented as adapters (they appeared in 7.99.15).
 1.141 19-May-2015  ozaki-r Use NULL instead of 0 for pointers
 1.140 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.139 27-Apr-2015  ozaki-r Add missing error checks on rtcache_setdst

It can fail with ENOMEM.
 1.138 27-Apr-2015  ozaki-r Introduce in6_selecthlim_rt to consolidate an idiom for rt->rt_ifp

It consolidates a scattered routine:
(rt = rtcache_validate(&in6p->in6p_route)) != NULL ? rt->rt_ifp : NULL
 1.137 26-Apr-2015  rtr return EINVAL if sin{,6}_len != sizeof(sockaddr_in{,6}) respectively in
in{,6}_pcbconnect().

checking just m->m_len isn't enough because there are various places that
assume sa_len has been properly populated.
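
A minimal sketch of the double check described above, assuming a hypothetical BSD-style layout (struct xsockaddr_in6 with a leading length byte) and a hypothetical helper check_connect_addr(): both the size of the supplied buffer and the length recorded inside the address must match the expected structure size.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical BSD-style IPv6 address with a leading length byte. */
struct xsockaddr_in6 {
	uint8_t xsin6_len;
	uint8_t xsin6_family;
	uint16_t xsin6_port;
	uint8_t xsin6_addr[16];
};

/*
 * Validate an address handed to connect. Checking the buffer size alone
 * is not enough, because later code trusts the embedded length field.
 * Returns 0 if both lengths are correct, EINVAL otherwise.
 */
static int
check_connect_addr(const void *buf, size_t buflen)
{
	const struct xsockaddr_in6 *sin6 = buf;

	if (buflen != sizeof(*sin6))
		return EINVAL;
	if (sin6->xsin6_len != sizeof(*sin6))
		return EINVAL;
	return 0;
}
```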
 1.136 24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.135 03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.134 25-Nov-2014  seanb branches: 1.134.2;
Really make SO_REUSEPORT and SO_REUSEADDR equivalent for multicast
sockets. From FreeBSD.
 1.133 25-Nov-2014  seanb Clean up any dangling ifp references in (struct in6pcb *)->in6p_v4moptions
(v4 multicast options off v4 mapped v6 socket) on interface destruction. The
code to clean this up in a true v4 socket was moved to its own function
which is now also called in the corresponding place for v6 sockets on
interface destruction.
 1.132 14-Nov-2014  maxv Do not uselessly include <sys/malloc.h>.
 1.131 11-Oct-2014  christos Succeed binding to multicast address for now.
Open questions:

http://mail-index.netbsd.org/tech-net/2014/07/23/msg004714.html
 1.130 11-Oct-2014  christos Make IPV4 mapped addresses able to do IPV4 multicast. Fixes needed:

- allow binding to mapped v4 multicast addresses
- define v4moptions, allow setting it via ioctl, pass it to ip_output,
free it when killing the pcb.

Ideally we would allow the IPV6 multicast setsockopts to work on mapped addresses
too, but this is a lot more work and linux does not do it either.
 1.129 07-Sep-2014  rmind in_pcbdetach: move ip_freemoptions() under softnet_lock for now (this will
be changed back once other IP paths become MP-safe). Same for IPv6 routine.

This partially reverts 1.150 of in_pcb.c and 1.127 of in6_pcb.c changes.
 1.128 05-Aug-2014  rtr branches: 1.128.2;
revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
 1.127 03-Aug-2014  rmind in6_pcbdetach: now that IGMP and multicast groups are MP-safe, we can move
the ip6_freemoptions() call outside the softnet_lock. Should fix PR/49065.
 1.126 24-Jul-2014  rtr split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48
 1.125 30-May-2014  christos Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if ipsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.124 23-Nov-2013  christos branches: 1.124.2;
convert from CIRCLEQ to TAILQ.
 1.123 05-Jun-2013  christos branches: 1.123.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.122 12-Apr-2013  christos PR/47738: connect(2) to 239.x.y.z should return error but does not.
 1.121 24-Aug-2012  dholland branches: 1.121.2;
Remove stray #undef, probably someone's debugging leftover from long ago.
 1.120 25-Jun-2012  christos rename rfc6056 -> portalgo, requested by yamt
 1.119 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.118 31-Dec-2011  christos - fix offsetof usage, and redundant defines
- kill pointer casts to 0
 1.117 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.116 24-Sep-2011  christos branches: 1.116.2; 1.116.6;
Add inet6 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.115 31-Aug-2011  plunky NULL does not need a cast
 1.114 04-May-2011  dyoung Invalidate the vestigial PCB at the top of in6_pcblookup_connect() to
fix the bug where incoming TCPv6 connections were reset.
 1.113 03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
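
The classification described above can be sketched roughly as follows. This is
an illustration only, not the committed code: the enum, the function names and
the "on_link" flag are hypothetical, and only the three classes and their
default MSLs (2, 10 and 60 seconds) are taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical names; only the classes and default MSLs come from the log. */
enum msl_class { MSL_LOOPBACK, MSL_LOCAL, MSL_REMOTE };

/* Default MSL in seconds for each class, indexed by enum msl_class. */
static const int msl_default[] = { 2, 10, 60 };

/*
 * Classify a session by the nearness of its peer.  "on_link" stands in
 * for whatever test decides that both hosts share a link/subnet.
 */
static enum msl_class
msl_classify(const struct in6_addr *laddr, const struct in6_addr *faddr,
    bool on_link)
{
	if (memcmp(laddr, faddr, sizeof(*laddr)) == 0)
		return MSL_LOOPBACK;	/* local host equals remote host */
	return on_link ? MSL_LOCAL : MSL_REMOTE;
}

/* Look up the MSL that a session of the given class would use. */
static int
msl_seconds(enum msl_class c)
{
	assert((int)c >= 0 && (int)c <= 2);
	return msl_default[c];
}
```

A session would then use msl_seconds(msl_classify(...)) in place of a single
global MSL.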

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
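
The idea of linking pool elements by a narrow index instead of a pointer, and
of discarding the oldest element when the pool is full, can be sketched as
below. This is a toy model under stated assumptions: the structure and
function names are hypothetical, the pool size is tiny, and the hash table
mentioned above is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_SIZE 4		/* illustrative; real pools are much larger */
#define IDX_NONE  UINT16_MAX	/* "null" reference */

/* Hypothetical entry: a 16-bit index replaces a pointer-sized link. */
struct vest {
	uint32_t key;		/* stand-in for the session identity */
	uint16_t next;		/* index of the next-newer entry, or IDX_NONE */
};

struct vest_pool {
	struct vest slot[POOL_SIZE];
	uint16_t oldest, newest;	/* ends of the age-ordered chain */
	uint16_t used;			/* number of live entries */
};

static void
vest_init(struct vest_pool *p)
{
	p->oldest = p->newest = IDX_NONE;
	p->used = 0;
}

/* Insert a key; when the pool is full, reuse the oldest entry's slot. */
static uint16_t
vest_insert(struct vest_pool *p, uint32_t key)
{
	uint16_t i;

	if (p->used < POOL_SIZE) {
		i = p->used++;
	} else {
		i = p->oldest;			/* discard oldest first */
		p->oldest = p->slot[i].next;
	}
	p->slot[i].key = key;
	p->slot[i].next = IDX_NONE;
	if (p->oldest == IDX_NONE)
		p->oldest = i;			/* chain was empty */
	else
		p->slot[p->newest].next = i;	/* append after newest */
	p->newest = i;
	return i;
}
```

Because every reference is an offset from the start of the pool, each link
costs two bytes here rather than the size of a pointer.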

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.112 20-Aug-2010  joerg branches: 1.112.2;
Remove stray {
 1.111 20-Aug-2010  joerg Consider a mapped IPv4 address of 0.0.0.0 as unspecified. This allows
using mapped IPv4 address with connect without preceding bind.
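
A userland sketch of the test this implies, assuming a hypothetical helper
name (the kernel change itself is not reproduced here): an address counts as
unspecified if it is :: or the IPv4-mapped form of 0.0.0.0.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper: true for :: and for ::ffff:0.0.0.0. */
static bool
addr_is_unspecified(const struct in6_addr *a)
{
	static const unsigned char zero4[4];

	if (IN6_IS_ADDR_UNSPECIFIED(a))
		return true;
	/* The IPv4 part occupies the last four bytes of a mapped address. */
	return IN6_IS_ADDR_V4MAPPED(a) &&
	    memcmp(&a->s6_addr[12], zero4, sizeof(zero4)) == 0;
}
```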
 1.110 26-May-2009  pooka branches: 1.110.2; 1.110.4;
POOL_INIT -> pool_init
 1.109 12-May-2009  elad Implicit EPERM -> explicit EACCES.

Requested by ad@ and yamt@.
 1.108 02-May-2009  elad Replace wrong __UNCONST() use with a local variable.

Similar to issues pointed out by bouyer@ and forgotten by me when I did
the last commit.

Should fix issues reported on current-users@ in:

http://mail-index.netbsd.org/current-users/2009/05/02/msg009273.html
 1.107 30-Apr-2009  elad - Make in6_pcbbind_{addr,port}() static

- Properly authorize port binding in in_pcbsetport() and in6_pcbsetport()

- Pass struct sockaddr_in6 to in6_pcbsetport() instead of just the address,
so that we have a more complete context

- Adjust udp6_output() to craft a sockaddr_in6 as it calls in6_pcbsetport()

- Fix an issue in in_pcbbind() where we used the "dom_sa_any" pointer and
not a copy of it, pointed out by bouyer@, thanks!

Mailing list reference:

http://mail-index.netbsd.org/tech-net/2009/04/29/msg001259.html
 1.106 22-Apr-2009  elad Only check if the port is used if it was specified.

Should fix problem reported in

http://mail-index.netbsd.org/current-users/2009/04/22/msg009130.html
 1.105 20-Apr-2009  elad Replace KAUTH_GENERIC_ISSUSER with a better alternative.
 1.104 20-Apr-2009  elad Extract in6_pcbbind()'s guts into two new routines: in6_pcbbind_addr() and
in6_pcbbind_port(), used for binding to an address and a port respectively.

While here, fix a possible "leak" of an in6pcb when binding to an address
succeeded but binding to an auto-assigned port failed.

Proposed and received no objections on tech-net@:

http://mail-index.netbsd.org/tech-net/2009/04/15/msg001223.html
 1.103 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.102 14-Apr-2009  elad Don't set sin->sin_port and sin6->sin6_port to 0 before calling
ifa_ifwithaddr(), as we no longer do a byte compare on the entire struct.

Reviewed by and okay from dyoung@.
 1.101 18-Mar-2009  cegger bcopy -> memcpy
 1.100 18-Mar-2009  cegger bzero -> memset
 1.99 20-Aug-2008  matt branches: 1.99.2; 1.99.8;
Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.
 1.98 04-Aug-2008  matt Free the socket only after disposing of the PCB.
 1.97 24-Apr-2008  ad branches: 1.97.2; 1.97.4; 1.97.8;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.96 20-Mar-2008  dyoung branches: 1.96.2;
Use ip6_clearpktopts() to destroy the IPv6 PCB's in6p_outputopts,
so that there's no chance of either leaking memory, or leaving
dangling pointers to a route cache.
 1.95 19-Mar-2008  dyoung No code ever sets struct ip6_pktopts member ip6po_m, so get rid of
it.
 1.94 14-Jan-2008  dyoung branches: 1.94.2; 1.94.6;
Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in6_losing().
 1.93 12-Jan-2008  dyoung Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().
 1.92 10-Jan-2008  dyoung Save some rtcache_getrt() calls.
 1.91 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.90 21-Nov-2007  drochner branches: 1.90.2; 1.90.6;
Fix in6_pcbrtentry() for the case of IPv6-mapped IPv4 addresses:
don't assume that the cached route is a sockaddr_in6, and do the
right comparisons so that no out-of-bounds memory is accessed.

btw, the use of "#ifdef INET" throughout the source doesn't look clean
to me: There are 2 cases -- whether AF_INET is usable by userland
programs, and whether IPv4 is supported as on-wire protocol.
 1.89 10-Nov-2007  dyoung Use sockaddr_in6_init().
 1.88 19-Jul-2007  dyoung branches: 1.88.4; 1.88.6; 1.88.10; 1.88.12; 1.88.14;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.87 23-May-2007  christos branches: 1.87.2;
Ansify + add a few comments, from Karl Sjödahl
 1.86 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
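
Item 4 can be pictured with the following stand-alone sketch. All types and
names are stand-ins rather than the kernel's definitions, and unlinking each
cache as it is invalidated is an assumption suggested by the phrase "live
route caches", not something the log states.

```c
#include <assert.h>
#include <stddef.h>

/* All types here are stand-ins, not the kernel's definitions. */
struct route_stub {
	void *ro_rt;			/* cached route, NULL when invalid */
	struct route_stub *ro_next;	/* next cache on the domain's list */
};

struct domain_stub {
	int dom_family;
	struct route_stub *dom_rtcache;	/* list of live route caches */
};

/* Invalidate every cache of family 'af'; return how many were cleared. */
static int
flushall_stub(struct domain_stub *doms, size_t ndoms, int af)
{
	int n = 0;

	for (size_t i = 0; i < ndoms; i++) {
		if (doms[i].dom_family != af)
			continue;
		/* Assumption: a cache leaves the list once invalidated. */
		while (doms[i].dom_rtcache != NULL) {
			struct route_stub *ro = doms[i].dom_rtcache;

			doms[i].dom_rtcache = ro->ro_next;
			ro->ro_next = NULL;
			ro->ro_rt = NULL;
			n++;
		}
	}
	return n;
}
```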
 1.85 12-Mar-2007  ad branches: 1.85.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.84 04-Mar-2007  christos branches: 1.84.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.83 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.82 26-Jan-2007  dyoung branches: 1.82.2;
Change a couple of bzeros to memsets.
 1.81 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.80 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.79 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.78 08-Dec-2006  joerg Remove now superfluous {.
 1.77 08-Dec-2006  joerg When a dynamic route is deleted in in_losing and in6_losing, rtrequest
is called, but the current reference via the PCB is not removed. This
is effectively a leaked reference. Call rtfree unconditionally.
 1.76 02-Dec-2006  dyoung Use the queue(3) macros instead of open-coding them. Shorten
staircases. Remove unnecessary casts. Where appropriate, s/8/NBBY/.
De-__P(). KNF.

No functional changes intended.
 1.75 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.74 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.73 05-Oct-2006  tls Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
 1.72 23-Jul-2006  ad branches: 1.72.4; 1.72.6;
Use the LWP cached credentials where sane.
 1.71 14-May-2006  elad integrate kauth.
 1.70 05-May-2006  rpaulo Add support for RFC 3542 Adv. Socket API for IPv6 (which obsoletes 2292).
* RFC 3542 isn't binary compatible with RFC 2292.
* RFC 2292 support is on by default but can be disabled.
* update ping6, telnet and traceroute6 to the new API.

From the KAME project (www.kame.net).
Reviewed by core.
 1.69 21-Jan-2006  rpaulo branches: 1.69.2; 1.69.4; 1.69.6; 1.69.8; 1.69.10;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.
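
A deliberately simplified version of the stricter boundary check for the ::1
case is sketched below. It looks only at the two addresses; the real code
works with scope zones and the outgoing interface, which this sketch ignores,
and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Hypothetical check: a loopback source is only meaningful within the
 * node, so refuse to pair it with a non-loopback destination.
 */
static bool
scope_allows(const struct in6_addr *src, const struct in6_addr *dst)
{
	if (IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst))
		return false;
	return true;
}
```

With such a check, the bind("::1") followed by sendto(global address)
sequence shown above would be rejected instead of emitting a packet.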

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.68 15-Nov-2005  dsl branches: 1.68.2;
Pass the current process structure to in_pcbconnect() so that it can
pass it to in_pcbbind() so that can allocate a low numbered port
if setsockopt() has been used to set IP_PORTRANGE to IP_PORTRANGE_LOW.
While there, fail in_pcbconnect() if the in_pcbbind() fails - rather
than sending the request out from a port of zero.
This has been largely broken since the socket option was added in 1998.
 1.67 29-May-2005  christos branches: 1.67.2; 1.67.8;
- avoid shadowed variables
- sprinkle const.
 1.66 04-Dec-2004  peter Convert lo(4) to a clonable device.

This also removes the loif array and changes all code to use the new
lo0ifp pointer which points to the lo0 ifnet structure.

Approved by christos.
 1.65 24-Jun-2004  drochner abstain from typecasting the LHS of an assignment;
gcc-3.4.x doesn't like it
 1.64 26-Apr-2004  jonathan Fix per-PCB IPsec policy cache for FAST_IPSEC:

The sys/netipsec policy-cache (added by Jason Thorpe as a rewrite of
the KAME per-PCB policy cache) assumes that policy-cacheable PCBs
always have a non-NULL inph_sp in the common PCB header. So we must
do all the per-PCB policy cache calls when either (KAME) IPSEC, or
FAST_IPSEC is defined. ``Make it so''.

We can now support non-IPsec'ed IPv6 traffic, when both
``options FAST_IPSEC'' and ``options INET6'' are configured.
 1.63 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.62 29-Mar-2004  atatat Make these compile without INET. tcp_input probably needs a lot more
work...
 1.61 13-Jan-2004  itojun branches: 1.61.2;
avoid deref-after-free.
http://sources.zabbadoz.net/freebsd/patchset/106-ipsec-pcb-discon.diff
 1.60 05-Nov-2003  itojun use hash table for in6_pcbbind(). similar to in_pcb 1.89 -> 1.90
 1.59 30-Sep-2003  christos Fix off-by-one in PRC_NCMDS check. From FreeBSD via OpenBSD
 1.58 06-Sep-2003  itojun randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.57 06-Sep-2003  itojun clarify flowlabel handling
 1.56 04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.55 13-Aug-2003  itojun in6_pcbrtentry() now returns IPv4 rtentry if in6pcb is connected to IPv4 mapped
address. PR kern/22431 from Andreas Gustafsson
 1.54 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.53 05-Nov-2002  perry branches: 1.53.6;
include opt_inet.h -- found by David Laight
 1.52 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.51 26-Aug-2002  itojun pass proc * to in6_pcbsetport. PR 18073
 1.50 20-Aug-2002  itojun sync up use_deprecated handling with latest kame.
- bind(deprecated) is allowed, trusting userland app is doing the right thing
- use_deprecated default to 1
 1.49 11-Jun-2002  itojun share policy-on-pcb for listening socket. sync w/kame
todo: share even more, avoid frequent updates of spidx
 1.48 28-May-2002  itojun correct in*_pcbrtentry. check cached value correctly.
 1.47 28-May-2002  itojun in in*_pcbrtentry(), check if route is still valid (RTF_UP),
and address family is still valid.
 1.46 21-Mar-2002  itojun branches: 1.46.4; 1.46.6;
protect in6pcb queue operation by splnet, as pcb queue will be touched
by in6_pcbpurgeif() under splnet.
 1.45 21-Dec-2001  itojun whitespace/cosmetic sync w/kame
 1.44 13-Nov-2001  lukem add RCSIDs
 1.43 24-Oct-2001  itojun more whitespace sync with kame
 1.42 16-Oct-2001  itojun branches: 1.42.2;
remove unused #define. sync whitespace/comment with kame.
 1.41 15-Oct-2001  itojun implement IPV6_V6ONLY socket option from draft-ietf-ipngwg-rfc2553bis-03.txt.
IPV6_BINDV6ONLY (netbsd only) is deprecated, but still work just like before.
 1.40 06-Aug-2001  itojun cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.
 1.39 25-Jul-2001  itojun allocate ipsec policy buffer attached to pcb in in*_pcballoc, before
giving anyone access to pcb (do not reveal an inconsistent one).
sync with kame
 1.38 02-Jul-2001  itojun branches: 1.38.2;
on interface removal, remove multicast groups joined from pcb, before
removing interface addresses. without the change, we may deref
NULL pointer in in_pcbpurgeif(). from jinmei@kame, sync with kame
 1.37 27-Jun-2001  itojun netbsd; on interface removal, force pcbs to leave from multicast groups
pointing toward the interface about to be removed. sync with kame
XXX still need more discussions on semantics. the behavior should be safer
 1.36 11-May-2001  itojun there's no need to #if NFAITH here. IN6P_FAITH can be set even on
NFAITH == 0 kernel, it is safer to always check the condition.
sync with kame.
 1.35 11-Feb-2001  itojun branches: 1.35.2;
pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).
 1.34 10-Feb-2001  itojun to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.33 21-Dec-2000  itojun make sure we notify of routing changes, even if we have net route pointed
to by inpcb.
 1.32 19-Oct-2000  itojun remove #ifdef TCP6. it is not likely for us to bring in sys/netinet6/tcp6*.c
(separate TCP/IPv6 stack) into netbsd-current.
 1.31 02-Oct-2000  itojun fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.
 1.30 26-Aug-2000  itojun implement net.inet6.ip6.{anon,low}port{min,max} sysctl variable.
 1.29 07-Jul-2000  itojun sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.
 1.28 06-Jul-2000  itojun remove unnecessary #include <netkey/key_debug.h>. from kame.
 1.27 02-Jul-2000  itojun repair kernel faithd(8) support. there were two mistakes:
(1) tcp6_input dropped packets for translation
(2) in6_pcblookup_connect was too strict
 1.26 08-Jun-2000  itojun branches: 1.26.2;
make sure not to overwrite sockaddr on PRU_SEND/PRU_CONNECT to
link-local address. From: frank
 1.25 05-Jun-2000  itojun backout change to in6_pcbnotify(). the change seems premature
(may cause trouble with advanced API in certain situation).
 1.24 05-Jun-2000  itojun pass struct proc * down to udp6_output and in6_pcbbind.
 1.23 03-Jun-2000  itojun sync with kame.
- use latest source address selection code - in6_src.c.
- correct frag header insertion.
- deep copy ip6 header portion in ip6_mloopback to avoid overwrite.
- do not bark when we forward packet to loopback.
- some cosmetics.
 1.22 29-May-2000  itojun disallow bind(2) with IPv4 mapped address for now. port number check is
insufficient at this moment, so two sockets could bind(2) and listen on the same
port number.

for real fix, we need to check inpcb table with in6pcb. we can't
find inpcb chain from particular in6pcb chain (like finding tcbtable from tcb6)
luckily RFC2553 does not talk about bind(2) behavior for IPv4 mapped.
IPv4 mapped brings in too much complexities...
 1.21 02-Mar-2000  itojun branches: 1.21.2;
bump kame revision id
 1.20 02-Mar-2000  itojun properly handle notifies from icmp6, so that we can properly reflect
redirects/unreach to transport layer. (sync with latest kame)
 1.19 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.18 03-Feb-2000  itojun use u_int16_t, not u_short, for port #.
 1.17 03-Feb-2000  itojun remove #if 0'ed code
 1.16 02-Feb-2000  thorpej PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
 1.15 01-Feb-2000  thorpej Improve the readability of one small piece of code.
 1.14 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.13 26-Jan-2000  itojun make setsockopt(IPV6_PORTRANGE) work. obeys IPNOPRIVPORTS.
 1.12 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.11 06-Jan-2000  itojun make IPV6_BINDV6ONLY setsockopt available. it controls behavior of
AF_INET6 wildcard listening socket. heavily documented in ip6(4).
net.inet6.ip6.bindv6only defines default value. default is 1.

"options INET6_BINDV6ONLY" removes any code fragment that supports
IPV6_BINDV6ONLY == 0 case (not defopt'ed as use of this is rare).
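
As an illustration of the per-socket control described in 1.11, the sketch below sets and reads back the v6-only flag from userland. It uses IPV6_V6ONLY, the option name later standardized in RFC 3493; treating it as the counterpart of IPV6_BINDV6ONLY is an assumption made for this example, and the helper name v6only_roundtrip is hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create an AF_INET6 TCP socket, request the given v6-only setting and
 * read it back.  Returns 1 or 0 (the value the kernel reports), or -1 if
 * the socket cannot be created or the option is not supported.
 */
static int
v6only_roundtrip(int want)
{
	int s, got = -1;
	socklen_t len = sizeof(got);

	assert(want == 0 || want == 1);
	s = socket(AF_INET6, SOCK_STREAM, 0);
	if (s == -1)
		return -1;
	if (setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, &want, sizeof(want)) == -1 ||
	    getsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, &got, &len) == -1)
		got = -1;
	close(s);
	return got == -1 ? -1 : (got != 0);
}
```

With the flag set to 1, a wildcard AF_INET6 listener accepts IPv6 traffic only; some systems refuse the value 0, which is why the helper reports failure rather than assuming success.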
 1.10 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.9 31-Jul-1999  itojun branches: 1.9.2; 1.9.8;
sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.8 17-Jul-1999  itojun fix faith interface support. need testing.
(i understand this is a dirty hack, of course)
 1.7 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.6 04-Jul-1999  itojun s/splnet/splsoftnet/ in IPv6/IPsec part.
hope I made no mistake (the kernel works fine but I need a regress test)

Suggested by: thorpej
 1.5 03-Jul-1999  thorpej RCS ID police.
 1.4 02-Jul-1999  itojun try to get a non-conflicting port # when bind(2) to port number 0
is called.
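
The behavior in 1.4 is visible from userland: binding to port 0 asks the kernel to choose a free port, which getsockname(2) then reports. A minimal sketch, assuming IPv6 loopback is available; the helper name ephemeral_port6 is hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Bind a UDP socket to [::1]:0 and return the port the kernel picked
 * (host byte order), or -1 if IPv6 is unavailable or a call fails.
 */
static int
ephemeral_port6(void)
{
	struct sockaddr_in6 sin6;
	socklen_t len = sizeof(sin6);
	int s, port = -1;

	s = socket(AF_INET6, SOCK_DGRAM, 0);
	if (s == -1)
		return -1;
	memset(&sin6, 0, sizeof(sin6));
	sin6.sin6_family = AF_INET6;
	sin6.sin6_addr = in6addr_loopback;
	sin6.sin6_port = htons(0);	/* 0 asks the kernel to choose */
	if (bind(s, (struct sockaddr *)&sin6, sizeof(sin6)) == 0 &&
	    getsockname(s, (struct sockaddr *)&sin6, &len) == 0)
		port = ntohs(sin6.sin6_port);
	close(s);
	assert(port == -1 || port > 0);	/* a successful bind never reports 0 */
	return port;
}
```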
 1.3 02-Jul-1999  itojun expand insque/remque (quick hack). fundamental fix should be done
while clarifying relationship between inpcb and in6pcb.

PR: 7891
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file in6_pcb.c was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file in6_pcb.c was added on branch chs-ubc2 on 1999-07-01 23:48:27 +0000
 1.9.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.9.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.9.2.2 05-Jan-2001  bouyer Sync with HEAD
 1.9.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.21.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.26.2.3 09-Sep-2003  msaitoh Pull up rev. 1.55 via patch (requested by itojun in ticket #66):
in6_pcbrtentry() now returns IPv4 rtentry if in6pcb is connected to IPv4
mapped address. Fixes PR 22431 from Andreas Gustafsson
 1.26.2.2 27-Aug-2000  itojun pullup (approved by releng-1-5)

> implement net.inet6.ip6.{anon,low}port{min,max} sysctl variable.

> cvs rdiff -r1.67 -r1.68 basesrc/lib/libc/gen/sysctl.3
> cvs rdiff -r1.53 -r1.54 basesrc/sbin/sysctl/sysctl.8
> cvs rdiff -r1.18 -r1.19 syssrc/sys/netinet6/in6.h
> cvs rdiff -r1.29 -r1.30 syssrc/sys/netinet6/in6_pcb.c
> cvs rdiff -r1.3 -r1.4 syssrc/sys/netinet6/in6_src.c
> cvs rdiff -r1.25 -r1.26 syssrc/sys/netinet6/ip6_input.c
> cvs rdiff -r1.14 -r1.15 syssrc/sys/netinet6/ip6_var.h
 1.26.2.1 03-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)
repair kernel faithd(8) support. there were two mistakes:
(1) tcp6_input dropped packets for translation
(2) in6_pcblookup_connect was too strict
 1.35.2.10 11-Nov-2002  nathanw Catch up to -current
 1.35.2.9 17-Sep-2002  nathanw Catch up to -current.
 1.35.2.8 27-Aug-2002  nathanw Catch up to -current.
 1.35.2.7 20-Jun-2002  nathanw Catch up to -current.
 1.35.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.35.2.5 08-Jan-2002  nathanw Catch up to -current.
 1.35.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.35.2.3 22-Oct-2001  nathanw Catch up to -current.
 1.35.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.35.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.38.2.6 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.38.2.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.38.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.38.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.38.2.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.38.2.1 03-Aug-2001  lukem update to -current
 1.42.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.46.6.4 10-Sep-2003  tron Pull up revision 1.55 via patch (requested by itojun in ticket #1405):
in6_pcbrtentry() now returns IPv4 rtentry if in6pcb is connected to IPv4 mapped
address. PR kern/22431 from Andreas Gustafsson
 1.46.6.3 15-Jun-2003  tron Pull up revision 1.53 (requested by itojun in ticket #1241):
include opt_inet.h -- found by David Laight
 1.46.6.2 21-Nov-2002  he Pull up revision 1.50 (requested by itojun in ticket #708):
Allow bind() of deprecated addresses, trusting userland
application knows what it's doing.
 1.46.6.1 27-Aug-2002  lukem Pull up revision 1.51 (requested by itojun in ticket #731):
pass proc * to in6_pcbsetport. PR 18073
 1.46.4.3 29-Aug-2002  gehenna catch up with -current.
 1.46.4.2 20-Jun-2002  gehenna catch up with -current.
 1.46.4.1 30-May-2002  gehenna Catch up with -current.
 1.53.6.6 11-Dec-2005  christos Sync with head.
 1.53.6.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.53.6.4 18-Dec-2004  skrll Sync with HEAD.
 1.53.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.53.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.53.6.1 03-Aug-2004  skrll Sync with HEAD
 1.61.2.1 28-Apr-2004  jmc Pullup rev 1.64 (requested by jonathan in ticket #201)

Fix per-PCB IPsec policy cache for FAST_IPSEC.
The sys/netipsec policy-cache assumes that policy-cacheable PCBs
always have a non-NULL inph_sp in the common PCB header. So we must
do all the per-PCB policy cache calls when either (KAME) IPSEC, or
FAST_IPSEC is defined.
 1.67.8.1 22-Nov-2005  yamt sync with head.
 1.67.2.8 24-Mar-2008  yamt sync with head.
 1.67.2.7 21-Jan-2008  yamt sync with head
 1.67.2.6 07-Dec-2007  yamt sync with head
 1.67.2.5 15-Nov-2007  yamt sync with head.
 1.67.2.4 03-Sep-2007  yamt sync with head.
 1.67.2.3 26-Feb-2007  yamt sync with head.
 1.67.2.2 30-Dec-2006  yamt sync with head.
 1.67.2.1 21-Jun-2006  yamt sync with head.
 1.68.2.1 01-Feb-2006  yamt sync with head.
 1.69.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.69.8.4 11-May-2006  elad sync with head
 1.69.8.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.69.8.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.69.8.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.69.6.2 11-Aug-2006  yamt sync with head
 1.69.6.1 24-May-2006  yamt sync with head.
 1.69.4.1 01-Jun-2006  kardel Sync with head.
 1.69.2.2 09-Sep-2006  rpaulo sync with head
 1.69.2.1 07-Feb-2006  rpaulo remove in6_pcb.h and include in_pcb.h.
 1.72.6.3 18-Dec-2006  yamt sync with head.
 1.72.6.2 10-Dec-2006  yamt sync with head.
 1.72.6.1 22-Oct-2006  yamt sync with head
 1.72.4.3 01-Feb-2007  ad Sync with head.
 1.72.4.2 12-Jan-2007  ad Sync with head.
 1.72.4.1 18-Nov-2006  ad Sync with head.
 1.82.2.4 07-May-2007  yamt sync with head.
 1.82.2.3 24-Mar-2007  yamt sync with head.
 1.82.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.82.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.84.2.3 20-Aug-2007  ad Sync with HEAD.
 1.84.2.2 08-Jun-2007  ad Sync with head.
 1.84.2.1 13-Mar-2007  ad Sync with head.
 1.85.2.1 11-Jul-2007  mjf Sync with head.
 1.87.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.88.14.2 19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
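
The one functional change noted in 1.88.14.2 (applying a supplied netmask to the destination before the table lookup) amounts to a byte-wise AND. A minimal userland sketch for IPv6 addresses follows; the helper names in6_apply_mask and in6_masked_equals are hypothetical and do not come from the routing code.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/* Clear every bit of *dst that is not set in *mask. */
static void
in6_apply_mask(struct in6_addr *dst, const struct in6_addr *mask)
{
	size_t i;

	assert(dst != NULL && mask != NULL);
	for (i = 0; i < sizeof(dst->s6_addr); i++)
		dst->s6_addr[i] &= mask->s6_addr[i];
}

/*
 * Parse three textual IPv6 addresses and report whether addr & mask
 * equals expect: 1 if so, 0 if not, -1 if any string fails to parse.
 */
static int
in6_masked_equals(const char *addr, const char *mask, const char *expect)
{
	struct in6_addr a, m, e;

	if (inet_pton(AF_INET6, addr, &a) != 1 ||
	    inet_pton(AF_INET6, mask, &m) != 1 ||
	    inet_pton(AF_INET6, expect, &e) != 1)
		return -1;
	in6_apply_mask(&a, &m);
	return memcmp(&a, &e, sizeof(a)) == 0;
}
```

Masking first means a request for 2001:db8:1:2::3 with a /64 mask searches for the key 2001:db8:1:2::, so host bits in the request no longer cause a miss.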
 1.88.14.1 19-Jul-2007  dyoung file in6_pcb.c was added on branch matt-mips64 on 2007-07-19 20:48:57 +0000
 1.88.12.4 18-Feb-2008  mjf Sync with HEAD.
 1.88.12.3 27-Dec-2007  mjf Sync with HEAD.
 1.88.12.2 08-Dec-2007  mjf Sync with HEAD.
 1.88.12.1 19-Nov-2007  mjf Sync with HEAD.
 1.88.10.2 22-Nov-2007  bouyer Sync with HEAD
 1.88.10.1 13-Nov-2007  bouyer Sync with HEAD
 1.88.6.2 23-Mar-2008  matt sync with HEAD
 1.88.6.1 09-Jan-2008  matt sync with HEAD
 1.88.4.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.88.4.1 11-Nov-2007  joerg Sync with HEAD.
 1.90.6.3 19-Jan-2008  bouyer Sync with HEAD
 1.90.6.2 10-Jan-2008  bouyer Sync with HEAD
 1.90.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.90.2.1 26-Dec-2007  ad Sync with head.
 1.94.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.94.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.94.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.94.2.1 24-Mar-2008  keiichi sync with head.
 1.96.2.1 18-May-2008  yamt sync with head.
 1.97.8.1 19-Oct-2008  haad Sync with HEAD.
 1.97.4.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.97.2.4 09-Oct-2010  yamt sync with head
 1.97.2.3 20-Jun-2009  yamt sync with head
 1.97.2.2 16-May-2009  yamt sync with head
 1.97.2.1 04-May-2009  yamt sync with head.
 1.99.8.2 23-Jul-2009  jym Sync with HEAD.
 1.99.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.99.2.1 28-Apr-2009  skrll Sync with HEAD.
 1.110.4.2 31-May-2011  rmind sync with head
 1.110.4.1 05-Mar-2011  rmind sync with head
 1.110.2.1 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.112.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.116.6.2 05-Apr-2012  mrg sync to latest -current.
 1.116.6.1 18-Feb-2012  mrg merge to -current.
 1.116.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.116.2.2 30-Oct-2012  yamt sync with head
 1.116.2.1 17-Apr-2012  yamt sync with head
 1.121.2.3 03-Dec-2017  jdolecek update from HEAD
 1.121.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.121.2.1 23-Jun-2013  tls resync from head
 1.123.2.3 18-May-2014  rmind sync with head
 1.123.2.2 23-Sep-2013  rmind - Add some initial locking to the IPv4 PCB.
- Rename inpcb_lookup_*() routines to be more accurate and add comments.
- Add some comments about connection life-cycle WRT socket layer.
 1.123.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.124.2.1 10-Aug-2014  tls Rebase.
 1.128.2.3 28-Sep-2016  bouyer Pull up following revision(s) (requested by roy in ticket #1243):
sys/netinet6/raw_ip6.c: revision 1.150 via patch
sys/netinet6/in6_pcb.c: revision 1.149 via patch
Allow explicit binding to detached addresses.
Fixes PR kern/51435.
 1.128.2.2 17-Jan-2015  martin branches: 1.128.2.2.4;
Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.128.2.1 08-Sep-2014  msaitoh Pull up following revision(s) (requested by rmind in ticket #80):
sys/netinet6/in6_pcb.c: revision 1.129
sys/netinet/in_pcb.c: revision 1.152
in_pcbdetach: move ip_freemoptions() under softnet_lock for now (this will
be changed back once other IP paths become MP-safe). Same for IPv6 routine.
This partially reverts 1.150 of in_pcb.c and 1.127 of in6_pcb.c changes.
 1.128.2.2.4.1 18-Jan-2017  skrll Sync with netbsd-5
 1.134.2.8 28-Aug-2017  skrll Sync with HEAD
 1.134.2.7 05-Feb-2017  skrll Sync with HEAD
 1.134.2.6 05-Dec-2016  skrll Sync with HEAD
 1.134.2.5 05-Oct-2016  skrll Sync with HEAD
 1.134.2.4 09-Jul-2016  skrll Sync with HEAD
 1.134.2.3 22-Sep-2015  skrll Sync with HEAD
 1.134.2.2 06-Jun-2015  skrll Sync with HEAD
 1.134.2.1 06-Apr-2015  skrll Sync with HEAD
 1.145.2.6 26-Apr-2017  pgoyette Sync with HEAD
 1.145.2.5 20-Mar-2017  pgoyette Sync with HEAD
 1.145.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.145.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.145.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.145.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.155.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.161.4.2 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #463):
sys/netinet/in.c: revision 1.212
sys/netinet/ip_output.c: revision 1.288
sys/netinet6/in6.c: revision 1.256
sys/netinet6/in6_pcb.c: revision 1.163
sys/sys/lwp.h: revision 1.176
Add missing curlwp_bindx
--
Add missing curlwp_bindx
--
Check LP_BOUND is surely set in curlwp_bindx
This may find an extra call of curlwp_bindx.
--
Fix usage of curlwp_bind in ip_output
curlwp_bindx must be called in LIFO order, i.e., we can't call curlwp_bind
and curlwp_bindx like this:
bound1 = curlwp_bind();
bound2 = curlwp_bind();
curlwp_bindx(bound1);
curlwp_bindx(bound2);
ip_output did so if NET_MPSAFE. Fix it.
--
Fix wrong usage of psref_held
We can't use it for checking if a caller does NOT hold a given target.
If you want to do it you should have psref_not_held or something.
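
The LIFO rule for curlwp_bind/curlwp_bindx stated in 1.161.4.2 can be demonstrated with a small userland model. This is a simplified sketch under stated assumptions: model_flags stands in for the per-LWP flag word, MODEL_BOUND for LP_BOUND, and model_bind/model_bindx are hypothetical stand-ins that only imitate a save-and-restore of one flag bit. They are not the kernel functions.

```c
#include <assert.h>

#define MODEL_BOUND 0x1		/* hypothetical stand-in for LP_BOUND */

static int model_flags;		/* hypothetical stand-in for the LWP flag word */

/* Set the bound bit and return its previous state. */
static int
model_bind(void)
{
	int bound = model_flags & MODEL_BOUND;

	model_flags |= MODEL_BOUND;
	return bound;
}

/*
 * Restore the state saved by the matching model_bind().  Returns -1 if
 * the bit is not currently set, which is the misuse an added check
 * would catch; returns 0 otherwise.
 */
static int
model_bindx(int bound)
{
	assert(bound == 0 || bound == MODEL_BOUND);
	if ((model_flags & MODEL_BOUND) == 0)
		return -1;
	model_flags ^= bound ^ MODEL_BOUND;
	return 0;
}

/* Release in reverse order of acquisition (correct). */
static int
run_lifo(void)
{
	int b1, b2, rc;

	model_flags = 0;
	b1 = model_bind();
	b2 = model_bind();
	rc = model_bindx(b2);
	rc |= model_bindx(b1);
	return (rc == 0 && model_flags == 0) ? 0 : -1;
}

/* Release in acquisition order (the pattern the commit fixes). */
static int
run_fifo(void)
{
	int b1, b2, rc;

	model_flags = 0;
	b1 = model_bind();
	b2 = model_bind();
	rc = model_bindx(b1);	/* clears the bit too early */
	rc |= model_bindx(b2);	/* finds the bit already cleared */
	return (rc == 0 && model_flags == 0) ? 0 : -1;
}
```

In this model the out-of-order release clears the bit while the inner section still expects it to be set, so the second release fails its check.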
 1.161.4.1 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general though,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoid using it makes life easy.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone gains little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.165.4.1 10-Jun-2019  christos Sync with HEAD
 1.54 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.53 15-Jun-2022  knakahara in6p_hash isn't used, either.
 1.52 08-Sep-2020  christos Add IP_BINDANY, IPV6_BINDANY which can be used to bind to any address in
order to implement transparent proxies.
 1.51 20-Aug-2020  riastradh [ozaki-r] Changes to the kernel core for wireguard
 1.50 22-Nov-2018  knakahara Support IPv6 NAT-T. Implemented by hsuenaga@IIJ and ohishi@IIJ.

Add ATF later.
 1.49 02-Mar-2017  ozaki-r branches: 1.49.12; 1.49.14;
Make sure im6o_memberships is protected by in6p's lock (solock)
 1.48 22-Feb-2017  ozaki-r Add assertions and comments for lock states of socket and pcb
 1.47 08-Dec-2016  ozaki-r branches: 1.47.2;
Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes it difficult to debug by bisecting.
 1.46 24-May-2015  rtr branches: 1.46.2;
remove transitional functions in{,6}_pcbconnect_m() that were used in
converting protocol user requests to accept sockaddr instead of mbufs.

remove tcp_input copy in to mbuf from sockaddr and just copy to sockaddr
to make it possible for the transitional functions to go away.

no version bump since these functions only existed for a short time and
were commented as adapters (they appeared in 7.99.15).
 1.45 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.44 27-Apr-2015  ozaki-r Introduce in6_selecthlim_rt to consolidate an idiom for rt->rt_ifp

It consolidates a scattered routine:
(rt = rtcache_validate(&in6p->in6p_route)) != NULL ? rt->rt_ifp : NULL
 1.43 24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.42 03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.41 30-Mar-2015  ozaki-r Include ip6.h for ip6_hdr
 1.40 11-Oct-2014  christos branches: 1.40.2;
Make IPV4 mapped addresses able to do IPV4 multicast. Fixes needed:

- allow binding to mapped v4 multicast addresses
- define v4moptions, allow setting it via ioctl, pass it to ip_output,
free it when killing the pcb.

Ideally we would allow the IPV6 multicast setsockopts to work on mapped addresses
too, but this is a lot more work and linux does not do it either.
 1.39 05-Aug-2014  rtr revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
 1.38 24-Jul-2014  rtr split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48
 1.37 25-Jun-2012  christos branches: 1.37.2; 1.37.4; 1.37.12;
rename rfc6056 -> portalgo, requested by yamt
 1.36 24-Sep-2011  christos branches: 1.36.2;
Add inet6 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.35 03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
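
The class-to-MSL mapping just described can be sketched as follows. The enum and function names are hypothetical; only the three classes and the default values of 2, 10 and 60 seconds come from the text above.

```c
#include <assert.h>

/* Hypothetical names; the classes and defaults follow the description. */
enum mslt_class_sketch {
	MSLT_LOOPBACK_SKETCH,	/* local host equals remote host */
	MSLT_LOCAL_SKETCH,	/* both hosts on the same link/subnet */
	MSLT_REMOTE_SKETCH	/* one or more gateways in between */
};

/* Classify a session from two yes/no facts about the peer. */
static enum mslt_class_sketch
mslt_classify_sketch(int same_host, int same_link)
{
	if (same_host)
		return MSLT_LOOPBACK_SKETCH;
	if (same_link)
		return MSLT_LOCAL_SKETCH;
	return MSLT_REMOTE_SKETCH;
}

/* Default MSL, in seconds, for each class. */
static int
mslt_msl_seconds_sketch(enum mslt_class_sketch c)
{
	switch (c) {
	case MSLT_LOOPBACK_SKETCH:
		return 2;
	case MSLT_LOCAL_SKETCH:
		return 10;
	case MSLT_REMOTE_SKETCH:
		return 60;
	}
	assert(!"unknown MSLT class");
	return 60;
}
```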

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable comes from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
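
Two of these ideas, narrow indices in place of pointers and oldest-first discard from a fixed-size pool, can be illustrated with a small userland sketch. Everything here is hypothetical, including the names and the 4-slot pool size, and the lookup is a plain linear scan rather than the cacheline-conscious hash table the commit describes.

```c
#include <assert.h>
#include <stdint.h>

#define VTW_POOL_SIZE_SKETCH 4	/* tiny on purpose, for illustration */

/* A compact stand-in for a vestigial PCB: just a session key. */
struct vestigial_pcb_sketch {
	uint32_t key;
};

/* Fixed-size pool used as a ring; entries are named by 16-bit index. */
struct vtw_pool_sketch {
	struct vestigial_pcb_sketch slot[VTW_POOL_SIZE_SKETCH];
	uint16_t oldest;	/* index of the oldest live entry */
	uint16_t count;		/* number of live entries */
};

/* Insert a session; when full, the oldest entry is discarded first. */
static uint16_t
vtw_insert_sketch(struct vtw_pool_sketch *p, uint32_t key)
{
	uint16_t idx;

	assert(p->count <= VTW_POOL_SIZE_SKETCH);
	if (p->count == VTW_POOL_SIZE_SKETCH) {
		idx = p->oldest;	/* reuse the oldest slot */
		p->oldest = (uint16_t)((p->oldest + 1) % VTW_POOL_SIZE_SKETCH);
	} else {
		idx = (uint16_t)((p->oldest + p->count) % VTW_POOL_SIZE_SKETCH);
		p->count++;
	}
	p->slot[idx].key = key;
	return idx;	/* a narrow offset from the pool start, not a pointer */
}

/* Linear lookup over live entries, oldest to newest. */
static int
vtw_lookup_sketch(const struct vtw_pool_sketch *p, uint32_t key)
{
	uint16_t i;

	for (i = 0; i < p->count; i++) {
		uint16_t idx = (uint16_t)((p->oldest + i) % VTW_POOL_SIZE_SKETCH);

		if (p->slot[idx].key == key)
			return 1;
	}
	return 0;
}
```

Because every entry lives in one fixed array, a 16-bit index is enough to name it, which is where the memory saving over full pointers comes from.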

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.34 30-Apr-2009  elad branches: 1.34.4; 1.34.6;
- Make in6_pcbbind_{addr,port}() static

- Properly authorize port binding in in_pcbsetport() and in6_pcbsetport()

- Pass struct sockaddr_in6 to in6_pcbsetport() instead of just the address,
so that we have a more complete context

- Adjust udp6_output() to craft a sockaddr_in6 as it calls in6_pcbsetport()

- Fix an issue in in_pcbbind() where we used the "dom_sa_any" pointer and
not a copy of it, pointed out by bouyer@, thanks!

Mailing list reference:

http://mail-index.netbsd.org/tech-net/2009/04/29/msg001259.html
 1.33 20-Apr-2009  elad Extract in6_pcbbind()'s guts into two new routines: in6_pcbbind_addr() and
in6_pcbbind_port(), used for binding to an address and a port respectively.

While here, fix a possible "leak" of an in6pcb when binding to an address
succeeded but binding to an auto-assigned port failed.

Proposed and received no objections on tech-net@:

http://mail-index.netbsd.org/tech-net/2009/04/15/msg001223.html
 1.32 02-May-2007  dyoung branches: 1.32.32; 1.32.42; 1.32.48;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
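
Items 3 and 4 above can be modelled in userland roughly as follows. This is a sketch under loud assumptions: every `_sketch` type and function is hypothetical, malloc() stands in for the per-family sockaddr pools, and a plain singly linked list stands in for dom_rtcache.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a sockaddr; not the system definition. */
struct sa_sketch {
	unsigned char sa_len;
	unsigned char sa_family;
	char sa_data[14];
};

/* The cache points at sockaddr storage instead of embedding it. */
struct route_sketch {
	struct sa_sketch *ro_sa;	/* NULL until a destination is set */
	int ro_valid;			/* nonzero while the cached route is usable */
	struct route_sketch *ro_next;	/* link on the domain's list */
};

struct domain_sketch {
	struct route_sketch *dom_rtcache;	/* list of live route caches */
};

/* Set the destination; returns 0 or ENOMEM, like the interface described. */
static int
rtcache_setdst_sketch(struct route_sketch *ro, const struct sa_sketch *sa)
{
	struct sa_sketch *copy = malloc(sizeof(*copy));

	if (copy == NULL)
		return ENOMEM;
	memcpy(copy, sa, sizeof(*copy));
	free(ro->ro_sa);
	ro->ro_sa = copy;
	return 0;
}

/* May return NULL, so callers have to check. */
static const struct sa_sketch *
rtcache_getdst_sketch(const struct route_sketch *ro)
{
	return ro->ro_sa;
}

/* Put a cache on its domain's list so a flush can find it. */
static void
rtcache_attach_sketch(struct domain_sketch *dom, struct route_sketch *ro)
{
	assert(ro->ro_next == NULL);
	ro->ro_next = dom->dom_rtcache;
	dom->dom_rtcache = ro;
}

/* Walk the domain's list and invalidate each cache. */
static void
rtflushall_sketch(struct domain_sketch *dom)
{
	struct route_sketch *ro;

	for (ro = dom->dom_rtcache; ro != NULL; ro = ro->ro_next)
		ro->ro_valid = 0;
}
```

Keeping the list per domain means a routing change in one address family invalidates only that family's caches.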
 1.31 17-Feb-2007  dyoung branches: 1.31.4; 1.31.6;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.30 23-Jul-2006  ad branches: 1.30.10;
Use the LWP cached credentials where sane.
 1.29 05-May-2006  rpaulo Add support for RFC 3542 Adv. Socket API for IPv6 (which obsoletes 2292).
* RFC 3542 isn't binary compatible with RFC 2292.
* RFC 2292 support is on by default but can be disabled.
* update ping6, telnet and traceroute6 to the new API.

From the KAME project (www.kame.net).
Reviewed by core.
 1.28 26-Jan-2006  rpaulo branches: 1.28.2; 1.28.4; 1.28.6; 1.28.8; 1.28.10;
de-__P()
 1.27 15-Nov-2005  dsl branches: 1.27.2;
Pass the current process structure to in_pcbconnect() so that it can
pass it to in_pcbbind() so that it can allocate a low-numbered port
if setsockopt() has been used to set IP_PORTRANGE to IP_PORTRANGE_LOW.
While there, fail in_pcbconnect() if the in_pcbbind() fails - rather
than sending the request out from a port of zero.
This has been largely broken since the socket option was added in 1998.
 1.26 29-May-2005  christos branches: 1.26.2; 1.26.8;
- avoid shadowed variables
- sprinkle const.
 1.25 11-Jun-2004  itojun implement IPV6_USE_MIN_MTU sockopt. needed by bind9 + EDNS0 + big receive buffer.
 1.24 04-Sep-2003  itojun branches: 1.24.2;
revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.23 25-Aug-2003  itojun g/c unused member. use in6p_ip6 more effectively.
 1.22 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.21 26-Aug-2002  itojun branches: 1.21.6;
pass proc * to in6_pcbsetport. PR 18073
 1.20 08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.19 24-Oct-2001  itojun branches: 1.19.8; 1.19.10;
more whitespace sync with kame
 1.18 15-Oct-2001  itojun branches: 1.18.2;
implement IPV6_V6ONLY socket option from draft-ietf-ipngwg-rfc2553bis-03.txt.
IPV6_BINDV6ONLY (netbsd only) is deprecated, but still works just like before.
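
From userland, the option is set with an ordinary setsockopt(2) call at the IPPROTO_IPV6 level. The helper names below are hypothetical, and the sketch assumes a system whose headers define IPV6_V6ONLY.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Turn IPV6_V6ONLY on (1) or off (0); returns 0 on success, -1 on error. */
static int
set_v6only_sketch(int fd, int on)
{
	assert(on == 0 || on == 1);
	return setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on));
}

/* Read the current setting; returns -1 if the query fails. */
static int
get_v6only_sketch(int fd)
{
	int on = -1;
	socklen_t len = sizeof(on);

	if (getsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, &len) == -1)
		return -1;
	return on;
}

/* Create an AF_INET6 stream socket restricted to IPv6 traffic, or -1. */
static int
open_v6only_stream_sketch(void)
{
	int fd = socket(AF_INET6, SOCK_STREAM, 0);

	if (fd == -1)
		return -1;
	if (set_v6only_sketch(fd, 1) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The option is normally set before bind(2), since it decides whether the socket will also accept IPv4 traffic through mapped addresses.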
 1.17 02-Jul-2001  itojun branches: 1.17.2;
on interface removal, remove multicast groups joined from pcb, before
removing interface addresses. without the change, we may deref
NULL pointer in in_pcbpurgeif(). from jinmei@kame, sync with kame
 1.16 11-Feb-2001  itojun branches: 1.16.2;
wrap kernel-only #define (kame cross-bsd portability) into _KERNEL.
 1.15 11-Feb-2001  itojun pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).
 1.14 08-Feb-2001  itojun move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync with kame
 1.13 19-Oct-2000  itojun remove #ifdef TCP6. it is not likely for us to bring in sys/netinet6/tcp6*.c
(separate TCP/IPv6 stack) into netbsd-current.
 1.12 07-Jul-2000  itojun sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.
 1.11 05-Jun-2000  itojun pass struct proc * down to udp6_output and in6_pcbbind.
 1.10 03-Feb-2000  itojun branches: 1.10.2;
use u_int16_t, not u_short, for port #.
 1.9 02-Feb-2000  thorpej PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
 1.8 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.7 27-Dec-1999  itojun synchronize in6pcb flags definition across kame/*bsd.
this would help us implement future COMPAT_{FREE,OPEN}BSD{,I}.

(sync with kame)
 1.6 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.5 22-Jul-1999  itojun branches: 1.5.2; 1.5.8;
change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.
 1.4 17-Jul-1999  itojun fix faith interface support. need testing.
(i understand this is a dirty hack, of course)
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file in6_pcb.h was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file in6_pcb.h was added on branch chs-ubc2 on 1999-07-01 23:48:27 +0000
 1.5.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.5.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.5.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.10.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.16.2.5 27-Aug-2002  nathanw Catch up to -current.
 1.16.2.4 20-Jun-2002  nathanw Catch up to -current.
 1.16.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.16.2.2 22-Oct-2001  nathanw Catch up to -current.
 1.16.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.17.2.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.17.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.17.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.18.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.19.10.2 14-Jun-2004  jmc Pullup rev 1.25 (requested by itojun in ticket #1709)

Implement IPV6_USE_MIN_MTU sockopt.
 1.19.10.1 27-Aug-2002  lukem Pull up revision 1.21 (requested by itojun in ticket #731):
pass proc * to in6_pcbsetport. PR 18073
 1.19.8.2 29-Aug-2002  gehenna catch up with -current.
 1.19.8.1 20-Jun-2002  gehenna catch up with -current.
 1.21.6.5 11-Dec-2005  christos Sync with head.
 1.21.6.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.21.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.21.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.21.6.1 03-Aug-2004  skrll Sync with HEAD
 1.24.2.1 14-Jun-2004  tron Pull up revision 1.25 (requested by itojun in ticket #468):
implement IPV6_USE_MIN_MTU sockopt. needed by bind9 + EDNS0 + big receive buffer.
 1.26.8.1 22-Nov-2005  yamt sync with head.
 1.26.2.4 03-Sep-2007  yamt sync with head.
 1.26.2.3 26-Feb-2007  yamt sync with head.
 1.26.2.2 30-Dec-2006  yamt sync with head.
 1.26.2.1 21-Jun-2006  yamt sync with head.
 1.27.2.1 01-Feb-2006  yamt sync with head.
 1.28.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.28.8.1 11-May-2006  elad sync with head
 1.28.6.2 11-Aug-2006  yamt sync with head
 1.28.6.1 24-May-2006  yamt sync with head.
 1.28.4.1 01-Jun-2006  kardel Sync with head.
 1.28.2.1 01-Feb-2006  rpaulo Merge in6pcb with inpcb and remove inpcb_hdr since that's no longer needed.
 1.30.10.2 07-May-2007  yamt sync with head.
 1.30.10.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.31.6.1 11-Jul-2007  mjf Sync with head.
 1.31.4.1 08-Jun-2007  ad Sync with head.
 1.32.48.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.32.42.1 28-Apr-2009  skrll Sync with HEAD.
 1.32.32.1 04-May-2009  yamt sync with head.
 1.34.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.34.4.1 31-May-2011  rmind sync with head
 1.36.2.1 30-Oct-2012  yamt sync with head
 1.37.12.1 10-Aug-2014  tls Rebase.
 1.37.4.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.37.2.1 03-Dec-2017  jdolecek update from HEAD
 1.40.2.4 28-Aug-2017  skrll Sync with HEAD
 1.40.2.3 05-Feb-2017  skrll Sync with HEAD
 1.40.2.2 06-Jun-2015  skrll Sync with HEAD
 1.40.2.1 06-Apr-2015  skrll Sync with HEAD
 1.46.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.46.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.47.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.49.14.1 10-Jun-2019  christos Sync with HEAD
 1.49.12.1 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.17 08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.16 13-Nov-2001  lukem branches: 1.16.8;
add RCSIDs
 1.15 25-Mar-2001  itojun branches: 1.15.2;
couple of missing splx. sync with kame.
From: csapuntz@play-doh.stanford.edu (Constantine Sapuntzakis)
 1.14 10-Feb-2001  itojun branches: 1.14.2;
to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.13 28-Dec-2000  itojun do not touch ra_addr if it is NULL. from IIJ SEIL team
 1.12 07-Jun-2000  itojun branches: 1.12.2;
fix anycast address determination.
correct interface address addition when link-local is added (check if ifp
matches).
make diff to kame repository easier (breaks some KNF)

sync with kame.
 1.11 23-Mar-2000  thorpej branches: 1.11.2;
New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
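
The two properties listed above, client-supplied storage and constant-time insertion and removal, can be shown with a small userland model. It is only a sketch: the names are hypothetical, expiry times are ignored, and a circular doubly linked list stands in for whatever structure the kernel actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Handle storage is owned by the client, so scheduling never allocates. */
struct callout_sketch {
	struct callout_sketch *c_next, *c_prev;
	void (*c_func)(void *);
	void *c_arg;
	int c_pending;
};

/* Circular list with a sentinel head. */
struct callout_list_sketch {
	struct callout_sketch head;
};

static void
callout_list_init_sketch(struct callout_list_sketch *l)
{
	l->head.c_next = l->head.c_prev = &l->head;
}

/* O(1) removal; harmless if the callout is not pending. */
static void
callout_stop_sketch(struct callout_sketch *c)
{
	if (!c->c_pending)
		return;
	c->c_prev->c_next = c->c_next;
	c->c_next->c_prev = c->c_prev;
	c->c_pending = 0;
}

/* O(1) insertion at the tail; rescheduling unlinks first. */
static void
callout_schedule_sketch(struct callout_list_sketch *l,
    struct callout_sketch *c, void (*func)(void *), void *arg)
{
	assert(func != NULL);
	callout_stop_sketch(c);
	c->c_func = func;
	c->c_arg = arg;
	c->c_prev = l->head.c_prev;
	c->c_next = &l->head;
	l->head.c_prev->c_next = c;
	l->head.c_prev = c;
	c->c_pending = 1;
}

/* Fire every pending callout once; returns how many ran. */
static int
callout_run_all_sketch(struct callout_list_sketch *l)
{
	int n = 0;

	while (l->head.c_next != &l->head) {
		struct callout_sketch *c = l->head.c_next;

		callout_stop_sketch(c);
		c->c_func(c->c_arg);
		n++;
	}
	return n;
}

/* Example callback: increments the int its argument points to. */
static void
bump_counter_sketch(void *arg)
{
	++*(int *)arg;
}
```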
 1.10 09-Feb-2000  itojun honor ifa reference counting.
 1.9 07-Feb-2000  itojun remove IPv6 router renumbering prefix information in the kernel
when all the interface addresses are gone.
this should remove dangling structure when:
# ifconfig lo0 inet6 3ffe::1 prefixlen 64 alias
# ifconfig lo0 inet6 3ffe::1 -alias
is performed.
 1.8 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.7 04-Feb-2000  itojun avoid calling in6_control(SIOCDIFADDR_IN6) from interrupt context.
it is not supposed to work.
logging fix: add "\n" to some of log() in in6_prefix.c.

improve in6_ifdetach(). now almost all structures that depend on ifnet
will be cleared up.
possible loose ends:
- cached route_in6 in static variables needs to be cleared as well
- there are ifaddr manipulation without reference counting,
which should be fixed
we still see panics after card removal, though... not sure what is left.

(sync with kame)
 1.6 03-Feb-2000  itojun s/splnet/splsoftnet/
 1.5 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.4 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.3 03-Jul-1999  thorpej branches: 1.3.2; 1.3.8;
RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file in6_prefix.c was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file in6_prefix.c was added on branch chs-ubc2 on 1999-07-01 23:48:27 +0000
 1.3.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.3.2.4 27-Mar-2001  bouyer Sync with HEAD.
 1.3.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.3.2.2 05-Jan-2001  bouyer Sync with HEAD
 1.3.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.11.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.12.2.1 25-Jan-2001  jhawk Pull up revision 1.13 (requested by itojun):
Don't dereference null ra_addr pointer.
 1.14.2.4 20-Jun-2002  nathanw Catch up to -current.
 1.14.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.14.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.14.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.15.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.15.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.16.8.1 20-Jun-2002  gehenna catch up with -current.
 1.6 08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.5 10-Feb-2001  itojun branches: 1.5.2; 1.5.4; 1.5.16;
to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.4 23-Mar-2000  thorpej branches: 1.4.6;
New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
 1.3 04-Feb-2000  itojun avoid calling in6_control(SIOCDIFADDR_IN6) from interrupt context.
it is not supposed to work.
logging fix: add "\n" to some of log() in in6_prefix.c.

improve in6_ifdetach(). now almost all structures that depend on ifnet
will be cleared up.
possible loose ends:
- cached route_in6 in static variables needs to be cleared as well
- there are ifaddr manipulation without reference counting,
which should be fixed
we still see panics after card removal, though... not sure what is left.

(sync with kame)
 1.2 13-Dec-1999  itojun branches: 1.2.2;
sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.1 30-Nov-1999  itojun branches: 1.1.2;
file in6_prefix.h was initially added on branch kame.
 1.1.2.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.2.2.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.4.6.3 11-Feb-2001  bouyer Sync with HEAD.
 1.4.6.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.4.6.1 23-Mar-2000  bouyer file in6_prefix.h was added on branch thorpej_scsipi on 2000-11-20 18:10:50 +0000
 1.5.16.1 20-Jun-2002  gehenna catch up with -current.
 1.5.4.1 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.5.2.1 20-Jun-2002  nathanw Catch up to -current.
 1.1 02-Dec-2014  christos branches: 1.1.2; 1.1.18;
add routines to print in6_addr and sockaddr_in6 (in6_print, sin6_print)
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 02-Dec-2014  jdolecek file in6_print.c was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.1.2.2 06-Apr-2015  skrll Sync with HEAD
 1.1.2.1 02-Dec-2014  skrll file in6_print.c was added on branch nick-nhusb on 2015-04-06 15:18:23 +0000
 1.131 09-Feb-2024  andvar fix spelling mistakes, mainly in comments and log messages.
 1.130 24-Oct-2022  knakahara Fix PR kern/57037

Make the sending of parameter-change routing messages configurable.
When net.inet6.ip6.param_rt_msg=0, parameter-change routing messages
are not sent.
When net.inet6.ip6.param_rt_msg=1 (the default), parameter-change
routing messages are sent as RTM_NEWADDR.
 1.129 03-Sep-2022  thorpej Garbage-collect everything related to struct domain::dom_ifqueues
(except dom_ifqueues itself, until the next kernel version bump).
It's no longer used now that nothing uses the legacy netisr mechanism.
 1.128 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.127 24-Apr-2020  jakllsch Fill in .pr_usrreqs for SOCK_SEQPACKET and SOCK_STREAM variants of SCTP too.

This should allow these socket types of SCTP to operate on IPv6 family
sockets, as .pr_usrreqs must not be NULL for socreate() to succeed.
 1.126 14-Aug-2018  maxv branches: 1.126.10;
Retire EtherIP, we have L2TP instead.
 1.125 11-May-2018  roy branches: 1.125.2;
Increase the default size of some receive buffers from 8k to 16k.
This mitigates recent reports of socket overflow errors
and fixes PR bin/53247.
 1.124 03-May-2018  maxv Remove now unused tcpip.h includes. Some were already unused before.
 1.123 03-May-2018  maxv Remove net_osdep.h completely.
 1.122 15-Mar-2018  maxv Add the PR_LASTHDR flag on the PFsync and CARP entries. Otherwise a
"require" IPsec policy is not enforced on them, and unauthenticated
packets will be accepted.

Tested with a require-AH configuration. Sent on tech-net@, no comment.
 1.121 07-Feb-2018  maxv branches: 1.121.2;
Style, and localify IPV6FORWARDING. No functional change.
 1.120 07-Feb-2018  maxv Change ip6_hdrnestlimit to be 15 instead of 50. I couldn't find any
reference in RFCs about what a correct limit should be, but FreeBSD already
uses 15.

If an IPv6 packet has 50 options, there is clearly something wrong with it.
 1.119 27-Sep-2017  ozaki-r Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
 1.118 21-Sep-2017  ozaki-r Invalidate rtcache based on a global generation counter

The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals
the global counter, the cache is still valid, otherwise invalidated.

One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.

This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.
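
The generation-counter scheme described in revision 1.118 can be sketched in a few lines of C. This is only an illustration under stated assumptions: the names route_generation, route_cache, cache_store() and cache_lookup() are hypothetical and are not the kernel's actual identifiers, and all locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not the kernel's real definitions. */

/* Global counter, bumped whenever any route is added or deleted. */
static uint64_t route_generation;

struct route_entry {
	int ifindex;			/* placeholder payload */
};

struct route_cache {
	struct route_entry *rt;		/* cached lookup result, or NULL */
	uint64_t generation;		/* snapshot taken when rt was cached */
};

/* Called on every route addition or deletion. */
static void
route_table_changed(void)
{
	route_generation++;
}

/* Cache an entry together with a snapshot of the global counter. */
static void
cache_store(struct route_cache *rc, struct route_entry *rt)
{
	assert(rc != NULL);
	rc->rt = rt;
	rc->generation = route_generation;
}

/* Return the cached entry, or NULL if the snapshot is stale. */
static struct route_entry *
cache_lookup(struct route_cache *rc)
{
	assert(rc != NULL);
	if (rc->rt != NULL && rc->generation != route_generation)
		rc->rt = NULL;		/* invalidated by a table change */
	return rc->rt;
}
```

Because there is a single counter, any route change invalidates every cache, which is the drawback the commit message mentions.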
 1.117 14-Apr-2017  ozaki-r branches: 1.117.4;
Rumpify netipsec

Note that we should modularize netipsec and reduce reverse symbol references
(referencing symbols of netipsec from net, netinet and netinet6) though,
the task needs lots of code changes. Prior to doing so, rumpifying it and
having ATF tests should be useful.
 1.116 16-Feb-2017  knakahara add l2tp(4) L2TPv3 interface.

originally implemented by IIJ SEIL team.
 1.115 13-Feb-2017  ozaki-r Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.
 1.114 13-Dec-2016  ozaki-r branches: 1.114.2;
Remove unnecessary inclusions of nd6.h
 1.113 06-Jul-2016  ozaki-r branches: 1.113.2;
Move in6_ifaddr_list to a more proper place (from ip6_input.c to in6.c)

It's a similar place as the IPv4 address list, i.e., in.c.

More variables will join together.
 1.112 26-Apr-2016  ozaki-r Sweep unnecessary route.h inclusions
 1.111 11-Apr-2016  ozaki-r Sweep unnecessary radix.h inclusions
 1.110 21-Jan-2016  riastradh Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.
 1.109 21-Jan-2016  riastradh Give proper prototype to ip_output.
 1.108 20-Jan-2016  riastradh Eliminate struct protosw::pr_output.

You can't use this unless you know what it is a priori: the formal
prototype is variadic, and the different instances (e.g., ip_output,
route_output) have different real prototypes.

Convert the only user of it, raw_send in net/raw_cb.c, to take an
explicit callback argument. Convert the only instances of it,
route_output and key_output, to such explicit callbacks for raw_send.
Use assertions to make sure the conversion to explicit callbacks is
warranted.

Discussed on tech-net with no objections:
https://mail-index.netbsd.org/tech-net/2016/01/16/msg005484.html
 1.107 13-Oct-2015  rjs Add core networking support for SCTP.
 1.106 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.105 22-Apr-2015  roy Move INET6 specific in6_if_{up,down}() and in6_if_link_{up,down}()
into agnostic domain functions.
 1.104 10-Feb-2015  rjs Add DCCP protocol support from KAME.
 1.103 05-Jun-2014  rmind branches: 1.103.4;
- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.102 22-May-2014  rmind Move udp6_input(), udp6_sendup(), udp6_realinput() and udp6_input_checksum()
from udp_usrreq.c to udp6_usrreq.c where they belong. No functional change.
 1.101 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.100 02-Jan-2014  pooka branches: 1.100.2;
Allow kernels compiled with INET+INET6 to be booted as IPv4-only or IPv6-only.
 1.99 05-Jun-2013  christos branches: 1.99.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.98 01-Mar-2013  joerg Retire OSI network stack. OK core@
 1.97 23-Jun-2012  christos branches: 1.97.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.96 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.95 31-Dec-2011  christos branches: 1.95.2; 1.95.6; 1.95.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0
 1.94 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.93 24-Sep-2011  christos branches: 1.93.2; 1.93.6;
Add inet6 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.92 24-May-2011  spz RA flood mitigation via a limit on accepted routes:
- introduce a limit for the routes accepted via IPv6 Router Advertisement:
a common 2 interface client will have 6, the default limit is 100 and
can be adjusted via sysctl
- report the current number of routes installed via RA via sysctl
- count discarded route additions. Note that one RA message is two routes.
This is at present only across all interfaces even though per-interface
would be more useful, since the per-interface structure complies with RFC2466
- bump kernel version due to the previous change
- adjust netstat to use the new value (with netstat -p icmp6)
 1.91 03-May-2011  dyoung *_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
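
The deferred-drain pattern from revision 1.91 can be illustrated as follows. This is a minimal single-threaded sketch with hypothetical names (proto_drain, proto_fasttimo, reass_queue_len); the kernel version has to deal with locking and real queues, which are left out here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical protocol state; stands in for e.g. a reassembly queue. */
static bool drain_needed;
static unsigned int reass_queue_len;

static void
proto_enqueue(void)
{
	assert(reass_queue_len < UINT_MAX);
	reass_queue_len++;
}

static unsigned int
proto_queue_len(void)
{
	return reass_queue_len;
}

/* May be called with locks held, so it only records the request. */
static void
proto_drain(void)
{
	drain_needed = true;
}

/* Runs from the periodic fast timeout, where doing the work is safe. */
static void
proto_fasttimo(void)
{
	if (!drain_needed)
		return;
	drain_needed = false;
	reass_queue_len = 0;		/* the actual draining */
}
```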
 1.90 31-Mar-2011  dyoung Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)
 1.89 24-Aug-2010  jakllsch branches: 1.89.2;
Make the EtherIP in IPv6 input path work.
XXX: Figure out if we really need a separate protosw for IPv6.
 1.88 04-Feb-2010  joerg branches: 1.88.2; 1.88.4;
Explicitly include opt_gateway.h when depending on GATEWAY.
 1.87 11-Sep-2009  dyoung Make ifconfig(8) set and display preference numbers for IPv6
addresses. Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.

In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr. Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.

Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.

In support of the changes above,

1 Add a method to struct domain for "externalizing" a sockaddr, and
provide an implementation for IPv6. Expect more work in this area: it
may be more proper to say that the IPv6 implementation "internalizes"
a sockaddr. Add sockaddr_externalize().

2 Add a subroutine, sofamily(), that returns a struct socket's address
family or AF_UNSPEC.

3 Make a lot of IPv4-specific code generic, and move it from
sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
ifconfig(8).
 1.86 11-Sep-2009  dyoung Nothing uses sockaddr_in6_cmp() right now, and the generic
sockaddr_cmp() is probably as fast or faster than calling
sockaddr_in6_cmp() through a function pointer, so let's stop
compiling it.
 1.85 21-Aug-2009  tsutsui Fix error on kernels with options IPSEC without options IPSEC_ESP.
Found on building evbppc/conf/PMPPC.
 1.84 23-Mar-2009  liamjfoy Init ip6flow pool dynamically instead of using a linkset.
 1.83 25-Nov-2008  pooka branches: 1.83.4;
Make dom_maxrtkey of inet/inet6domain the size of the ip_encap pack
structures. This is far from optimal, but gets rid of iffy
#ifdef INET in radix.c. The radix bonsai still needs lots of love
before loading domains dynamically is possible...
 1.82 24-Apr-2008  ad branches: 1.82.2; 1.82.8; 1.82.10; 1.82.12;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.81 23-Apr-2008  thorpej Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.80 15-Apr-2008  thorpej branches: 1.80.2;
Make pim6 stats per-cpu.
 1.79 19-Sep-2007  dyoung branches: 1.79.16; 1.79.20;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.
 1.78 30-Aug-2007  dyoung Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.77 06-May-2007  dyoung branches: 1.77.2; 1.77.6; 1.77.8;
In AppleTalk, IPv4, and IPv6 routing domains, help sockaddr_cmp()
avoid an indirect function call by comparing the family, length,
and bytes [dom->dom_sa_cmpofs, dom->dom_sa_cmpofs + dom->dom_sa_cmplen),
corresponding to the sockaddrs' "address" members.

For ISO, actually use sockaddr_iso_cmp, for a change. Thanks to
yamt@ for pointing out my error.
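
The comparison strategy described in revision 1.77 (family, then length, then a fixed byte range) can be sketched as below. The struct layouts and the name sa_cmp() are hypothetical stand-ins; only the idea of a per-domain compare offset and compare length comes from the log message, and the ordering of the checks is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-domain description of where the "address" bytes live. */
struct dom_cmp_info {
	size_t cmpofs;			/* offset of the address part */
	size_t cmplen;			/* length of the address part */
};

/* Hypothetical generic sockaddr with BSD-style length and family bytes. */
struct generic_sa {
	unsigned char len;
	unsigned char family;
	unsigned char data[30];
};

/* Compare without an indirect call: family, length, then address bytes. */
static int
sa_cmp(const struct dom_cmp_info *dom, const struct generic_sa *a,
    const struct generic_sa *b)
{
	assert(dom != NULL && a != NULL && b != NULL);
	if (a->family != b->family)
		return a->family < b->family ? -1 : 1;
	if (a->len != b->len)
		return a->len < b->len ? -1 : 1;
	/* The compared range must lie inside the sockaddr. */
	assert(dom->cmpofs + dom->cmplen <= a->len);
	return memcmp((const unsigned char *)a + dom->cmpofs,
	    (const unsigned char *)b + dom->cmpofs, dom->cmplen);
}
```

With an offset of 4 and a length of 4, two IPv4-like sockaddrs that differ only in the two port bytes compare equal, since only the address bytes are examined.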
 1.76 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
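
As an illustration of the strdup()/strcpy() analogy drawn in revision 1.76, here is a userland sketch. It is not the kernel code: the struct sa_generic and the names sa_dup() and sa_copy() are hypothetical, calloc(3) stands in for the per-family pool, and the KASSERT on matching families is modelled with assert(3).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sockaddr with BSD-style length and family fields. */
struct sa_generic {
	uint8_t sa_len;
	uint8_t sa_family;
	char sa_data[14];
};

/* Like strdup(): allocate storage and copy src into it. */
static struct sa_generic *
sa_dup(const struct sa_generic *src)
{
	struct sa_generic *dst;

	assert(src != NULL);
	assert(src->sa_len >= 2 && src->sa_len <= sizeof(*src));
	dst = calloc(1, sizeof(*dst));
	if (dst == NULL)
		return NULL;		/* allocation failed */
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like strcpy(): copy into existing storage of the same family. */
static struct sa_generic *
sa_copy(struct sa_generic *dst, const struct sa_generic *src)
{
	assert(dst != NULL && src != NULL);
	assert(dst->sa_family == src->sa_family);
	assert(src->sa_len >= 2 && src->sa_len <= sizeof(*dst));
	memcpy(dst, src, src->sa_len);
	return dst;
}
```

The caller of sa_dup() owns the result and releases it with free(3), the counterpart of sockaddr_free() in the log message.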
 1.75 07-Mar-2007  liamjfoy branches: 1.75.2; 1.75.4;
Add IPv6 Fast Forward - the IPv4 counterpart:

If ip6_forward successfully forwards a packet, a cache, in this case a
ip6flow struct entry, will be created. ether_input and friends will
then be able to call ip6flow_fastforward with the packet which will then
be passed to if_output (unless an issue is found - in that case the packet
is passed back to ip6_input).

ok matt@ christos@ dyoung@ and joerg@
 1.74 06-Mar-2007  liamjfoy Fix some style issues - no functional change
 1.73 27-Feb-2007  degroote Initialize fast_ipsec entry in the protocol switch with structure
initializers as other entries.
 1.72 19-Feb-2007  dyoung Initialize protocol switch with structure initializers.
 1.71 17-Feb-2007  dyoung 0 -> NULL
 1.70 10-Feb-2007  degroote branches: 1.70.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic
 1.69 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.68 23-Nov-2006  rpaulo branches: 1.68.2; 1.68.4;
New EtherIP driver based on tap(4) and gif(4) by Hans Rosenfeld.
Notable changes:
* Fixes PR 34268.
* Separates the code from gif(4) (which is cleaner).
* Allows the usage of STP (Spanning Tree Protocol).
* Removed EtherIP implementation from gif(4)/tap(4).

Some input from Christos.
 1.67 10-Oct-2006  dogcow change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)
 1.66 30-Aug-2006  christos branches: 1.66.2; 1.66.4;
add missing initializers
 1.65 28-Aug-2006  christos remove extra members
 1.64 25-Aug-2006  matt One step closer to loadable domains. Store pointers to a domain's soft
interrupt queues so if_detach can remove packets to removed interfaces from
them. This eliminates a lot of conditional ugly code in if.c
 1.63 18-May-2006  liamjfoy Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.62 05-Mar-2006  rpaulo branches: 1.62.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.61 11-Dec-2005  christos branches: 1.61.4; 1.61.6; 1.61.8;
merge ktrace-lwp.
 1.60 19-Jul-2005  gdt Add PR_PURGEIF flag for protocols to indicate that the protocol might
store a struct ifnet *, and define it for udp/tcp/rawip for INET and
INET6. When deleting a struct ifnet, invoke PRU_PURGEIF on all
protocols marked with PR_PURGEIF. Closes PR kern/29580 (mine).
 1.59 29-May-2005  christos branches: 1.59.2;
- avoid shadowed variables
- sprinkle const.
 1.58 23-Jan-2005  matt branches: 1.58.6;
Change initialization of domains to use link sets. Switch to using STAILQ.
Add a convenience macro DOMAIN_FOREACH to iterate through the domains.
 1.57 22-Apr-2004  matt branches: 1.57.4;
Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).
 1.56 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.55 03-Nov-2003  briggs Revert the change in default value of ipv6_v6only. Further discussion
on this topic is required. It should be reintroduced and pursued in
the IETF.
 1.54 28-Oct-2003  briggs Toggle the default value of ip6_v6only. Also provide a sample sysctl to
retain the existing behavior.
 1.53 06-Sep-2003  itojun randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.52 05-Sep-2003  itojun call tcp_drain() if IPv4-less kernel
 1.51 04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.50 14-Aug-2003  itojun enforce ipsec policy on raw wildcard.
 1.49 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.48 07-Aug-2003  itojun make net.inet6.ip6.redirect actually work. from Tomoyuki Sahara via kame
 1.47 17-Apr-2003  thorpej branches: 1.47.2;
Protect the definition of offsetof().
 1.46 11-Nov-2002  itojun pmtu_probe is not used anywhere (it is used in KAME TCP6-only code).
From: Krister Walfridsson <cato@df.lth.se>
 1.45 20-Aug-2002  itojun sync up use_deprecated handling with latest kame.
- bind(deprecated) is allowed, trusting userland app is doing the right thing
- use_deprecated default to 1
 1.44 17-Aug-2002  itojun set default value for use_deprecated to 0, to avoid consequences with ftpd.
 1.43 09-Jun-2002  itojun whitespace cleanup
 1.42 08-Jun-2002  itojun whitespace cleanup
 1.41 29-May-2002  itojun move per-interface ip6/icmp6 stat to ifnet->if_afdata. sync w/kame
 1.40 28-May-2002  itojun limit number of IPv6 fragments (not the fragment queue size) to
fight against lots-of-frags DoS attacks. sync w/kame
 1.39 15-Mar-2002  itojun branches: 1.39.4; 1.39.6;
have tcp6_drain
 1.38 21-Dec-2001  itojun call encap6_ctlinput on icmp6 against tunnelled packet. sync w/kame
 1.37 21-Dec-2001  itojun use radix table for inbound tunnel lookup (would increase performance
for machines with a lot of tunnels).
update route cache for IPvX-over-IPv6 tunnel on path MTU discovery.
sync with kame
 1.36 21-Dec-2001  itojun move in6_gif_hlim decl to in6_gif.c. sync with kame
 1.35 21-Dec-2001  itojun move protosw fragment for gif/stf to their own source code.
reduce #ifdef in stf code. sync with kame
 1.34 13-Nov-2001  lukem add RCSIDs
 1.33 24-Oct-2001  itojun no tcp_fasttimo any more. PR 14333
 1.32 24-Oct-2001  itojun more whitespace sync with kame
 1.31 16-Oct-2001  itojun branches: 1.31.2;
remove unused #define. sync whitespace/comment with kame.
 1.30 15-Oct-2001  itojun implement IPV6_V6ONLY socket option from draft-ietf-ipngwg-rfc2553bis-03.txt.
IPV6_BINDV6ONLY (netbsd only) is deprecated, but still works just like before.
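
From userland, the IPV6_V6ONLY option named in revision 1.30 is set with setsockopt(2) at the IPPROTO_IPV6 level. The helper below is a minimal sketch; the function name open_v6only_socket() is made up for illustration, and error handling is reduced to returning -1.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an AF_INET6 TCP socket and set IPV6_V6ONLY
 * to the requested value (1 = IPv6 only, 0 = also accept IPv4-mapped).
 * Returns the descriptor, or -1 on failure.
 */
static int
open_v6only_socket(int v6only)
{
	int s;

	assert(v6only == 0 || v6only == 1);
	s = socket(AF_INET6, SOCK_STREAM, 0);
	if (s == -1)
		return -1;
	if (setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, &v6only,
	    sizeof(v6only)) == -1) {
		close(s);
		return -1;
	}
	return s;
}
```

The option has to be set before bind(2) for it to affect a wildcard listening socket.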
 1.29 21-Mar-2001  thorpej branches: 1.29.2;
Add a protosw flag, PR_ABRTACPTDIS (Abort on Accept of Disconnected
Socket), and add it to the protocols that use that behavior (all
PR_LISTEN protocols except for PF_LOCAL stream sockets).
 1.28 01-Mar-2001  itojun branches: 1.28.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code
 1.27 21-Feb-2001  itojun need PR_ADDR|PR_ATOMIC for IPPROTO_EON. fix typo. from chopps, sync with kame
 1.26 20-Feb-2001  itojun ISO over IPv4/v6 by EON encapsulation. from chopps, sync with kame.
 1.25 11-Feb-2001  itojun pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).
 1.24 11-Feb-2001  itojun whitespace sync with kame
 1.23 19-Oct-2000  itojun remove #ifdef TCP6. it is not likely for us to bring in sys/netinet6/tcp6*.c
(separate TCP/IPv6 stack) into netbsd-current.
 1.22 18-Oct-2000  itojun verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync
 1.21 10-Oct-2000  itojun sync with kame ($KAME$)
 1.20 10-Oct-2000  enami Don't initialize TCP twice on v4/v6 dual stack kernel.
 1.19 28-Jul-2000  itojun nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit
 1.18 06-Jul-2000  itojun - do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).
 1.17 19-Apr-2000  itojun branches: 1.17.4;
introduce sys/netinet/ip_encap.c, to dispatch inbound packets
to protocol handlers, based on src/dst (for ip proto #4/41).
see comment in ip_encap.c for details of the problem we have.
there are too many protocol specs for ip proto #4/41.
backward compatibility with MROUTING case is now provided in ip_encap.c.

fix ipip to work with gif (using ip_encap.c). sorry for breakage.

gif now uses ip_encap.c.

introduce stf pseudo interface (implements 6to4, another IPv6-over-IPv4 code
with ip proto #41).
 1.16 26-Feb-2000  itojun implement rip6_ctlinput, to cope with routing changes correctly.
(IMHO we need rip_ctlinput as well)
 1.15 26-Feb-2000  itojun make it possible to throw IPv6 packet with proto=4/41.
(in normal case we don't do it, but this is how IPv4 in_proto is written)
 1.14 14-Feb-2000  thorpej Use ratecheck() for ICMP6 rate limiting.
 1.13 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.12 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.11 06-Jan-2000  itojun make IPV6_BINDV6ONLY setsockopt available. it controls behavior of
AF_INET6 wildcard listening socket. heavily documented in ip6(4).
net.inet6.ip6.bindv6only defines default value. default is 1.

"options INET6_BINDV6ONLY" removes any code fragment that supports
IPV6_BINDV6ONLY == 0 case (not defopt'ed as use of this is rare).
 1.10 02-Jan-2000  itojun add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)
 1.9 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.8 31-Jul-1999  itojun branches: 1.8.2; 1.8.8;
sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.7 30-Jul-1999  itojun remove reference to in6_systm.h (file itself will be removed afterwards)
 1.6 27-Jul-1999  explorer Fix a problem where tcp_slowtimo was called twice, once for ipv4 tcp and
once for ipv6. This patch makes the ipv6 case pass NULLs in for fast
and slow timeouts iff defined(INET) and passes in the right function
if !defined(INET).

Reviewed by itojun@iijlab.net.
 1.5 22-Jul-1999  itojun change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.
 1.4 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file in6_proto.c was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file in6_proto.c was added on branch chs-ubc2 on 1999-07-01 23:48:28 +0000
 1.8.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.8.2.4 27-Mar-2001  bouyer Sync with HEAD.
 1.8.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.8.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.8.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.17.4.4 09-Sep-2003  msaitoh Pull up rev. 1.50 (requested by itojun in ticket #50):
enforce ipsec policy on raw wildcard.
 1.17.4.3 11-Mar-2001  he Pull up revision 1.28 (via patch, requested by itojun):
Ensure that we enforce inbound IPsec policy on all IP protocols,
not just TCP, UDP and ICMP.
 1.17.4.2 17-Oct-2000  tv Pullup 1.20 [enami]:
Don't initialize TCP twice on v4/v6 dual stack kernel.
 1.17.4.1 16-Aug-2000  itojun pullup (approved by releng-1-5)

switch from net.inet*.*.*ratelimit to net.inet*.*.ppslimit.

(tags are a rough estimate - we had some trial and error on the main trunk)
sys/netinet/icmp6.h 1.9 -> 1.11
sys/netinet/icmp_var.h 1.15 -> 1.17
sys/netinet/in_proto.c 1.39 -> 1.42
sys/netinet/ip_icmp.c 1.50 -> 1.51, 1.52 -> 1.54
sys/netinet/tcp_input.c 1.111 -> 1.112, 1.115 -> 1.117
sys/netinet/tcp_usrreq.c 1.52 -> 1.53
sys/netinet/tcp_var.h 1.72 -> 1.75
sys/netinet6/icmp6.c 1.34 -> 1.35, 1.36 -> 1.38
sys/netinet6/in6_proto.c 1.17 -> 1.19
 1.28.2.7 11-Dec-2002  thorpej Sync with HEAD.
 1.28.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.28.2.5 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.28.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.28.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.28.2.2 22-Oct-2001  nathanw Catch up to -current.
 1.28.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.29.2.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.29.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.29.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.31.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.39.6.3 04-Oct-2003  tron Pull up revision 1.50 (requested by itojun in ticket #1409):
enforce ipsec policy on raw wildcard.
 1.39.6.2 21-Nov-2002  he Pull up revision 1.45 (requested by itojun in ticket #708):
Allow bind() of deprecated addresses, trusting userland
application knows what it's doing.
 1.39.6.1 18-Aug-2002  lukem Pull up revision 1.44 (requested by itojun in ticket #697):
set default value for use_deprecated to 0, to avoid consequences with ftpd.
 1.39.4.3 29-Aug-2002  gehenna catch up with -current.
 1.39.4.2 20-Jun-2002  gehenna catch up with -current.
 1.39.4.1 30-May-2002  gehenna Catch up with -current.
 1.47.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.47.2.4 24-Jan-2005  skrll Sync with HEAD.
 1.47.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.47.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.47.2.1 03-Aug-2004  skrll Sync with HEAD
 1.57.4.1 29-Apr-2005  kent sync with -current
 1.58.6.1 15-Aug-2005  tron Pull up revision 1.60 (requested by gdt in ticket #661):
Add PR_PURGEIF flag for protocols to indicate that the protocol might
store a struct ifnet *, and define it for udp/tcp/rawip for INET and
INET6. When deleting a struct ifnet, invoke PRU_PURGEIF on all
protocols marked with PR_PURGEIF. Closes PR kern/29580 (mine).
 1.59.2.5 27-Oct-2007  yamt sync with head.
 1.59.2.4 03-Sep-2007  yamt sync with head.
 1.59.2.3 26-Feb-2007  yamt sync with head.
 1.59.2.2 30-Dec-2006  yamt sync with head.
 1.59.2.1 21-Jun-2006  yamt sync with head.
 1.61.8.3 03-Sep-2006  yamt sync with head.
 1.61.8.2 24-May-2006  yamt sync with head.
 1.61.8.1 13-Mar-2006  yamt sync with head.
 1.61.6.2 01-Jun-2006  kardel Sync with head.
 1.61.6.1 22-Apr-2006  simonb Sync with head.
 1.61.4.2 09-Sep-2006  rpaulo sync with head
 1.61.4.1 07-Feb-2006  rpaulo remove in6_pcb.h and include in_pcb.h.
 1.62.4.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.66.4.2 10-Dec-2006  yamt sync with head.
 1.66.4.1 22-Oct-2006  yamt sync with head
 1.66.2.2 12-Jan-2007  ad Sync with head.
 1.66.2.1 18-Nov-2006  ad Sync with head.
 1.68.4.1 04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.68.2.1 24-May-2007  pavel Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the right default policy, depending on the address family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It
lets us choose the right default policy (based on the address family
requested).
While here, fix an error message

Use a dynamic array instead of a static array to decompress. It lets us
decompress any data, whatever the ratio of decompressed data to compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.70.2.3 07-May-2007  yamt sync with head.
 1.70.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.70.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.75.4.1 11-Jul-2007  mjf Sync with head.
 1.75.2.2 09-Oct-2007  ad Sync with head.
 1.75.2.1 08-Jun-2007  ad Sync with head.
 1.77.8.1 06-Nov-2007  matt sync with HEAD
 1.77.6.2 02-Oct-2007  joerg Sync with HEAD.
 1.77.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.77.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.79.20.2 17-Jan-2009  mjf Sync with HEAD.
 1.79.20.1 02-Jun-2008  mjf Sync with HEAD.
 1.79.16.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.80.2.1 18-May-2008  yamt sync with head.
 1.82.12.1 21-Nov-2010  riz branches: 1.82.12.1.2;
Pull up following revision(s) (requested by jakllsch in ticket #1445):
sys/netinet6/ip6_etherip.h: revision 1.2
sys/netinet6/in6_proto.c: revision 1.89
sys/netinet6/ip6_etherip.c: revision 1.14
Make the EtherIP in IPv6 input path work.
XXX: Figure out if we really need a separate protosw for IPv6.
 1.82.12.1.2.1 07-Jan-2011  matt If using hardware checksum offload and the packet can't be h/w checksummed
(for whatever reason, some hardware is stupid) allow the driver to calculate
the checksum instead.
 1.82.10.2 28-Apr-2009  skrll Sync with HEAD.
 1.82.10.1 19-Jan-2009  skrll Sync with HEAD.
 1.82.8.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.82.2.4 09-Oct-2010  yamt sync with head
 1.82.2.3 11-Mar-2010  yamt sync with head
 1.82.2.2 16-Sep-2009  yamt sync with head
 1.82.2.1 04-May-2009  yamt sync with head.
 1.83.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.88.4.3 31-May-2011  rmind sync with head
 1.88.4.2 21-Apr-2011  rmind sync with head
 1.88.4.1 05-Mar-2011  rmind sync with head
 1.88.2.1 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.89.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.93.6.2 05-Apr-2012  mrg sync to latest -current.
 1.93.6.1 18-Feb-2012  mrg merge to -current.
 1.93.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.93.2.2 30-Oct-2012  yamt sync with head
 1.93.2.1 17-Apr-2012  yamt sync with head
 1.95.8.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.95.6.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.95.2.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.97.2.3 03-Dec-2017  jdolecek update from HEAD
 1.97.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.97.2.1 23-Jun-2013  tls resync from head
 1.99.2.2 18-May-2014  rmind sync with head
 1.99.2.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.100.2.1 10-Aug-2014  tls Rebase.
 1.103.4.10 28-Aug-2017  skrll Sync with HEAD
 1.103.4.9 05-Feb-2017  skrll Sync with HEAD
 1.103.4.8 09-Jul-2016  skrll Sync with HEAD
 1.103.4.7 29-May-2016  skrll Sync with HEAD
 1.103.4.6 22-Apr-2016  skrll Sync with HEAD
 1.103.4.5 19-Mar-2016  skrll Sync with HEAD
 1.103.4.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.103.4.3 22-Sep-2015  skrll Sync with HEAD
 1.103.4.2 06-Jun-2015  skrll Sync with HEAD
 1.103.4.1 06-Apr-2015  skrll Sync with HEAD
 1.113.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.113.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.113.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.114.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.117.4.5 12-May-2018  martin Pull up following revision(s) (requested by roy in ticket #821):

sys/netinet6/in6_proto.c: revision 1.125
sys/net/raw_cb.h: revision 1.29
sys/kern/uipc_usrreq.c: revision 1.186

Increase the default size of some receive buffers from 8k to 16k.

This mitigates recent reports of socket overflow errors
and fixes PR bin/53247.
 1.117.4.4 31-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #676):

sys/netinet/in_proto.c: revision 1.127
sys/netinet6/in6_proto.c: revision 1.122

Add the PR_LASTHDR flag on the PFsync and CARP entries. Otherwise a
"require" IPsec policy is not enforced on them, and unauthenticated
packets will be accepted.

Tested with a require-AH configuration. Sent on tech-net@, no comment.
 1.117.4.3 30-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #672):
sys/netinet6/in6_proto.c: revision 1.120
Change ip6_hdrnestlimit to be 15 instead of 50. I couldn't find any
reference in RFCs about what a correct limit should be, but FreeBSD already
uses 15.
If an IPv6 packet has 50 options, there is clearly something wrong with it.
 1.117.4.2 24-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #305):
distrib/sets/lists/tests/mi: revision 1.762
sys/net/route.c: revision 1.198-1.201
sys/net/route.h: revision 1.114
sys/netatalk/at_proto.c: revision 1.22
sys/netinet/in_proto.c: revision 1.124
sys/netinet6/in6_proto.c: revision 1.118
sys/netmpls/mpls_proto.c: revision 1.31
sys/netnatm/natm_proto.c: revision 1.18
sys/rump/net/lib/libsockin/sockin.c: revision 1.65
sys/sys/domain.h: revision 1.33
tests/net/route/Makefile: revision 1.6
tests/net/route/t_rtcache.sh: revision 1.1
Add tests of rtcache invalidation
Remove unnecessary NULL check of rt_ifp
It's always non-NULL.
Invalidate rtcache based on a global generation counter
The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals
the global counter, the cache is still valid, otherwise it is invalidated.
One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.
This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.
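
The generation-counter scheme described above can be sketched roughly as
follows. This is a minimal user-space illustration, not the kernel code:
every demo_* name and structure here is hypothetical, and locking is
ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the kernel's real structures. */
struct demo_rtentry {
	int dst;			/* toy destination key */
};

struct demo_route {
	struct demo_rtentry *ro_rt;	/* cached entry, or NULL */
	uint64_t ro_gen;		/* generation snapshot taken when cached */
};

/* Bumped whenever any route is added or deleted. */
static uint64_t demo_rtcache_generation;

static void
demo_rtcache_invalidate(void)
{
	demo_rtcache_generation++;
}

/* Cache an entry together with a snapshot of the current generation. */
static void
demo_rtcache_set(struct demo_route *ro, struct demo_rtentry *rt)
{
	assert(ro != NULL && rt != NULL);
	ro->ro_rt = rt;
	ro->ro_gen = demo_rtcache_generation;
}

/* Return the cached entry only if no route changed since it was cached. */
static struct demo_rtentry *
demo_rtcache_validate(struct demo_route *ro)
{
	if (ro->ro_rt != NULL && ro->ro_gen == demo_rtcache_generation)
		return ro->ro_rt;
	ro->ro_rt = NULL;	/* stale: drop it so the caller looks up again */
	return NULL;
}
```

Note how a single demo_rtcache_invalidate() call makes every cache stale at
once, which is the drawback the message mentions.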
Remove the global lock for rtcache
Thanks to removal of LIST_ENTRY of struct route, rtcaches are accessed only by
their users. And in existing usages a rtcache is guaranteed not to be accessed
simultaneously. So the rtcache framework doesn't need any exclusion controls
in itself.
Synchronize on rtcache_generation with rtlock
It's racy if NET_MPSAFE is enabled.
Pointed out by joerg@
 1.117.4.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panics of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that the process issuing
SADB_UPDATE is the same as the process that issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require the newly added update command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver has implemented such offload features for many
years and none seems likely to implement them soon. (Note that neither
FreeBSD nor Linux has such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
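
A rough sketch of the copy-and-replace pattern described above, assuming a
much simplified SA record. The names demo_sav and demo_sav_update are
hypothetical and no locking or reference counting is shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified SA record; not the kernel's struct secasvar. */
struct demo_sav {
	unsigned int spi;
	unsigned int hard_lifetime;
};

/*
 * Replace *slot with an updated copy instead of modifying it in place.
 * Returns the old record (for the caller to release once no reader can
 * still hold it), or NULL if allocation failed and nothing was changed.
 */
static struct demo_sav *
demo_sav_update(struct demo_sav **slot, unsigned int new_lifetime)
{
	struct demo_sav *oldsav = *slot;
	struct demo_sav *newsav;

	assert(oldsav != NULL);
	newsav = malloc(sizeof(*newsav));
	if (newsav == NULL)
		return NULL;
	memcpy(newsav, oldsav, sizeof(*newsav));	/* carry over old fields */
	newsav->hard_lifetime = new_lifetime;		/* apply the update */
	*slot = newsav;					/* publish the new record */
	return oldsav;
}
```

Readers that still hold the old record keep seeing a consistent snapshot,
which is the point of not updating it directly.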
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list should basically never have sav->refcnt == 0. Also,
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be obtained from
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav list is sorted
as we need.

The newly added key_validate_savlist validates that each sav list is indeed sorted
(run only if DEBUG because it's not cheap).
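
A sortedness check of this kind might look roughly like the sketch below.
It assumes ascending order by add time, which the message does not state,
and uses a hypothetical singly linked demo_sav_node rather than the real
list types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node; addtime stands in for lft_c->sadb_lifetime_addtime. */
struct demo_sav_node {
	uint64_t addtime;
	struct demo_sav_node *next;
};

/* Walk the list once and compare each node with its successor. */
static bool
demo_savlist_is_sorted(const struct demo_sav_node *head)
{
	const struct demo_sav_node *sav;

	for (sav = head; sav != NULL && sav->next != NULL; sav = sav->next) {
		if (sav->addtime > sav->next->addtime)
			return false;	/* out of order (assumed ascending) */
	}
	return true;
}

/* Debug-style wrapper: the O(n) walk is why it is not run unconditionally. */
static void
demo_savlist_validate(const struct demo_sav_node *head)
{
	assert(demo_savlist_is_sorted(head));
}
```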
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporary storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM is never set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
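
The idea can be illustrated with a plain counter, as in the sketch below.
This is an assumption-laden toy: the demo_* names are hypothetical, it is
single-threaded, and it does not model how the real code takes or drops
its references.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy object whose isr list lives and dies with it. */
struct demo_sp {
	unsigned int refcnt;	/* outstanding references */
	bool dead;		/* removed from the SPD */
	bool destroyed;		/* memory (and its isr list) released */
};

static void
demo_sp_ref(struct demo_sp *sp)
{
	sp->refcnt++;
}

static void
demo_sp_unref(struct demo_sp *sp)
{
	assert(sp->refcnt > 0);
	if (--sp->refcnt == 0 && sp->dead)
		sp->destroyed = true;	/* last user gone: safe to free */
}

/* Deleting the SP only destroys it once nobody references it. */
static void
demo_sp_delete(struct demo_sp *sp)
{
	sp->dead = true;
	if (sp->refcnt == 0)
		sp->destroyed = true;
}

/* Take a reference before handing the isr to the asynchronous request. */
static void
demo_crypto_dispatch(struct demo_sp *sp)
{
	demo_sp_ref(sp);
}

/* The callback may still use the isr here; drop the reference when done. */
static void
demo_crypto_callback(struct demo_sp *sp)
{
	demo_sp_unref(sp);
}
```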
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segment size
calculation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a deadlock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf,
which is stuck on key_so_mtx, never releases its reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

Hundreds of mbufs can easily be queued on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It is never changed, so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself with its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock); that gives no real protection.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical usage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock, for example in the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never completes until xc_broadcast of thread A finishes. On the other hand,
thread C, which is a callee of xc_broadcast of thread A, is stuck on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence with another
mutex, but adding another mutex makes the code complex, so fix the deadlock
in another way: release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock happens only if NET_MPSAFE is enabled.
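
The fix described above can be sketched in userland with POSIX threads.
This is only an analogy under stated assumptions, not the kernel code:
struct sync_gate, sync_gate_perform and noop_sync are hypothetical names,
and slow_sync merely stands in for pserialize_perform. The point is that
the slow synchronization runs without the mutex held, while a flag
guarded by the mutex and a condvar keeps two threads from running it at
the same time.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userland stand-in; none of these names exist in the kernel. */
struct sync_gate {
	pthread_mutex_t mtx;	/* protects the list and the fields below */
	pthread_cond_t  cv;	/* signalled whenever busy is cleared */
	bool            busy;	/* true while a thread runs the slow sync */
	int             nsyncs;	/* completed slow syncs, for demonstration */
};

static void
sync_gate_init(struct sync_gate *g)
{
	pthread_mutex_init(&g->mtx, NULL);
	pthread_cond_init(&g->cv, NULL);
	g->busy = false;
	g->nsyncs = 0;
}

/* Placeholder for the expensive operation (pserialize_perform in the log). */
static void
noop_sync(void)
{
}

/*
 * Called with g->mtx held and returns with g->mtx held.
 * slow_sync runs with the mutex released, so a callback that needs the
 * mutex can make progress; busy serializes concurrent callers instead.
 */
static void
sync_gate_perform(struct sync_gate *g, void (*slow_sync)(void))
{
	while (g->busy)
		pthread_cond_wait(&g->cv, &g->mtx);
	g->busy = true;
	pthread_mutex_unlock(&g->mtx);

	slow_sync();

	pthread_mutex_lock(&g->mtx);
	g->busy = false;
	g->nsyncs++;
	pthread_cond_broadcast(&g->cv);
}
```

A caller would take g->mtx, unlink the item from its list, call
sync_gate_perform(), and only then wait for remaining references to
drain.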
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected, for example calling key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.121.2.3 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.121.2.2 21-May-2018  pgoyette Sync with HEAD
 1.121.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.125.2.1 10-Jun-2019  christos Sync with HEAD
 1.126.10.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.92 03-Aug-2023  ozaki-r in6: add missing rtcache_unref to in6_selectroute

By default, this issue is harmless. However, if NET_MPSAFE
is enabled, it could eventually lead to a kernel panic.
 1.91 04-Nov-2022  ozaki-r branches: 1.91.2;
inpcb: rename functions to in6pcb_*
 1.90 28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to be aware of it
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
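
A minimal sketch of the embedding pattern described above, assuming
heavily simplified structures. The demo_ names and members are
hypothetical; the real struct inpcb, in4pcb and in6pcb have many more
fields and the real accessor is in4p_laddr(inp).

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the common part shared by both families. */
struct demo_inpcb {
	int      inp_af;	/* address family tag: 4 or 6 */
	uint16_t inp_lport;	/* shared member: local port */
};

/* IPv4 PCB: the common part comes first, then IPv4-only data. */
struct demo_in4pcb {
	struct demo_inpcb in4p_pcb;
	uint32_t          in4p_laddr;
};

/* IPv6 PCB: same common part, but larger family-specific data. */
struct demo_in6pcb {
	struct demo_inpcb in6p_pcb;
	uint8_t           in6p_laddr[16];
};

/*
 * Accessors hide the cast from the common pointer to the containing
 * structure. The cast is valid because the common part is the first
 * member of each family-specific structure.
 */
#define demo_in4p_laddr(inp) (((struct demo_in4pcb *)(inp))->in4p_laddr)
#define demo_in6p_laddr(inp) (((struct demo_in6pcb *)(inp))->in6p_laddr)
```

Code that only holds a struct demo_inpcb pointer keeps working with the
shared members, and IPv4 PCBs no longer pay for the IPv6 address storage.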
 1.89 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.88 10-Aug-2021  kardel PR kern/56348
MTU discovery fails with IPv6 sockets bound to IPv4 mapped address

pick up the IPv4 route for IPv4 mapped IPv6 address to get the correct
MTU and not any unrelated/inappropriate MTU from IPv6 routes. IPv4 mapped
IPv6 addresses are always handled by the IPv4 stack and MTU discovery
is solely handled with the IPv4 routing table.
 1.87 28-Aug-2020  ozaki-r inet6: reduce silent packet discards
 1.86 13-Nov-2019  ozaki-r Get rid of unnecessary NULL checks for rt_ifa and ifa_ifp

They are always non-NULL nowadays.
 1.85 01-May-2018  maxv branches: 1.85.2; 1.85.6;
Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.84 06-Dec-2017  roy branches: 1.84.2;
Treat unvalidated addresses as deprecated in rule 3.
 1.83 24-Nov-2017  roy Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.
 1.82 20-Nov-2017  ozaki-r Mention IPv6 address selection policy isn't MP-safe yet

Though it's not a problem until a policy is set.
 1.81 17-Sep-2017  christos Skip the scope test for loopback addresses in non-loopback interfaces.
While this test is also done in in6_setscope, testing here allows us
to log an error for other callers.
 1.80 27-Aug-2017  christos PR/52382: BERTRAND Joel: Fix mapped IPv4 source selection; this got broken
in the last code refactoring. in6_selectif failing is not fatal.
XXX: pullup-8
 1.79 17-Feb-2017  ozaki-r branches: 1.79.6;
Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing suffix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.
 1.78 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.77 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.76 08-Dec-2016  ozaki-r branches: 1.76.2;
Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.75 02-Dec-2016  ozaki-r CID 1396598, CID 1396634: Fix null pointer dereferences
 1.74 10-Nov-2016  ozaki-r Tidy up in6_select*

This change tidies up in6_select* functions, especially
selectroute.

selectroute is annoying because:
- It returns both/either of a rtentry and/or an ifp
- Yes, it may return only an ifp!
- It is valid but selectroute shouldn't handle the case
- Such conditional behavior makes it difficult
to apply locking/psref thingy
- It may return a rtentry even if error
- It may use opt->ip6po_nextroute rtcache implicitly
- The caller can know if it is used
by rtcache_validate(&opt->ip6po_nextroute)
but it's racy in MP-safe world
- Even if it uses opt->ip6po_nextroute, it may
return a rtentry that isn't derived from the rtcache

The change includes:
- Rename selectroute to in6_selectroute
- Let a remaining caller of selectroute, in6_selectif,
use in6_selectroute instead
- Let in6_selectroute return only an rtentry
- If error, it doesn't return an rtentry
- A caller gets an ifp from a returned rtentry
- Allow in6_selectroute to modify a passed rtcache
and a caller can know if opt->ip6po_nextroute is
used via the rtcache
- Let callers (ip6_output and in6_selectif) handle
the case that only an ifp is required

Inspired by OpenBSD
Proposed on tech-kern and tech-net
LGTM by roy@
 1.73 31-Oct-2016  ozaki-r Pull best address selection code out of in6_selectsrc

No functional change.
 1.72 31-Oct-2016  ozaki-r Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and let in6_selectsrc copy the result to it inside pserialize
critical sections.
 1.71 31-Oct-2016  ozaki-r Remove unnecessary NULL checks
 1.70 26-Aug-2016  roy Simplify.
 1.69 26-Aug-2016  roy Allow explicit binding to detached addresss.
Fixes PR kern/51435.
 1.68 23-Aug-2016  roy White space police.
 1.67 23-Aug-2016  roy Sync denied flags.
 1.66 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.65 15-Jul-2016  ozaki-r Use sin6tosa and sin6tocsa macros

No functional change.
 1.64 15-Jul-2016  ozaki-r Use ifatoia6 macro

No functional change.
 1.63 04-Jul-2016  ozaki-r branches: 1.63.2;
Use pslist(9) for the global in6_ifaddr list

psz and psref will be applied in another commit.

No functional change intended.
 1.62 21-Jun-2016  ozaki-r Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.
 1.61 21-Jun-2016  ozaki-r Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime is different from that of ifnet, so an ifnet object can
disappear at any time while we access it via them. Thus we need to look up
an ifnet object by if_index every time to be safe.
 1.60 18-May-2016  ozaki-r Get rid of unnecessary NULL check

It's already checked just some lines above.
 1.59 12-Dec-2015  christos Hook up the addrctl stuff that's already there.
 1.58 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.57 27-Apr-2015  ozaki-r Introduce in6_selecthlim_rt to consolidate an idiom for rt->rt_ifp

It consolidates a scattered routine:
(rt = rtcache_validate(&in6p->in6p_route)) != NULL ? rt->rt_ifp : NULL
 1.56 20-Jan-2015  roy Add net.inet6.ip6.prefer_tempaddr sysctl knob so that we can prefer
IPv6 temporary addresses as the source address.

Fixes PR kern/47100 based on a patch by Dieter Roelants.
 1.55 05-Sep-2014  matt branches: 1.55.2;
Don't use C++ keyword as variable.
Use different prefix for nd6_prefixctl members than for nd6_prefix members.
 1.54 17-May-2014  rmind branches: 1.54.2;
Replace open-coded access (and boundary checking) of ifindex2ifnet with
if_byindex() function.
 1.53 25-Jun-2012  christos branches: 1.53.2; 1.53.4; 1.53.12;
rename rfc6056 -> portalgo, requested by yamt
 1.52 24-Sep-2011  christos branches: 1.52.2;
Add inet6 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.51 17-May-2011  dholland Add missing $NetBSD$ header.
 1.50 03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
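
As a rough illustration of the classification above, the following
sketch maps a peer to a class and a class to its default MSL. The demo_
names and the two boolean inputs are hypothetical simplifications; only
the three classes and the 2/10/60 second defaults come from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* The three nearness classes named in the commit message. */
enum demo_msl_class {
	DEMO_MSL_LOOPBACK,	/* local host equals remote host */
	DEMO_MSL_LOCAL,		/* same link/subnet */
	DEMO_MSL_REMOTE		/* reached via one or more gateways */
};

/* Pick the nearest class that applies; same_host wins over same_link. */
static enum demo_msl_class
demo_msl_classify(bool same_host, bool same_link)
{
	if (same_host)
		return DEMO_MSL_LOOPBACK;
	if (same_link)
		return DEMO_MSL_LOCAL;
	return DEMO_MSL_REMOTE;
}

/* Default MSL in seconds for each class, as quoted in the text. */
static int
demo_msl_seconds(enum demo_msl_class c)
{
	switch (c) {
	case DEMO_MSL_LOOPBACK:
		return 2;
	case DEMO_MSL_LOCAL:
		return 10;
	case DEMO_MSL_REMOTE:
		return 60;
	}
	return 60;	/* unknown class: fall back to the longest default */
}
```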

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
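
The idea of replacing pointers with narrow indices into a fixed-size
pool can be sketched as follows. This is not the VTW layout: the
demo_ names, the pool size and the singly linked list are assumptions
chosen only to show a 16-bit index serving as a link.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_POOL_SIZE 1024		/* fixed number of slots (assumed) */
#define DEMO_IDX_NONE  UINT16_MAX	/* plays the role of a NULL pointer */

struct demo_entry {
	uint32_t key;
	uint16_t next;	/* index of the next entry, or DEMO_IDX_NONE */
};

struct demo_pool {
	struct demo_entry slots[DEMO_POOL_SIZE];
	uint16_t head;	/* most recently inserted entry */
	uint16_t nused;	/* number of slots handed out so far */
};

static void
demo_pool_init(struct demo_pool *p)
{
	p->head = DEMO_IDX_NONE;
	p->nused = 0;
}

/* Insert at the head of the list; returns the slot index or DEMO_IDX_NONE. */
static uint16_t
demo_pool_insert(struct demo_pool *p, uint32_t key)
{
	uint16_t idx;

	if (p->nused == DEMO_POOL_SIZE)
		return DEMO_IDX_NONE;
	idx = p->nused++;
	p->slots[idx].key = key;
	p->slots[idx].next = p->head;
	p->head = idx;
	return idx;
}

/* Walk the list by following indices rather than pointers. */
static uint16_t
demo_pool_find(const struct demo_pool *p, uint32_t key)
{
	uint16_t i;

	for (i = p->head; i != DEMO_IDX_NONE; i = p->slots[i].next) {
		if (p->slots[i].key == key)
			return i;
	}
	return DEMO_IDX_NONE;
}
```

Each link costs two bytes here instead of the size of a pointer, which
is the kind of saving the text attributes to fixed-size pools.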

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.49 25-May-2009  pooka branches: 1.49.4; 1.49.6;
Remove declaration of unused extern struct ifnet loif[NLOOP], which
was already removed once, but brought back in a wholesale import.
While here, mop up the #ifdef __SomeotherOS__ noise.
 1.48 12-May-2009  elad Implicit EPERM -> explicit EACCES.

Requested by ad@ and yamt@.
 1.47 30-Apr-2009  elad Commit changes to netinet6/in6_src.c, forgot in previous commit:

http://mail-index.netbsd.org/source-changes/2009/04/30/msg220547.html

Make in_pcbsetport() set the port number selected before passing "sin" to
kauth(9).
 1.46 18-Mar-2009  cegger bzero -> memset
 1.45 11-Jan-2009  christos branches: 1.45.2;
merge christos-time_t
 1.44 17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.43 15-Apr-2008  thorpej branches: 1.43.4; 1.43.12;
Make ip6 and icmp6 stats per-cpu.
 1.42 08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
 1.41 27-Feb-2008  matt branches: 1.41.2;
Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.
 1.40 26-Nov-2007  yamt branches: 1.40.10; 1.40.14;
in6_pcbsetport: add missing htons. (fixes ephemeral port allocation.)
 1.39 24-Oct-2007  dyoung branches: 1.39.2;
Replace rote sockaddr_in6 initializations (memset(), set sin6_family,
sin6_len, and sin6_addr) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.
 1.38 23-May-2007  christos branches: 1.38.6; 1.38.8; 1.38.12;
Ansify + add a few comments, from Karl Sjödahl
 1.37 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.36 04-Mar-2007  christos branches: 1.36.2; 1.36.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.35 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.34 04-Jan-2007  elad branches: 1.34.2;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.33 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.32 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.31 02-Dec-2006  dyoung Use the queue(3) macros instead of open-coding them. Shorten
staircases. Remove unnecessary casts. Where appropriate, s/8/NBBY/.
De-__P(). KNF.

No functional changes intended.
 1.30 16-Nov-2006  christos branches: 1.30.2; 1.30.4;
__unused removal on arguments; approved by core.
 1.29 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.28 01-Sep-2006  dyoung branches: 1.28.2; 1.28.4;
Restore historical kernel behavior: let an application bind(2) an
IPv6 interface address (e.g., sin6_addr fe80::200:24ff:fec3:4bac
sin6_scope_id 1), set a multicast interface with
setsockopt(,IPPROTO_IPV6,IPV6_MULTICAST_IF,), and sendto(2) multicast
destinations with "wildcard" scope ID, 0, without error EHOSTUNREACH.

Prior to this patch, sendto(2) would exit with EHOSTUNREACH, even
though the scope ID was unambiguously specified both by bind(2)
and setsockopt(2). This was a bug because it broke old applications.

Thanks JINMEI Tatuya for the patch!
 1.27 23-Jul-2006  ad branches: 1.27.2;
Use the LWP cached credentials where sane.
 1.26 14-May-2006  elad integrate kauth.
 1.25 05-May-2006  rpaulo Add support for RFC 3542 Adv. Socket API for IPv6 (which obsoletes 2292).
* RFC 3542 isn't binary compatible with RFC 2292.
* RFC 2292 support is on by default but can be disabled.
* update ping6, telnet and traceroute6 to the new API.

From the KAME project (www.kame.net).
Reviewed by core.
 1.24 15-Apr-2006  christos Coverity CID 607: Remove bogus test.
 1.23 21-Jan-2006  rpaulo branches: 1.23.2; 1.23.4; 1.23.6; 1.23.8; 1.23.10;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.
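
The ::1 case in the list above can be illustrated with a small userland
check. This is a much narrower test than the kernel's scope handling and
is only a sketch: demo_scope_ok is a hypothetical helper, and it covers
just the rule that a loopback source must not be paired with a
non-loopback destination.

```c
#include <assert.h>
#include <stdbool.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: returns false when the source/destination pair
 * would let a packet from ::1 leave the node, or when either string is
 * not a valid IPv6 address.
 */
static bool
demo_scope_ok(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return false;

	/* ::1 is only meaningful within a single node. */
	if (IN6_IS_ADDR_LOOPBACK(&s) && !IN6_IS_ADDR_LOOPBACK(&d))
		return false;

	return true;
}
```

With this check, the bind("::1") followed by sendto(global address)
sequence from the list would be refused instead of producing a packet.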

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.22 11-Dec-2005  christos branches: 1.22.2;
merge ktrace-lwp.
 1.21 29-May-2005  christos branches: 1.21.2;
- avoid shadowed variables
- sprinkle const.
 1.20 01-Feb-2005  drochner branches: 1.20.4;
sin6_scope_id maps to interface indices for link local addresses only!
(unlikely to be used with other scopes for now, but we should be
correct anyway)
 1.19 04-Dec-2004  peter branches: 1.19.4; 1.19.6;
Convert lo(4) to a clonable device.

This also removes the loif array and changes all code to use the new
lo0ifp pointer which points to the lo0 ifnet structure.

Approved by christos.
 1.18 10-Dec-2003  itojun use if_indexlim (instead of if_index) and ifindex2ifnet[x] != NULL
to check if interface exists, as (1) if_index has different meaning
(2) ifindex2ifnet could become NULL when interface gets destroyed,
since when we have introduced dynamically-created interfaces. from kame
 1.17 04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.16 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.15 11-Sep-2002  itojun branches: 1.15.6;
KNF - return is not a function. sync w/kame.
 1.14 26-Aug-2002  itojun pass proc * to in6_pcbsetport. PR 18073
 1.13 08-Jun-2002  itojun whitespace cleanup
 1.12 29-May-2002  itojun attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
 1.11 29-May-2002  itojun rm obsolete comment
 1.10 22-Jan-2002  itojun branches: 1.10.8; 1.10.10;
make sure to check address family on route cache. with IPv4 mapped
address we can see both AF_INET/INET6.
 1.9 13-Nov-2001  lukem add RCSIDs
 1.8 16-Oct-2001  itojun more whitespace/comment sync with kame
 1.7 06-Jun-2001  mrg branches: 1.7.2;
fix an IPNOPRIVPORTS unused variable botch. noted by proff.
 1.6 30-Mar-2001  itojun enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.
 1.5 08-Feb-2001  itojun branches: 1.5.2;
move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync with kame
 1.4 26-Aug-2000  itojun branches: 1.4.2;
implement net.inet6.ip6.{anon,low}port{min,max} sysctl variable.
 1.3 26-Aug-2000  itojun add missing IPNOPRIVPORTS case
 1.2 07-Jul-2000  itojun sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.
 1.1 03-Jun-2000  itojun branches: 1.1.2; 1.1.4;
sync with kame.
- use latest source address selection code - in6_src.c.
- correct frag header insertion.
- deep copy ip6 header portion in ip6_mloopback to avoid overwrite.
- do not bark when we forward packet to loopback.
- some cosmetics.
 1.1.4.2 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.1.4.1 03-Jun-2000  minoura file in6_src.c was added on branch minoura-xpg4dl on 2000-06-22 17:09:58 +0000
 1.1.2.2 27-Aug-2000  itojun pullup (approved by releng-1-5)

> implement net.inet6.ip6.{anon,low}port{min,max} sysctl variable.

> cvs rdiff -r1.67 -r1.68 basesrc/lib/libc/gen/sysctl.3
> cvs rdiff -r1.53 -r1.54 basesrc/sbin/sysctl/sysctl.8
> cvs rdiff -r1.18 -r1.19 syssrc/sys/netinet6/in6.h
> cvs rdiff -r1.29 -r1.30 syssrc/sys/netinet6/in6_pcb.c
> cvs rdiff -r1.3 -r1.4 syssrc/sys/netinet6/in6_src.c
> cvs rdiff -r1.25 -r1.26 syssrc/sys/netinet6/ip6_input.c
> cvs rdiff -r1.14 -r1.15 syssrc/sys/netinet6/ip6_var.h
 1.1.2.1 27-Aug-2000  itojun pullup 1.2 -> 1.3 (approved by releng-1-5)

> add missing IPNOPRIVPORTS case
 1.4.2.4 21-Apr-2001  bouyer Sync with HEAD
 1.4.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.4.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.4.2.1 26-Aug-2000  bouyer file in6_src.c was added on branch thorpej_scsipi on 2000-11-20 18:10:50 +0000
 1.5.2.13 17-Sep-2002  nathanw Catch up to -current.
 1.5.2.12 27-Aug-2002  nathanw Catch up to -current.
 1.5.2.11 15-Jul-2002  nathanw Whitespace.
 1.5.2.10 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.5.2.9 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.5.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.5.2.7 28-Feb-2002  nathanw Catch up to -current.
 1.5.2.6 14-Nov-2001  nathanw Catch up to -current.
 1.5.2.5 22-Oct-2001  nathanw Catch up to -current.
 1.5.2.4 21-Jun-2001  nathanw Catch up to -current.
 1.5.2.3 09-Apr-2001  nathanw Catch up with -current.
 1.5.2.2 13-Mar-2001  nathanw Be more careful not to dereference curproc when there might not be
a process context.
 1.5.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.7.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.7.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.7.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.7.2.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.7.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.10.10.1 27-Aug-2002  lukem Pull up revision 1.14 (requested by itojun in ticket #731):
pass proc * to in6_pcbsetport. PR 18073
 1.10.8.3 29-Aug-2002  gehenna catch up with -current.
 1.10.8.2 20-Jun-2002  gehenna catch up with -current.
 1.10.8.1 30-May-2002  gehenna Catch up with -current.
 1.15.6.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.15.6.5 04-Feb-2005  skrll Sync with HEAD.
 1.15.6.4 18-Dec-2004  skrll Sync with HEAD.
 1.15.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.15.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.15.6.1 03-Aug-2004  skrll Sync with HEAD
 1.19.6.1 12-Feb-2005  yamt sync with head.
 1.19.4.1 29-Apr-2005  kent sync with -current
 1.20.4.1 02-Dec-2007  bouyer Pull up following revision(s) (requested by yamt in ticket #1881):
sys/netinet6/in6_src.c: revision 1.40 via patch
in6_pcbsetport: add missing htons. (fixes ephemeral port allocation.)
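To illustrate why a missing htons() matters for port allocation, here is a small sketch. The helper port_to_wire() is hypothetical and is not the code changed by the pulled-up revision; it only shows the host-to-network conversion that a port field such as sin6_port expects.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: convert a host-order port number into the
 * network byte order used by on-the-wire port fields.
 */
static uint16_t
port_to_wire(unsigned int host_port)
{
	assert(host_port <= 0xffff);
	return htons((uint16_t)host_port);
}
```

On a little-endian machine, storing the host-order value directly would put the bytes in the wrong order, so a port chosen from a configured range would end up outside it.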
 1.21.2.7 17-Mar-2008  yamt sync with head.
 1.21.2.6 07-Dec-2007  yamt sync with head
 1.21.2.5 27-Oct-2007  yamt sync with head.
 1.21.2.4 03-Sep-2007  yamt sync with head.
 1.21.2.3 26-Feb-2007  yamt sync with head.
 1.21.2.2 30-Dec-2006  yamt sync with head.
 1.21.2.1 21-Jun-2006  yamt sync with head.
 1.22.2.1 01-Feb-2006  yamt sync with head.
 1.23.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.23.8.5 11-May-2006  elad sync with head
 1.23.8.4 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.23.8.3 19-Apr-2006  elad sync with head.
 1.23.8.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.23.8.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.23.6.3 03-Sep-2006  yamt sync with head.
 1.23.6.2 11-Aug-2006  yamt sync with head
 1.23.6.1 24-May-2006  yamt sync with head.
 1.23.4.2 01-Jun-2006  kardel Sync with head.
 1.23.4.1 22-Apr-2006  simonb Sync with head.
 1.23.2.4 09-Sep-2006  rpaulo sync with head
 1.23.2.3 23-Feb-2006  rpaulo Last in6pcb in in6_selecthlim().
 1.23.2.2 14-Feb-2006  rpaulo Replace in6pcb with inpcb and IN6P_BOUND with INP_BOUND.
 1.23.2.1 07-Feb-2006  rpaulo remove in6_pcb.h and include in_pcb.h.
 1.27.2.1 03-Sep-2006  riz Pull up following revision(s) (requested by rpaulo in ticket #106):
sys/netinet6/in6_src.c: revision 1.28
Restore historical kernel behavior: let an application bind(2) an
IPv6 interface address (e.g., sin6_addr fe80::200:24ff:fec3:4bac
sin6_scope_id 1), set a multicast interface with
setsockopt(,IPPROTO_IPV6,IPV6_MULTICAST_IF,), and sendto(2) multicast
destinations with "wildcard" scope ID, 0, without error EHOSTUNREACH.
Prior to this patch, sendto(2) would exit with EHOSTUNREACH, even
though the scope ID was unambiguously specified both by bind(2)
and setsockopt(2). This was a bug because it broke old applications.
Thanks JINMEI Tatuya for the patch!
 1.28.4.3 18-Dec-2006  yamt sync with head.
 1.28.4.2 10-Dec-2006  yamt sync with head.
 1.28.4.1 22-Oct-2006  yamt sync with head
 1.28.2.2 12-Jan-2007  ad Sync with head.
 1.28.2.1 18-Nov-2006  ad Sync with head.
 1.30.4.1 03-Jun-2008  skrll Sync with netbsd-4.
 1.30.2.1 01-Feb-2008  riz Pull up following revision(s) (requested by yamt in ticket #1006):
sys/netinet6/in6_src.c: revision 1.40
in6_pcbsetport: add missing htons. (fixes ephemeral port allocation.)
 1.34.2.3 07-May-2007  yamt sync with head.
 1.34.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.34.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.36.4.1 11-Jul-2007  mjf Sync with head.
 1.36.2.1 08-Jun-2007  ad Sync with head.
 1.38.12.1 13-Nov-2007  bouyer Sync with HEAD
 1.38.8.3 23-Mar-2008  matt sync with HEAD
 1.38.8.2 09-Jan-2008  matt sync with HEAD
 1.38.8.1 06-Nov-2007  matt sync with HEAD
 1.38.6.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.38.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.39.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.40.14.3 17-Jan-2009  mjf Sync with HEAD.
 1.40.14.2 02-Jun-2008  mjf Sync with HEAD.
 1.40.14.1 03-Apr-2008  mjf Sync with HEAD.
 1.40.10.2 24-Mar-2008  keiichi sync with head.
 1.40.10.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.41.2.3 27-Dec-2008  christos merge with head.
 1.41.2.2 01-Nov-2008  christos Sync with head.
 1.41.2.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.43.12.2 28-Apr-2009  skrll Sync with HEAD.
 1.43.12.1 19-Jan-2009  skrll Sync with HEAD.
 1.43.4.3 20-Jun-2009  yamt sync with head
 1.43.4.2 16-May-2009  yamt sync with head
 1.43.4.1 04-May-2009  yamt sync with head.
 1.45.2.2 23-Jul-2009  jym Sync with HEAD.
 1.45.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.49.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.49.4.1 31-May-2011  rmind sync with head
 1.52.2.1 30-Oct-2012  yamt sync with head
 1.53.12.1 10-Aug-2014  tls Rebase.
 1.53.4.2 18-May-2014  rmind sync with head
 1.53.4.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.53.2.2 03-Dec-2017  jdolecek update from HEAD
 1.53.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.54.2.1 23-Jan-2015  martin Pull up following revision(s) (requested by pettai in ticket #441):
sys/netinet6/ip6_var.h: revision 1.64
sys/netinet6/in6.h: revision 1.82
sys/netinet6/in6_src.c: revision 1.56
sys/netinet6/mld6.c: revision 1.62
sys/netinet6/ip6_input.c: revision 1.150
sys/netinet6/ip6_output.c: revision 1.161
Add net.inet6.ip6.prefer_tempaddr sysctl knob so that we can prefer
IPv6 temporary addresses as the source address.
Fixes PR kern/47100 based on a patch by Dieter Roelants.
 1.55.2.10 28-Aug-2017  skrll Sync with HEAD
 1.55.2.9 05-Feb-2017  skrll Sync with HEAD
 1.55.2.8 05-Dec-2016  skrll Sync with HEAD
 1.55.2.7 05-Oct-2016  skrll Sync with HEAD
 1.55.2.6 09-Jul-2016  skrll Sync with HEAD
 1.55.2.5 29-May-2016  skrll Sync with HEAD
 1.55.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.55.2.3 22-Sep-2015  skrll Sync with HEAD
 1.55.2.2 06-Jun-2015  skrll Sync with HEAD
 1.55.2.1 06-Apr-2015  skrll Sync with HEAD
 1.63.2.5 20-Mar-2017  pgoyette Sync with HEAD
 1.63.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.63.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.63.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.63.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.76.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.79.6.4 04-Aug-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #1883):

sys/netinet6/in6_src.c: revision 1.92

in6: add missing rtcache_unref to in6_selectroute

By default, this issue is harmless. However, if NET_MPSAFE
is enabled, it could eventually lead to a kernel panic.
 1.79.6.3 11-Aug-2021  martin Pull up following revision(s) (requested by kardel in ticket #1690):

sys/netinet6/in6_src.c: revision 1.88

PR kern/56348

MTU discovery fails with IPv6 sockets bound to IPv4 mapped address
pick up the IPv4 route for IPv4 mapped IPv6 address to get the correct
MTU and not any unrelated/inappropriate MTU from IPv6 routes. IPv4 mapped
IPv6 addresses are always handled by the IPv4 stack and MTU discovery
is solely handled with the IPv4 routing table.
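A minimal sketch of the first step implied by the message above: recognizing an IPv4-mapped IPv6 address and extracting the embedded IPv4 address, so that IPv4 routing state can be consulted instead of IPv6 routes. The helper name mapped_to_v4() is an assumption; IN6_IS_ADDR_V4MAPPED is the standard macro.

```c
#include <assert.h>
#include <string.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: if *a6 is IPv4-mapped (::ffff:a.b.c.d), copy
 * the embedded IPv4 address into *a4 and return 1; else return 0.
 */
static int
mapped_to_v4(const struct in6_addr *a6, struct in_addr *a4)
{
	assert(a6 != NULL && a4 != NULL);
	if (!IN6_IS_ADDR_V4MAPPED(a6))
		return 0;
	/* The last four bytes hold the IPv4 address in network order. */
	memcpy(&a4->s_addr, &a6->s6_addr[12], sizeof(a4->s_addr));
	return 1;
}
```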
 1.79.6.2 10-Dec-2017  snj Pull up following revision(s) (requested by roy in ticket #390):
sys/netinet/ip_input.c: 1.363
sys/netinet6/ip6_input.c: 1.184-1.185
sys/netinet6/ip6_output.c: 1.194-1.195
sys/netinet6/in6_src.c: 1.83-1.84
Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.
--
Attempt to restore v6 networking. Not 100% certain that these
changes are all that is needed, but they're certainly a big part of it
(especially the ip6_input.c change.)
--
Treat unvalidated addresses as deprecated in rule 3.
 1.79.6.1 31-Aug-2017  martin Pull up following revision(s) (requested by christos in ticket #243):
sys/netinet6/in6_src.c: revision 1.80
PR/52382: BERTRAND Joel: Fix mapped IPv4 source selection; this got broken
in the last code refactoring. in6_selectif failing is not fatal.
XXX: pullup-8
 1.84.2.1 02-May-2018  pgoyette Synch with HEAD
 1.85.6.2 04-Aug-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #1706):

sys/netinet6/in6_src.c: revision 1.92

in6: add missing rtcache_unref to in6_selectroute

By default, this issue is harmless. However, if NET_MPSAFE
is enabled, it could eventually lead to a kernel panic.
 1.85.6.1 11-Aug-2021  martin Pull up following revision(s) (requested by kardel in ticket #1332):

sys/netinet6/in6_src.c: revision 1.88

PR kern/56348

MTU discovery fails with IPv6 sockets bound to IPv4 mapped address
pick up the IPv4 route for IPv4 mapped IPv6 address to get the correct
MTU and not any unrelated/inappropriate MTU from IPv6 routes. IPv4 mapped
IPv6 addresses are always handled by the IPv4 stack and MTU discovery
is solely handled with the IPv4 routing table.
 1.85.2.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.91.2.1 04-Aug-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #309):

sys/netinet6/in6_src.c: revision 1.92

in6: add missing rtcache_unref to in6_selectroute

By default, this issue is harmless. However, if NET_MPSAFE
is enabled, it could eventually lead to a kernel panic.
 1.4 02-Aug-1999  itojun remove sys/netinet6/in6_systm.h, as it is very empty.

crypto-us IPSEC build will be broken.
could someone please update?
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file in6_systm.h was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file in6_systm.h was added on branch chs-ubc2 on 1999-07-01 23:48:28 +0000
 1.108 27-Jun-2025  andvar Grammar and spelling fixes, mainly in comments. A few in documentation,
logging, test description, and SCSI ASC/ASCQ assignment descriptions.
 1.107 05-Jun-2025  ozaki-r in6: remove unused in6_get_ia_from_ifp()
 1.106 05-Jun-2025  ozaki-r in6: introduce in6ifa_first_lladdr() (and psref variant)

It returns a first link-local address (ifa) on a given interface.
 1.105 05-Jun-2025  ozaki-r Apply if_first_addr() and if_first_addr_psref()
 1.104 16-Jun-2020  maxv branches: 1.104.20; 1.104.26;
remove unused
 1.103 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.102 18-Oct-2019  ozaki-r in6: reset the temporary address timer on a change of the interval period
 1.101 16-Oct-2019  ozaki-r Reorganize in6_tmpaddrtimer stuffs

- Move the related functions to where in6_tmpaddrtimer_ch exists
- Hide global variable in6_tmpaddrtimer_ch
- Rename ip6_init2 to in6_tmpaddrtimer_init
- Reduce callers of callout_reset
- Use callout_schedule
 1.100 29-May-2018  ozaki-r branches: 1.100.2; 1.100.6;
Make a deletion of in6m in nd6_rtrequest atomic
 1.99 29-May-2018  ozaki-r Improve atomicity of in6_leavegroup and in6_delmulti
 1.98 19-Apr-2018  christos s/static inline/static __inline/g for consistency.
 1.97 02-Mar-2017  ozaki-r branches: 1.97.6; 1.97.12;
Plug a race condition on accessing i6mm_maddr
 1.96 02-Mar-2017  ozaki-r Fix racy in6m_sol

Relook up the entry instead of reusing it, which makes locking simple.
 1.95 02-Mar-2017  ozaki-r Protect ia6_memberships by in6_ifaddr_lock
 1.94 01-Mar-2017  ozaki-r Provide in6_multi_group

Use it when checking if we belong to the group, instead of in6_lookup_multi.

No functional change.
 1.93 23-Feb-2017  ozaki-r Remove mkludge stuffs

For unknown reasons, IPv6 multicast addresses are linked to a first
IPv6 address assigned to an interface. Due to the design, when removing
a first address having multicast addresses, we need to save them to
somewhere and later restore them once a new IPv6 address is activated.
mkludge stuffs support the operations.

This change links multicast addresses to an interface directly and
throws the kludge away.

Note that as usual some obsolete member variables remain for kvm(3)
users. And also sysctl net.inet6.multicast_kludge remains to avoid
breaking old ifmcstat.

TODO: currently ifnet has a list of in6_multi but obviously the list
should be protocol independent. Provide a common structure (if_multi
or something) to handle in6_multi and in_multi together as well as
ifaddr does for in_ifaddr and in6_ifaddr.
 1.92 22-Feb-2017  ozaki-r Stop using useless IN6_*_MULTI macros
 1.91 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.90 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.89 10-Jan-2017  ozaki-r branches: 1.89.2;
Enable some sysctl knobs on rump kernels for ifmcstat
 1.88 04-Jan-2017  christos - kill NULL argument from in6_update_ifa
- amend in6_update_ifa1 to return the ia, so that we can use it in pfil hooks
to avoid NULL pointer crash.
 1.87 14-Sep-2016  christos fix typo
 1.86 13-Sep-2016  christos remove trailing spaces. userland does not catch this?
 1.85 13-Sep-2016  christos add bits for address flags
 1.84 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.83 08-Jul-2016  ozaki-r branches: 1.83.2;
Replace macros to get an IP address with proper inline functions

The inline functions are more friendly for applying psz/psref;
they consist of only simple iterations.
 1.82 08-Jul-2016  ozaki-r Kill remaining use of the old lists of IP addresses
 1.81 06-Jul-2016  ozaki-r Move in6_ifaddr_list to a more proper place (from ip6_input.c to in6.c)

It's a similar place as the IPv4 address list, i.e., in.c.

More variables will join together.
 1.80 06-Jul-2016  ozaki-r Add missing IN6_ADDRLIST_ENTRY_DESTROY
 1.79 04-Jul-2016  ozaki-r Fix userland compilations of those including in6_var.h
 1.78 04-Jul-2016  ozaki-r Use pslist(9) for the global in6_ifaddr list

psz and psref will be applied in another commit.

No functional change intended.
 1.77 22-Jun-2016  ozaki-r Remove unnecessary NULL checks of ifa->ifa_addr

If it's NULL, it should be a bug. There are many IFADDR_FOREACH that don't do
NULL check. If it can be NULL, they should fire already.
 1.76 04-Feb-2016  riastradh Declare in6_tmpaddrtimer_ch in in6_var.h.

Do not declare extern variables in .c files!
 1.75 25-Nov-2015  ozaki-r Use lltable/llentry for NDP

lltable and llentry were introduced to replace ARP cache data structure
for further restructuring of the routing table: L2 nexthop cache
separation. This change replaces the NDP cache data structure
(llinfo_nd6) with them as well as ARP.

One noticeable change is for neighbor cache GC mechanism that was
introduced to prevent IPv6 DoS attacks. net.inet6.ip6.neighborgcthresh
was the max number of caches that we store in the system. After
introducing lltable/llentry, the value is changed to be per-interface
basis because lltable/llentry stores neighbor caches in each interface
separately. And the change brings one degradation; the old GC mechanism
dropped exceeded packets based on LRU while the new implementation drops
packets in order from the beginning of lltable (a hash table + linked
lists). It would be improved in the future.

Added functions in in6.c come from FreeBSD (as of r286629) and are
tweaked for NetBSD.

Proposed on tech-kern and tech-net.
 1.74 06-Sep-2015  dholland More on PR 41200: headers that declare ioctls should include sys/ioccom.h.
This covers (I think) all the MI headers outside of external/ (and dist/).
 1.73 07-Apr-2015  roy Move in6if_do_dad() to if_do_dad() as the routine is not INET6 specific
and could equally be used by INET.
 1.72 26-Feb-2015  roy Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.
 1.71 05-Sep-2014  matt branches: 1.71.2;
Don't nest structure definitions.
 1.70 01-Jul-2014  rtr fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@
 1.69 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.68 13-Jan-2014  roy branches: 1.68.2;
Remove the now un-used function in6ifa_ifplocaladdr.
 1.67 02-Jan-2014  pooka Allow kernels compiled with INET+INET6 to be booted as IPv4-only or IPv6-only.
 1.66 11-Oct-2012  christos branches: 1.66.2;
PR/47058: Antti Kantee: If the ipv6 flow code modifies the mbuf, pass the
change up to the caller.
 1.65 23-Jun-2012  christos branches: 1.65.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.64 15-Jan-2009  christos branches: 1.64.14; 1.64.20; 1.64.24;
mention that you'll need to update compat if you change the size of in6_ifreq.
 1.63 15-Jan-2009  christos Emulate a couple more ioctls. Thanks to Matthias Drochner for pointing them out.
 1.62 15-Jan-2009  christos - switch the lifetime struct to time_t and provide compatibility for the
old ioctl.
 1.61 14-Jan-2009  christos Change back time_t in the lifetime struct to int32_t's for binary compatibility.
Since this is just the number of seconds for lifetime of the address, it is
not an issue.
 1.60 20-Aug-2008  matt branches: 1.60.2;
Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.
 1.59 31-Jul-2008  matt Generalize previous fix so that both NS and NA packets are checked.
 1.58 15-Apr-2008  thorpej branches: 1.58.4; 1.58.6; 1.58.10;
Make ip6 and icmp6 stats per-cpu.
 1.57 08-Apr-2008  thorpej Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
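The ABI claim in revision 1.57 rests on a struct made only of uint64_t members having the same layout as a uint64_t array. A sketch with three made-up counters (the names below are illustrative, not the real icmp6stat members):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-counter statistics block, old style. */
struct stat_sketch {
	uint64_t s_error;
	uint64_t s_toofreq;
	uint64_t s_badcode;
};

/* New style: the same counters as indices into a uint64_t array. */
enum { STAT_ERROR, STAT_TOOFREQ, STAT_BADCODE, STAT_NSTATS };

/* Layout check: each array slot sits where the struct member did. */
static int
layouts_match(void)
{
	return sizeof(struct stat_sketch) ==
	    STAT_NSTATS * sizeof(uint64_t) &&
	    offsetof(struct stat_sketch, s_toofreq) ==
	    STAT_TOOFREQ * sizeof(uint64_t) &&
	    offsetof(struct stat_sketch, s_badcode) ==
	    STAT_BADCODE * sizeof(uint64_t);
}
```

A consumer that still copies the data out as the old structure would therefore read the same values, provided the index order matches the old member order.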
 1.56 05-Dec-2007  dyoung branches: 1.56.8; 1.56.12;
Use IFADDR_FOREACH().
 1.55 17-Oct-2007  jld branches: 1.55.4; 1.55.6;
If callout_t is to be used, then <sys/callout.h> should be included.
Fixes the build of ifconfig(8), which failed otherwise.
 1.54 16-Oct-2007  joerg Inline callout_t in struct in6_multi. This fixes a number of possible
memory leaks. Explicitly destroy the callout before freeing it.
Use callout_setfunc/callout_schedule instead of repeating it for
callout_reset.

Bump NetBSD version to 4.99.34 for kvm users.
 1.53 11-Sep-2007  gdt branches: 1.53.2;
Remove SIOCSIFALIFETIME_IN6, which could not possibly have ever worked.

Problem reported in kern/35897 by Robert Elz.
 1.52 19-Jul-2007  dyoung branches: 1.52.4; 1.52.6; 1.52.8;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.51 07-Mar-2007  liamjfoy branches: 1.51.2; 1.51.10;
Add IPv6 Fast Forward - the IPv4 counterpart:

If ip6_forward successfully forwards a packet, a cache, in this case a
ip6flow struct entry, will be created. ether_input and friends will
then be able to call ip6flow_fastforward with the packet which will then
be passed to if_output (unless an issue is found - in that case the packet
is passed back to ip6_input).

ok matt@ christos@ dyoung@ and joerg@
 1.50 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.49 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.48 02-Dec-2006  dyoung branches: 1.48.2;
Use the queue(3) macros instead of open-coding them. Shorten
staircases. Remove unnecessary casts. Where appropriate, s/8/NBBY/.
De-__P(). KNF.

No functional changes intended.
 1.47 20-Nov-2006  dyoung branches: 1.47.2; 1.47.8;
Use TAILQ_FOREACH().
 1.46 17-Oct-2006  christos use portable bitfields.
 1.45 23-Jul-2006  ad branches: 1.45.4; 1.45.6;
Use the LWP cached credentials where sane.
 1.44 18-May-2006  liamjfoy Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.43 05-Mar-2006  rpaulo branches: 1.43.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.42 03-Mar-2006  rpaulo branches: 1.42.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.
 1.41 21-Jan-2006  rpaulo branches: 1.41.2; 1.41.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
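
The stricter scope boundary check described in revision 1.41 can be pictured with the userland sketch below. It covers only the ::1 case from the commit message and is an assumption about the shape of the rule, not the kernel's implementation; ex_loopback_scope_violation() is a hypothetical name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Return 1 if a packet with this source/destination pair must not
 * leave the node (loopback source, non-loopback destination),
 * 0 if it passes this particular check, and -1 on a parse error.
 */
static int
ex_loopback_scope_violation(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return -1;
	return IN6_IS_ADDR_LOOPBACK(&s) && !IN6_IS_ADDR_LOOPBACK(&d);
}
```

With this rule, the bind("::1") plus sendto(global address) sequence from the commit message is flagged, while loopback-to-loopback traffic is not.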
 1.40 10-Dec-2005  elad branches: 1.40.2;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.39 29-May-2005  christos branches: 1.39.2;
- avoid shadowed variables
- sprinkle const.
 1.38 01-Feb-2005  drochner branches: 1.38.4; 1.38.6; 1.38.8;
remove the unused in6_ifindex2scopeid()
if at all, it works with site-local addresses whose fate is uncertain
to say the least
 1.37 16-Jun-2004  itojun branches: 1.37.4; 1.37.6;
insufficient paren in macro def. Patrick Latifi
 1.36 15-Oct-2003  itojun define struct prf_ra outside of in6_prflags, to be c++ friendly. sync w/kame
 1.35 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.34 01-Feb-2003  thorpej branches: 1.34.2;
Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.33 02-Nov-2002  perry /*CONSTCOND*/ while (0)'ed macros
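
A short sketch of the idiom behind revision 1.33: a multi-statement macro is wrapped in do { ... } while (0) so that it behaves as a single statement, and the constant loop condition carries the lint(1) annotation, which lint spells CONSTCOND. EX_SWAP and ex_demo() are invented for the example.

```c
/*
 * Multi-statement macro wrapped so it is safe in an unbraced if/else.
 * The comment before the 0 tells lint the constant condition is intended.
 */
#define EX_SWAP(a, b) do {						\
	int ex_tmp_ = (a);						\
	(a) = (b);							\
	(b) = ex_tmp_;							\
} while (/*CONSTCOND*/ 0)

/* Demo helper: without the wrapper, the else below would not parse. */
static int
ex_demo(int flag)
{
	int x = 1, y = 2;

	if (flag)
		EX_SWAP(x, y);
	else
		x = 0;
	return x * 10 + y;
}
```

If the macro body were bare braces instead, the semicolon after EX_SWAP(x, y) would terminate the if statement and orphan the else.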
 1.32 08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.31 08-Jun-2002  itojun in6_len2mask is a duplicate of in6_prefixlen2mask. unify. sync w/kame
 1.30 07-Jun-2002  fvdl Fix mistakes in previous.
 1.29 07-Jun-2002  itojun style
 1.28 07-Jun-2002  itojun consistency
 1.27 29-May-2002  itojun attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
 1.26 29-May-2002  itojun move per-interface ip6/icmp6 stat to ifnet->if_afdata. sync w/kame
 1.25 23-May-2002  itojun simplify conditions to do DAD. sync w/kame
 1.24 21-Dec-2001  itojun branches: 1.24.8;
whitespace/cosmetic sync w/kame
 1.23 20-Dec-2001  itojun centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame
 1.22 18-Dec-2001  itojun reduce white space/cosmetic diffs w/kame.
 1.21 18-Oct-2001  itojun reduce diffs with kame (mostly cosmetic).
move IPV6_CHECKSUM processing to sys/netinet6/raw_ip6.c.
constify a couple of places.
 1.20 16-Oct-2001  itojun more whitespace/comment sync with kame
 1.19 18-Jul-2001  itojun do not malloc() during interrupt context for IPv6 multicast kludge table.
malloc() during interface initialization. sync with kame
 1.18 10-Feb-2001  itojun branches: 1.18.2; 1.18.4;
to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.17 08-Feb-2001  itojun move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync with kame
 1.16 16-Apr-2000  itojun perform neighbor unreachability detection on p2p links (spec requires
it for bidir p2p links).
improve -i in ndp(8) to allow tweaking per-interface ND flag on.
fix ndp(8) infinite loop on certain routing table setup.
 1.15 16-Apr-2000  itojun better sync with latest kame (cosmetic only).
 1.14 24-Mar-2000  itojun move ia6->ia6_dad_ch to dp->dad_timer_ch, to ease KAME code sharing.
now in6_var.h does not need to pull sys/callout.h in.
 1.13 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
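
The two properties listed in revision 1.13 (client-supplied handle storage, constant-time insertion and removal) can be illustrated with the toy below. It deliberately ignores time altogether and keeps one pending list, so it is an assumption-laden sketch rather than the kernel's callout implementation; every ex_ name is hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* The handle lives in the client's own storage; nothing is allocated. */
struct ex_callout {
	LIST_ENTRY(ex_callout) link;
	void (*fn)(void *);
	void *arg;
	int pending;
};
LIST_HEAD(ex_callout_list, ex_callout);

/* (Re)arm a callout: O(1) unlink if pending, then O(1) insert. */
static void
ex_callout_reset(struct ex_callout_list *q, struct ex_callout *c,
    void (*fn)(void *), void *arg)
{
	if (c->pending)
		LIST_REMOVE(c, link);
	c->fn = fn;
	c->arg = arg;
	c->pending = 1;
	LIST_INSERT_HEAD(q, c, link);
}

/* Cancel a callout: O(1) because the entry knows its own neighbours. */
static void
ex_callout_stop(struct ex_callout *c)
{
	if (c->pending) {
		LIST_REMOVE(c, link);
		c->pending = 0;
	}
}

static void
ex_bump(void *arg)
{
	++*(int *)arg;
}

/* Demo helper: arm two callouts, cancel one, fire whatever remains. */
static int
ex_demo(void)
{
	struct ex_callout_list q = LIST_HEAD_INITIALIZER(q);
	struct ex_callout a, b, *c;
	int fired = 0;

	memset(&a, 0, sizeof(a));
	memset(&b, 0, sizeof(b));
	ex_callout_reset(&q, &a, ex_bump, &fired);
	ex_callout_reset(&q, &b, ex_bump, &fired);
	ex_callout_stop(&a);
	while ((c = LIST_FIRST(&q)) != NULL) {
		LIST_REMOVE(c, link);
		c->pending = 0;
		c->fn(c->arg);
	}
	return fired;
}
```

Because the handle is embedded in the caller's structure, arming a callout cannot fail for lack of memory, which is the resource-allocation problem the commit message refers to.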
 1.12 26-Feb-2000  itojun bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
 1.11 25-Feb-2000  itojun on SIOCS*_IN6, validate sockaddrs so that we never configure non-AF_INET6
addresses. (in_control has the same problem - I'll need to check it as well)

obsolete the following two ioctls, they do not fit well against IPv6 addressing
model. (the kernel will support them for some period of time; we'll remove them
in the near future)
SIOCSIFDSTADDR_IN6
SIOCSIFNETMASK_IN6
 1.10 04-Feb-2000  itojun avoid calling in6_control(SIOCDIFADDR_IN6) from interrupt context.
it is not supposed to work.
logging fix: add "\n" to some of log() in in6_prefix.c.

improve in6_ifdetach(). now almost all structures that depend on ifnet
will be cleared up.
possible loose ends:
- cached route_in6 in static variables needs to be cleared as well
- there are ifaddr manipulation without reference counting,
which should be fixed
we still see panics after card removal, though... not sure what is left.

(sync with kame)
 1.9 02-Feb-2000  thorpej PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
 1.8 02-Feb-2000  itojun implement in6_purgemkludge(). in6_ifdetach() calls it to avoid dangling
kludge entries. the situation would occur if you take the following steps:
- join multicast groups (default ones like linklocal all-node is fine)
- remove all IPv6 addresses manually
- remove pcmcia card

to thorpej: pls call in6_ifdetach() when PRU_PURGEIF is raised (just before
removing ifnet). it should do the right thing (unable to perform real test
though)
 1.7 01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.6 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.5 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.4 22-Jul-1999  itojun branches: 1.4.2; 1.4.8;
change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file in6_var.h was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file in6_var.h was added on branch chs-ubc2 on 1999-07-01 23:48:28 +0000
 1.4.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.4.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.4.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.18.4.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.18.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.18.4.1 03-Aug-2001  lukem update to -current
 1.18.2.5 11-Nov-2002  nathanw Catch up to -current
 1.18.2.4 20-Jun-2002  nathanw Catch up to -current.
 1.18.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.18.2.2 22-Oct-2001  nathanw Catch up to -current.
 1.18.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.24.8.2 20-Jun-2002  gehenna catch up with -current.
 1.24.8.1 30-May-2002  gehenna Catch up with -current.
 1.34.2.6 11-Dec-2005  christos Sync with head.
 1.34.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.34.2.4 04-Feb-2005  skrll Sync with HEAD.
 1.34.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.34.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.34.2.1 03-Aug-2004  skrll Sync with HEAD
 1.37.6.1 12-Feb-2005  yamt sync with head.
 1.37.4.1 29-Apr-2005  kent sync with -current
 1.38.8.1 03-Oct-2008  jdc Pull up revisions:
src/sys/netinet6/in6.c 1.141 via patch
src/sys/netinet6/in6_var.h 1.59 via patch
src/sys/netinet6/nd6_nbr.c 1.89-1.90 via patch
(requested by adrianp in ticket #1967).

If a neighbor solicitation isn't from the unspecified address, make sure
that the source address matches one of the interface's address prefixes.

Generalize previous fix so that both NS and NA packets are checked.
 1.38.6.1 03-Oct-2008  jdc Pull up revisions:
src/sys/netinet6/in6.c 1.141 via patch
src/sys/netinet6/in6_var.h 1.59 via patch
src/sys/netinet6/nd6_nbr.c 1.89-1.90 via patch
(requested by adrianp in ticket #1967).

If a neighbor solicitation isn't from the unspecified address, make sure
that the source address matches one of the interface's address prefixes.

Generalize previous fix so that both NS and NA packets are checked.
 1.38.4.1 03-Oct-2008  jdc Pull up revisions:
src/sys/netinet6/in6.c 1.141 via patch
src/sys/netinet6/in6_var.h 1.59 via patch
src/sys/netinet6/nd6_nbr.c 1.89-1.90 via patch
(requested by adrianp in ticket #1967).

If a neighbor solicitation isn't from the unspecified address, make sure
that the source address matches one of the interface's address prefixes.

Generalize previous fix so that both NS and NA packets are checked.
 1.39.2.6 07-Dec-2007  yamt sync with head
 1.39.2.5 27-Oct-2007  yamt sync with head.
 1.39.2.4 03-Sep-2007  yamt sync with head.
 1.39.2.3 26-Feb-2007  yamt sync with head.
 1.39.2.2 30-Dec-2006  yamt sync with head.
 1.39.2.1 21-Jun-2006  yamt sync with head.
 1.40.2.1 01-Feb-2006  yamt sync with head.
 1.41.4.2 01-Jun-2006  kardel Sync with head.
 1.41.4.1 22-Apr-2006  simonb Sync with head.
 1.41.2.2 09-Sep-2006  rpaulo sync with head
 1.41.2.1 07-Feb-2006  rpaulo in6pcb -> inpcb.
 1.42.2.3 11-Aug-2006  yamt sync with head
 1.42.2.2 24-May-2006  yamt sync with head.
 1.42.2.1 13-Mar-2006  yamt sync with head.
 1.43.4.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.45.6.2 10-Dec-2006  yamt sync with head.
 1.45.6.1 22-Oct-2006  yamt sync with head
 1.45.4.2 12-Jan-2007  ad Sync with head.
 1.45.4.1 18-Nov-2006  ad Sync with head.
 1.47.8.1 03-Oct-2008  jdc Pull up revisions:
src/sys/netinet6/in6.c 1.141 via patch
src/sys/netinet6/in6_var.h 1.59 via patch
src/sys/netinet6/nd6_nbr.c 1.89-1.90 via patch
(requested by adrianp in ticket #1210).

If a neighbor solicitation isn't from the unspecified address, make sure
that the source address matches one of the interface's address prefixes.

Generalize previous fix so that both NS and NA packets are checked.
 1.47.2.1 03-Oct-2008  jdc Pull up revisions:
src/sys/netinet6/in6.c 1.141 via patch
src/sys/netinet6/in6_var.h 1.59 via patch
src/sys/netinet6/nd6_nbr.c 1.89-1.90 via patch
(requested by adrianp in ticket #1210).

If a neighbor solicitation isn't from the unspecified address, make sure
that the source address matches one of the interface's address prefixes.

Generalize previous fix so that both NS and NA packets are checked.
 1.48.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.48.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.51.10.1 15-Aug-2007  skrll Sync with HEAD.
 1.51.2.3 23-Oct-2007  ad Sync with head.
 1.51.2.2 09-Oct-2007  ad Sync with head.
 1.51.2.1 20-Aug-2007  ad Sync with HEAD.
 1.52.8.2 19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
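
The one functional change noted in revision 1.52.8.2 (applying a supplied netmask to the destination before the table search) amounts to a byte-wise AND. The sketch below shows that step for IPv6 addresses in userland; the ex_ helpers are hypothetical and do not reproduce the kernel's routing code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Build a contiguous netmask from a prefix length (0..128). */
static void
ex_prefixlen2mask(struct in6_addr *mask, int len)
{
	memset(mask, 0, sizeof(*mask));
	for (int i = 0; i < 16 && len > 0; i++, len -= 8)
		mask->s6_addr[i] = len >= 8 ? 0xff :
		    (unsigned char)(0xff << (8 - len));
}

/* Clear every destination bit that the mask does not cover. */
static void
ex_apply_mask(struct in6_addr *dst, const struct in6_addr *mask)
{
	for (int i = 0; i < 16; i++)
		dst->s6_addr[i] &= mask->s6_addr[i];
}

/*
 * Demo helper: mask dst with a plen-bit netmask and compare the result
 * with the expected lookup key. Returns 1 on match, 0 on mismatch,
 * -1 on bad input.
 */
static int
ex_masked_key_is(const char *dst, int plen, const char *expected)
{
	struct in6_addr d, m, e;

	if (plen < 0 || plen > 128 ||
	    inet_pton(AF_INET6, dst, &d) != 1 ||
	    inet_pton(AF_INET6, expected, &e) != 1)
		return -1;
	ex_prefixlen2mask(&m, plen);
	ex_apply_mask(&d, &m);
	return memcmp(&d, &e, sizeof(d)) == 0;
}
```

Masking first means a request for 2001:db8:1:2::5 with a 48-bit mask is looked up under the key 2001:db8:1::, not under the host address.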
 1.52.8.1 19-Jul-2007  dyoung file in6_var.h was added on branch matt-mips64 on 2007-07-19 20:48:57 +0000
 1.52.6.2 09-Jan-2008  matt sync with HEAD
 1.52.6.1 06-Nov-2007  matt sync with HEAD
 1.52.4.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.52.4.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.52.4.1 02-Oct-2007  joerg Sync with HEAD.
 1.53.2.1 18-Oct-2007  yamt sync with head.
 1.55.6.1 08-Dec-2007  ad Sync with head.
 1.55.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.56.12.3 17-Jan-2009  mjf Sync with HEAD.
 1.56.12.2 28-Sep-2008  mjf Sync with HEAD.
 1.56.12.1 02-Jun-2008  mjf Sync with HEAD.
 1.56.8.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.58.10.1 19-Oct-2008  haad Sync with HEAD.
 1.58.6.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.58.4.1 04-May-2009  yamt sync with head.
 1.60.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.64.24.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.64.20.2 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.64.20.1 31-Oct-2012  riz branches: 1.64.20.1.2;
Pull up following revision(s) (requested by christos in ticket #638):
sys/net/if_ppp.c: revision 1.137
sys/netinet6/ip6_flow.c: revision 1.20
sys/net/if_fddisubr.c: revision 1.82
sys/net/if_ethersubr.c: revision 1.192
sys/netinet6/in6_var.h: revision 1.66
sys/net/if_atmsubr.c: revision 1.50
PR/47058: Antti Kantee: If the ipv6 flow code modifies the mbuf, pass the
change up to the caller.
 1.64.20.1.2.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.64.14.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.64.14.1 30-Oct-2012  yamt sync with head
 1.65.2.3 03-Dec-2017  jdolecek update from HEAD
 1.65.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.65.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.66.2.1 18-May-2014  rmind sync with head
 1.68.2.1 10-Aug-2014  tls Rebase.
 1.71.2.9 28-Aug-2017  skrll Sync with HEAD
 1.71.2.8 05-Feb-2017  skrll Sync with HEAD
 1.71.2.7 05-Oct-2016  skrll Sync with HEAD
 1.71.2.6 09-Jul-2016  skrll Sync with HEAD
 1.71.2.5 19-Mar-2016  skrll Sync with HEAD
 1.71.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.71.2.3 22-Sep-2015  skrll Sync with HEAD
 1.71.2.2 06-Jun-2015  skrll Sync with HEAD
 1.71.2.1 06-Apr-2015  skrll Sync with HEAD
 1.83.2.4 20-Mar-2017  pgoyette Sync with HEAD
 1.83.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.83.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.83.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.89.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.97.12.2 25-Jun-2018  pgoyette Sync with HEAD
 1.97.12.1 22-Apr-2018  pgoyette Sync with HEAD
 1.97.6.1 07-Jun-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #842):

sys/netinet6/mld6.c: revision 1.93-1.99
sys/netinet6/in6_var.h: revision 1.99,1.100
sys/netinet6/in6.c: revision 1.267,1.268
sys/netinet6/nd6.c: revision 1.249

Don't hold softnet_lock in mld_timeo
Then we can get rid of remaining abuses of mutex_owned(softnet_lock).

Release in6_multilock on callout_halt of mld_timeo to avoid a deadlock
Improve atomicity of in6_leavegroup and in6_delmulti

Avoid NULL pointer dereference on imm->i6mm_maddr

Make a refcount decrement and a removal from a list of an item atomic
in6m_refcount of an in6m can be incremented if the in6m is on the list
(if_multiaddrs) in in6_addmulti or mld_input. So we must avoid such an
increment when we try to destroy an in6m. To this end we must make
an in6m_refcount decrement and a removal of an in6m from if_multiaddrs
atomic.

Make a deletion of in6m in nd6_rtrequest atomic

Move LIST_REMOVE
mld_stoptimer releases in6_multilock temporarily, so we must LIST_REMOVE first.

Avoid double LIST_REMOVE which corrupts lists
Mark in6m as used for non-DIAGNOSTIC builds.
 1.100.6.1 23-Oct-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #368):

sys/netinet6/in6_ifattach.h: revision 1.14
sys/netinet6/ip6_input.c: revision 1.212
sys/netinet6/ip6_input.c: revision 1.213
sys/netinet6/ip6_input.c: revision 1.214
sys/netinet6/in6_var.h: revision 1.101
sys/netinet6/in6_var.h: revision 1.102
sys/netinet6/in6_ifattach.c: revision 1.116
sys/netinet6/in6_ifattach.c: revision 1.117
tests/net/ndp/t_ra.sh: revision 1.33

Reorganize in6_tmpaddrtimer stuffs
- Move the related functions to where in6_tmpaddrtimer_ch exists
- Hide global variable in6_tmpaddrtimer_ch
- Rename ip6_init2 to in6_tmpaddrtimer_init
- Reduce callers of callout_reset
- Use callout_schedule

Validate ip6_temp_preferred_lifetime (net.inet6.ip6.temppltime) on a change
ip6_temp_preferred_lifetime is used to calculate an interval period to
regenerate temporary addresses by
TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE - DESYNC_FACTOR
as per RFC 3041 3.5. So it must be greater than (REGEN_ADVANCE +
DESYNC_FACTOR), otherwise it will be negative and go wrong, for example
KASSERT(to_ticks >= 0) in callout_schedule_locked fails.

tests: add tests for the validation of net.inet6.ip6.temppltime

in6: reset the temporary address timer on a change of the interval period
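
The temppltime validation described in this pull-up reduces to one inequality. The sketch below states it as a standalone check; the function names and the sample values in the usage note are assumptions for illustration, not the kernel's sysctl handler.

```c
#include <stdbool.h>

/*
 * A new preferred lifetime is acceptable only if it is strictly greater
 * than REGEN_ADVANCE + DESYNC_FACTOR, so the regeneration interval
 * stays positive. All values are in seconds.
 */
static bool
ex_temppltime_valid(unsigned int temp_preferred_lifetime,
    unsigned int regen_advance, unsigned int desync_factor)
{
	return temp_preferred_lifetime > regen_advance + desync_factor;
}

/* TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE - DESYNC_FACTOR, signed. */
static long
ex_regen_interval(unsigned int temp_preferred_lifetime,
    unsigned int regen_advance, unsigned int desync_factor)
{
	return (long)temp_preferred_lifetime - (long)regen_advance -
	    (long)desync_factor;
}
```

For example, with a lifetime of 86400, a regen advance of 5 and a desync factor of 600, the interval is 85795 seconds; lowering the lifetime to 605 makes the interval 0, which the check rejects before it can reach the callout code.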
 1.100.2.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.104.26.1 02-Aug-2025  perseant Sync with HEAD
 1.104.20.1 01-Oct-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1164):

sys/net/link_proto.c: revision 1.41
sys/netinet6/in6.c: revision 1.293
sys/net/if.h: revision 1.307
sys/netinet/ip_icmp.c: revision 1.180
sys/dev/vmt/vmt_subr.c: revision 1.11
sys/netinet6/in6_var.h: revision 1.105
sys/netinet6/in6_var.h: revision 1.106
sys/net/if.c: revision 1.532
sys/net/if.c: revision 1.533
sys/netinet6/mld6.c: revision 1.102
sys/netinet/in_var.h: revision 1.104
sys/net/if_spppsubr.c: revision 1.270
sys/net/if_spppsubr.c: revision 1.271
sys/netinet6/nd6.c: revision 1.284

if: introduce if_first_addr() (and psref variant)

It returns the first address (ifa) of a given family on a given interface.

It will replace a bunch of open-coded loops and make their intent clear.
Apply if_first_addr() and if_first_addr_psref()

in6: introduce in6ifa_first_lladdr() (and psref variant)

It returns the first link-local address (ifa) on a given interface.
Apply in6ifa_first_lladdr() and in6ifa_first_lladdr_psref()
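
The kind of open-coded loop that if_first_addr() is said to replace can be sketched as follows. The ex_ structures are simplified, hypothetical stand-ins for the kernel's ifnet and ifaddr, and the psref variant is not modelled.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Simplified stand-ins: an interface with a singly linked address list. */
struct ex_ifaddr {
	int family;
	struct ex_ifaddr *next;
};

struct ex_ifnet {
	struct ex_ifaddr *addrs;
};

/* Return the first address of the given family, or NULL if none. */
static const struct ex_ifaddr *
ex_if_first_addr(const struct ex_ifnet *ifp, int family)
{
	for (const struct ex_ifaddr *ifa = ifp->addrs; ifa != NULL;
	    ifa = ifa->next) {
		if (ifa->family == family)
			return ifa;
	}
	return NULL;
}

/*
 * Demo helper: an interface with one AF_INET and two AF_INET6 addresses.
 * Returns the list position of the first match, or -1 if none.
 */
static int
ex_demo(int family)
{
	struct ex_ifaddr a[3] = {
		{ .family = AF_INET },
		{ .family = AF_INET6 },
		{ .family = AF_INET6 },
	};
	struct ex_ifnet ifn = { .addrs = &a[0] };
	const struct ex_ifaddr *ifa;

	a[0].next = &a[1];
	a[1].next = &a[2];
	ifa = ex_if_first_addr(&ifn, family);
	return ifa == NULL ? -1 : (int)(ifa - a);
}
```

Giving the loop a name lets each call site say what it wants (the first address of a family) without repeating the traversal.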
 1.10 04-Jun-2000  itojun remove include files in nonstandard path
(has been #error for couple of months).
 1.9 09-Feb-2000  itojun branches: 1.9.2;
to improve RFC2553/2292 compliance, and promote use of
RFC2553/2292-compliant header file path, now the following headers are
forbidden:
netinet6/ip6.h
netinet6/icmp6.h
netinet6/in6.h

if you want netinet6/{ip6,icmp6}.h, use netinet/{ip6,icmp6}.h.

if you want netinet6/in6.h, you just need to include netinet/in.h.
it pulls it in.
(we may need to integrate them into netinet/in.h, but for cross-BSD code
sharing i'd like to keep it like this for now)
 1.8 06-Feb-2000  itojun to be more rfc2292 compliant, move ip6.h and icmp6.h into netinet.
(netinet6/{ip6,icmp6}.h is non-standard path - these files should go away)

it was not possible to use cvsmove in this case.
when you try to look at history, chase it toward netinet6/{ip6,icmp6}.h.
 1.7 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.6 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.5 01-Oct-1999  itojun branches: 1.5.2; 1.5.8;
sanity check against truncated extension headers.
 1.4 06-Jul-1999  itojun sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ip6.h was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ip6.h was added on branch chs-ubc2 on 1999-07-01 23:48:28 +0000
 1.5.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.5.2.1 20-Nov-2000  bouyer Remove files that are no longer on the trunk, and commit Makefile which
I forgot in the batch of commits.
 1.9.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.23 14-Aug-2018  maxv Retire EtherIP, we have L2TP instead.
 1.22 26-Jan-2018  maxv branches: 1.22.2; 1.22.4;
A few fixes:

* Style.

* Don't add M_PKTHDR manually, that's absolutely forbidden. Add a
KASSERT to make sure it's already there.

* Add a missing NULL check after m_pullup.
 1.21 11-Jan-2017  ozaki-r branches: 1.21.8;
Get rid of unnecessary header inclusions
 1.20 15-Dec-2016  ozaki-r Move bpf_mtap and if_ipackets++ on Rx of each driver to percpuq if_input

The benefits of the change are:
- We can reduce code
- We can provide the same behavior between drivers
- Where/When if_ipackets is counted up
- Note that some drivers still update packet statistics in their own
way (periodical update)
- Moved bpf_mtap run in softint
- This makes it easy to MP-ify bpf

Proposed on tech-kern and tech-net
 1.19 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.18 10-Jun-2016  ozaki-r branches: 1.18.2;
Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.17 09-Feb-2016  ozaki-r Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
 1.16 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.15 17-Jul-2011  joerg branches: 1.15.12; 1.15.30;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.14 24-Aug-2010  jakllsch Make the EtherIP in IPv6 input path work.
XXX: Figure out if we really need a separate protosw for IPv6.
 1.13 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.12 19-Jan-2010  pooka branches: 1.12.2; 1.12.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.11 19-Oct-2008  hans branches: 1.11.4;
if_input needs to be called at splnet(). ok by cube.
 1.10 16-Oct-2008  hans include bpf headers so that the bpf calls actually do something. ok by cube.
 1.9 15-Apr-2008  thorpej branches: 1.9.4; 1.9.10;
Make ip6 and icmp6 stats per-cpu.
 1.8 08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
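
A minimal sketch of why such a conversion can stay ABI-compatible: a structure made only of uint64_t counters has the same memory layout as an array of uint64_t indexed in field order. The struct `old_stat`, its three fields, and the `STAT_*` indices are hypothetical names for illustration, not the real ip6stat members.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old-style statistics structure: only uint64_t members. */
struct old_stat {
	uint64_t total;
	uint64_t toosmall;
	uint64_t forward;
};

/* Hypothetical new-style indices, in the same order as the old fields. */
enum { STAT_TOTAL, STAT_TOOSMALL, STAT_FORWARD, STAT_NSTATS };

/* Compile-time checks that the array form matches the old layout. */
_Static_assert(sizeof(struct old_stat) == STAT_NSTATS * sizeof(uint64_t),
    "size mismatch");
_Static_assert(offsetof(struct old_stat, forward) ==
    STAT_FORWARD * sizeof(uint64_t), "offset mismatch");

/* Copy an array of counters out to a consumer expecting the old layout. */
static void
export_stats(const uint64_t stats[STAT_NSTATS], struct old_stat *out)
{
	memcpy(out, stats, sizeof(*out));
}
```

A consumer built against the old structure reads the same bytes at the same offsets, which is the property the note above relies on.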
 1.7 20-Dec-2007  dyoung branches: 1.7.6;
Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.6 11-Dec-2007  lukem use __KERNEL_RCSID()
 1.5 02-May-2007  dyoung branches: 1.5.8; 1.5.16; 1.5.18; 1.5.20;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
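
The allocate/copy/duplicate/free semantics described in item 1 can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct demo_sockaddr`, the `demo_*` functions, the family constants, and the per-family lengths are hypothetical stand-ins, calloc() replaces the per-family pool, and a family mismatch returns NULL where the commit message says the kernel version KASSERTs.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* User-space stand-in; the kernel types and pool(9) are not used here. */
struct demo_sockaddr {
	uint8_t sa_len;      /* total length of this address */
	uint8_t sa_family;   /* address family */
	char    sa_data[30]; /* family-specific payload (size is arbitrary) */
};

/* Hypothetical family numbers, chosen for the demo only. */
enum { DEMO_AF_INET = 2, DEMO_AF_INET6 = 24 };

/* Hypothetical per-family lengths; 0 means "unknown family". */
static uint8_t
demo_family_len(uint8_t af)
{
	switch (af) {
	case DEMO_AF_INET:  return 16;
	case DEMO_AF_INET6: return 28;
	default:            return 0;
	}
}

/* Allocate an address with sa_family and sa_len set; NULL on failure. */
static struct demo_sockaddr *
demo_sockaddr_alloc(uint8_t af)
{
	uint8_t len = demo_family_len(af);
	struct demo_sockaddr *sa;

	if (len == 0 || (sa = calloc(1, sizeof(*sa))) == NULL)
		return NULL;
	sa->sa_family = af;
	sa->sa_len = len;
	return sa;
}

/* Like strcpy(): copy src over dst; the families must match. */
static struct demo_sockaddr *
demo_sockaddr_copy(struct demo_sockaddr *dst, const struct demo_sockaddr *src)
{
	if (dst->sa_family != src->sa_family)
		return NULL;	/* the kernel version asserts instead */
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like strdup(): allocate a new address and copy src into it. */
static struct demo_sockaddr *
demo_sockaddr_dup(const struct demo_sockaddr *src)
{
	struct demo_sockaddr *sa = demo_sockaddr_alloc(src->sa_family);

	return sa == NULL ? NULL : demo_sockaddr_copy(sa, src);
}

/* Return the address to its allocator. */
static void
demo_sockaddr_free(struct demo_sockaddr *sa)
{
	free(sa);
}
```

Because allocation can fail, every caller has to handle a NULL result, which mirrors the note in item 3 that rtcache_getdst() may now return NULL.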
 1.4 17-Feb-2007  dyoung branches: 1.4.4; 1.4.6;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.3 15-Dec-2006  joerg branches: 1.3.2; 1.3.4; 1.3.6;
Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.2 06-Dec-2006  jdc branches: 1.2.2;
Explicitly include <sys/device.h>, which we need for `struct device'.
This allows us to compile on !i386. (On i386, <machine/cpu.h> pulled
in <sys/device.h> for us, thus hiding the compilation problem.)

OK by rpaulo@.
 1.1 23-Nov-2006  rpaulo branches: 1.1.2;
New EtherIP driver based on tap(4) and gif(4) by Hans Rosenfeld.
Notable changes:
* Fixes PR 34268.
* Separates the code from gif(4) (which is cleaner).
* Allows the usage of STP (Spanning Tree Protocol).
* Removed EtherIP implementation from gif(4)/tap(4).

Some input from Christos.
 1.1.2.1 09-Dec-2006  bouyer Pull up following revision(s) (requested by jdc in ticket #259):
sys/netinet6/ip6_etherip.c: revision 1.2
sys/netinet/ip_etherip.c: revision 1.2
Explicitly include <sys/device.h>, which we need for `struct device'.
This allows us to compile on !i386. (On i386, <machine/cpu.h> pulled
in <sys/device.h> for us, thus hiding the compilation problem.)
OK by rpaulo@.
 1.2.2.3 18-Dec-2006  yamt sync with head.
 1.2.2.2 10-Dec-2006  yamt sync with head.
 1.2.2.1 06-Dec-2006  yamt file ip6_etherip.c was added on branch yamt-splraiseipl on 2006-12-10 07:19:15 +0000
 1.3.6.2 07-May-2007  yamt sync with head.
 1.3.6.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.3.4.2 12-Jan-2007  ad Sync with head.
 1.3.4.1 15-Dec-2006  ad file ip6_etherip.c was added on branch newlock2 on 2007-01-12 01:04:15 +0000
 1.3.2.5 21-Jan-2008  yamt sync with head
 1.3.2.4 03-Sep-2007  yamt sync with head.
 1.3.2.3 26-Feb-2007  yamt sync with head.
 1.3.2.2 30-Dec-2006  yamt sync with head.
 1.3.2.1 15-Dec-2006  yamt file ip6_etherip.c was added on branch yamt-lazymbuf on 2006-12-30 20:50:38 +0000
 1.4.6.1 11-Jul-2007  mjf Sync with head.
 1.4.4.1 08-Jun-2007  ad Sync with head.
 1.5.20.2 02-Jan-2008  bouyer Sync with HEAD
 1.5.20.1 13-Dec-2007  bouyer Sync with HEAD
 1.5.18.1 11-Dec-2007  yamt sync with head.
 1.5.16.1 26-Dec-2007  ad Sync with head.
 1.5.8.1 09-Jan-2008  matt sync with HEAD
 1.7.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.7.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.9.10.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.9.10.1 19-Oct-2008  haad Sync with HEAD.
 1.9.4.4 09-Oct-2010  yamt sync with head
 1.9.4.3 11-Aug-2010  yamt sync with head.
 1.9.4.2 11-Mar-2010  yamt sync with head
 1.9.4.1 04-May-2009  yamt sync with head.
 1.11.4.1 21-Nov-2010  riz Pull up following revision(s) (requested by jakllsch in ticket #1445):
sys/netinet6/ip6_etherip.h: revision 1.2
sys/netinet6/in6_proto.c: revision 1.89
sys/netinet6/ip6_etherip.c: revision 1.14
Make the EtherIP in IPv6 input path work.
XXX: Figure out if we really need a separate protosw for IPv6.
 1.12.4.2 05-Mar-2011  rmind sync with head
 1.12.4.1 30-May-2010  rmind sync with head
 1.12.2.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.12.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.15.30.4 05-Feb-2017  skrll Sync with HEAD
 1.15.30.3 09-Jul-2016  skrll Sync with HEAD
 1.15.30.2 19-Mar-2016  skrll Sync with HEAD
 1.15.30.1 22-Sep-2015  skrll Sync with HEAD
 1.15.12.1 03-Dec-2017  jdolecek update from HEAD
 1.18.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.18.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.21.8.1 05-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #694):

sys/netinet6/ip6_etherip.c: revision 1.22
sys/net/if_etherip.c: revision 1.41
sys/net/if_etherip.c: revision 1.42
sys/netinet/ip_etherip.c: revision 1.21

Don't call if_attach, do if_initialize+if_register, otherwise when an
EtherIP packet is received the first KASSERT in if_input() fires.

A few fixes:
* Style.
* Don't add M_PKTHDR manually, that's absolutely forbidden. Add a
KASSERT to make sure it's already there.
* Add a missing NULL check after m_pullup.
 1.22.4.1 10-Jun-2019  christos Sync with HEAD
 1.22.2.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.3 14-Aug-2018  maxv Retire EtherIP, we have L2TP instead.
 1.2 24-Aug-2010  jakllsch branches: 1.2.58; 1.2.60;
Make the EtherIP in IPv6 input path work.
XXX: Figure out if we really need a separate protosw for IPv6.
 1.1 23-Nov-2006  rpaulo branches: 1.1.4; 1.1.6; 1.1.8; 1.1.58; 1.1.70; 1.1.80; 1.1.82;
New EtherIP driver based on tap(4) and gif(4) by Hans Rosenfeld.
Notable changes:
* Fixes PR 34268.
* Separates the code from gif(4) (which is cleaner).
* Allows the usage of STP (Spanning Tree Protocol).
* Removed EtherIP implementation from gif(4)/tap(4).

Some input from Christos.
 1.1.82.1 05-Mar-2011  rmind sync with head
 1.1.80.1 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.1.70.1 21-Nov-2010  riz Pull up following revision(s) (requested by jakllsch in ticket #1445):
sys/netinet6/ip6_etherip.h: revision 1.2
sys/netinet6/in6_proto.c: revision 1.89
sys/netinet6/ip6_etherip.c: revision 1.14
Make the EtherIP in IPv6 input path work.
XXX: Figure out if we really need a separate protosw for IPv6.
 1.1.58.1 09-Oct-2010  yamt sync with head
 1.1.8.2 12-Jan-2007  ad Sync with head.
 1.1.8.1 23-Nov-2006  ad file ip6_etherip.h was added on branch newlock2 on 2007-01-12 01:04:15 +0000
 1.1.6.2 30-Dec-2006  yamt sync with head.
 1.1.6.1 23-Nov-2006  yamt file ip6_etherip.h was added on branch yamt-lazymbuf on 2006-12-30 20:50:38 +0000
 1.1.4.2 10-Dec-2006  yamt sync with head.
 1.1.4.1 23-Nov-2006  yamt file ip6_etherip.h was added on branch yamt-splraiseipl on 2006-12-10 07:19:15 +0000
 1.2.60.1 10-Jun-2019  christos Sync with HEAD
 1.2.58.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.43 29-Jun-2024  riastradh netinet6: Use _NET_STAT* API instead of direct array access.

XXX Exception: ip6flow_addstats_rt _assigns_ one of the `statistics'
to the current count of ip6 flows in use, and we don't have anything
in the _NET_STAT* API for that. So for now I abuse the abstraction,
until we sort out this one exceptional case properly.

PR kern/58380
 1.42 19-Feb-2021  christos - Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
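
To illustrate why an alignment check based on the type's alignment is more correct than one based on its size, here is a small sketch. The `DEMO_*` macros, `struct demo_hdr`, and the helper functions are hypothetical; the sizeof-based form is an assumption about what the older macro looked like, not a quotation of it. The example also assumes the usual ABI where a struct of three uint16_t members has size 6 and alignment 2.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* A non-primitive type whose size (6) is not a power of two. */
struct demo_hdr {
	uint16_t a, b, c;
};

/* Older style (assumed): derive the alignment mask from sizeof(t). */
#define DEMO_ALIGNED_BY_SIZEOF(p, t) \
	((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)

/* Newer style: use the ABI alignment the compiler reports for t. */
#define DEMO_ALIGNED_BY_ALIGNOF(p, t) \
	((((uintptr_t)(p)) & (alignof(t) - 1)) == 0)

/* 8-byte aligned backing storage so the offsets below are predictable. */
static alignas(8) unsigned char demo_buf[16];

/* Result of the sizeof-based check for a header at demo_buf + off. */
static bool
demo_sizeof_says_aligned(unsigned off)
{
	return DEMO_ALIGNED_BY_SIZEOF(demo_buf + off, struct demo_hdr);
}

/* Result of the alignof-based check for a header at demo_buf + off. */
static bool
demo_alignof_says_aligned(unsigned off)
{
	return DEMO_ALIGNED_BY_ALIGNOF(demo_buf + off, struct demo_hdr);
}
```

At offset 4 the header is suitably aligned for its 2-byte members, yet the sizeof-based mask (5) rejects it; the alignof-based mask (1) accepts it.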
 1.41 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.
 1.40 06-Feb-2018  ozaki-r branches: 1.40.16;
Shorten the name of a workqueue instance to fit to the limit (15)
 1.39 29-Jan-2018  maxv Style, and use __cacheline_aligned.

By the way, it would be nice to revisit the use of 'ip6flow_lock' in
ip6flow_fastforward(): it is taken right away because of 'ip6flow_inuse',
but then we perform several checks that do not require it.
 1.38 08-Jan-2018  knakahara Committed debugging logs by mistake, sorry. Revert crypto.c:r.1.103 and ip6_flow.c:r.1.37.
 1.37 08-Jan-2018  knakahara Fix PR kern/52910. Reported and implemented a patch by Sevan Janiyan, thanks.
 1.36 10-Dec-2017  maxv Fix use-after-free: if m_pullup fails the (freed) mbuf is pushed on the
ip6_pktq queue and re-processed later. Return 1 to say "processed and
freed".
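
The ownership rule behind this fix can be sketched with a user-space mock. m_pullup() frees the mbuf chain and returns NULL on failure, so the caller must report the packet as consumed rather than let anyone queue the stale pointer. `struct demo_buf`, `demo_pullup()`, and `demo_fastpath()` are hypothetical stand-ins, and returning 0 on the success path is a simplification.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf: only tracks available length. */
struct demo_buf {
	size_t len;
};

static int demo_frees;	/* counts buffers freed by demo_pullup() */

/* Like m_pullup(): on failure the buffer is freed and NULL is returned. */
static struct demo_buf *
demo_pullup(struct demo_buf *b, size_t need)
{
	if (b->len < need) {
		free(b);
		demo_frees++;
		return NULL;
	}
	return b;
}

/*
 * Returns 1 when the buffer was consumed (the caller must not touch it
 * again) and 0 when the caller still owns it and should continue.
 */
static int
demo_fastpath(struct demo_buf *b, size_t hdrlen)
{
	if ((b = demo_pullup(b, hdrlen)) == NULL)
		return 1;	/* already freed: report "processed and freed" */
	return 0;
}
```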
 1.35 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.34 11-Jan-2017  ozaki-r branches: 1.34.8;
Get rid of unnecessary header inclusions
 1.33 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.32 18-Oct-2016  ozaki-r Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@
 1.31 23-Aug-2016  knakahara improve fast-forward performance when the number of flows exceeds ip6_maxflows.

This is porting of ip_flow.c:r1.76

In the ip6flow case, the degradation before the change is about 45%; the
degradation after it is about 55%.
 1.30 02-Aug-2016  knakahara ip6flow refactor like ipflow.

- move ip6flow sysctls into ip6_flow.c like ip_flow.c:r1.64
- build ip6_flow.c only if GATEWAY kernel option is enabled
 1.29 26-Jul-2016  ozaki-r Simplify by using atomic_swap instead of mutex

Suggested by kefren@
 1.28 11-Jul-2016  ozaki-r branches: 1.28.2;
Run timers in workqueue

Timers (such as nd6_timer) typically free/destroy some data in callout
(softint). If we apply psz/psref for such data, we cannot do free/destroy
process in there because synchronization of psz/psref cannot be used in
softint. So run timer callbacks in workqueue works (normal LWP context).

Doing workqueue_enqueue a work twice (i.e., call workqueue_enqueue before
a previous task is scheduled) isn't allowed. For nd6_timer and
rt_timer_timer, this doesn't happen because callout_reset is called only
from workqueue's work. OTOH, ip{,6}flow_slowtimo's callout can be called
before its work starts and completes because the callout is periodically
called regardless of completion of the work. To avoid such a situation,
add a flag for each protocol; the flag is set true when a work is
enqueued and set false after the work finished. workqueue_enqueue is
called only if the flag is false.

Proposed on tech-net and tech-kern.
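
The per-protocol flag described above can be sketched as follows. This is a single-threaded illustration only: `struct demo_wq`, the `demo_*` functions, and the `pending` counter are hypothetical, the counter stands in for workqueue_enqueue(), and a real implementation would need the flag update to be safe against concurrent callers.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a workqueue: just counts queued work items. */
struct demo_wq {
	int pending;
};

static bool demo_work_enqueued;	/* true while a work item is outstanding */

/* Periodic callout: enqueue work only if none is outstanding. */
static void
demo_slowtimo(struct demo_wq *wq)
{
	if (demo_work_enqueued)
		return;		/* previous work not finished; skip this tick */
	demo_work_enqueued = true;
	wq->pending++;		/* stands in for workqueue_enqueue() */
}

/* Worker: runs in thread context, then allows the next enqueue. */
static void
demo_slowtimo_work(struct demo_wq *wq)
{
	wq->pending--;
	/* ... the actual timer work (e.g. expiring entries) would go here ... */
	demo_work_enqueued = false;
}
```

Two callout ticks in a row therefore produce a single queued work item until the worker has run.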
 1.27 20-Jun-2016  knakahara apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).
 1.26 13-Jun-2016  knakahara eliminate unnecessary splnet
 1.25 13-Jun-2016  knakahara MP-ify fastforward to support GATEWAY kernel option.

I add "ipflow_lock" mutex in ip_flow.c and "ip6flow_lock" mutex in ip6_flow.c
to protect all data in each file. Of course, this is not MP-scalable. However,
it is sufficient as tentative workaround. We should make it scalable somehow
in the future.

ok by ozaki-r@n.o.
 1.24 23-Mar-2015  roy Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.
 1.23 20-May-2014  bouyer branches: 1.23.2; 1.23.4;
Sync with the ipv4 code and call ifp->if_output() with KERNEL_LOCK
held.
Problem reported and fix tested by njoly@ on current-users@
 1.22 01-Apr-2014  pooka branches: 1.22.2;
Wrap ipflow_create() & ip6flow_create() in kernel lock. Prevents the
interrupt side on another core from seeing the situation while the ipflow
is being modified.
 1.21 23-May-2013  msaitoh branches: 1.21.2;
Clear mbuf's csum_flags in ip6flow_fastforward(). Fixes PR#47849.
 1.20 11-Oct-2012  christos PR/47058: Antti Kantee: If the ipv6 flow code modifies the mbuf, pass the
change up to the caller.
 1.19 19-Jan-2012  liamjfoy branches: 1.19.2; 1.19.6; 1.19.8;
Remove ip6f_start from ip6f struct
 1.18 23-Mar-2009  liamjfoy branches: 1.18.12; 1.18.16;
Init ip6flow pool dynamically instead of using a linkset.
 1.17 28-Apr-2008  martin branches: 1.17.8; 1.17.10; 1.17.14;
Remove clause 3 and 4 from TNF licenses
 1.16 24-Apr-2008  ad branches: 1.16.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.15 15-Apr-2008  thorpej branches: 1.15.2;
Make ip6 and icmp6 stats per-cpu.
 1.14 08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
 1.13 04-Jan-2008  dyoung branches: 1.13.6;
Constify.
 1.12 04-Jan-2008  dyoung Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.
 1.11 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.10 11-Dec-2007  lukem use __KERNEL_RCSID()
 1.9 20-Aug-2007  dyoung branches: 1.9.2; 1.9.4; 1.9.10; 1.9.12; 1.9.14; 1.9.16;
Don't call rtcache_check() from the fast-forward code, which runs
at IPL_NET, because rtcache_check() may read the forwarding table.
Elsewhere, the kernel only blocks interrupts at priority IPL_SOFTNET
and below while it modifies the forwarding table, so rtcache_check()
could be reading the table in an inconsistent state. Use
rtcache_done(), instead.

XXX netinet/ip_flow.c and netinet6/ip6_flow.c are virtually identical.
XXX They should share code.
 1.8 02-May-2007  dyoung branches: 1.8.2; 1.8.6;
Remove obsolete files netinet/in_route.[ch].
 1.7 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.6 05-Apr-2007  liamjfoy use size_t for indexes

ok christos@
 1.5 23-Mar-2007  macallan caddr_t -> void *
 1.4 23-Mar-2007  liamjfoy Add a new sysctl net.inet6.ip6.hashsize to control the hash table size.

The sysctl handler will ensure this value is a power of 2

ok dyoung@
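
A minimal sketch of the kind of check such a handler could perform, and of one common reason to require a power of 2 (the commit message does not state the reason): with a power-of-two table size, a hash can be reduced to a bucket index with a mask instead of a division. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if n is a power of two (exactly one bit set). */
static bool
demo_is_power_of_two(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Map a hash to a bucket; only valid when size is a power of two. */
static uint32_t
demo_bucket(uint32_t hash, uint32_t size)
{
	return hash & (size - 1);
}
```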
 1.3 12-Mar-2007  ad branches: 1.3.2; 1.3.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.2 08-Mar-2007  liamjfoy branches: 1.2.2; 1.2.4;
Use ip6flowtable when looking up
 1.1 07-Mar-2007  liamjfoy Add IPv6 Fast Forward - the IPv4 counterpart:

If ip6_forward successfully forwards a packet, a cache, in this case a
ip6flow struct entry, will be created. ether_input and friends will
then be able to call ip6flow_fastforward with the packet which will then
be passed to if_output (unless an issue is found - in that case the packet
is passed back to ip6_input).

ok matt@ christos@ dyoung@ and joerg@
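
The create-on-forward and fast-path lookup described above can be sketched with a toy table. Everything here is a simplified assumption: `struct demo_flow`, the `demo_*` functions, the 32-bit `dst` key (standing in for an IPv6 destination), and the direct-mapped table are hypothetical and do not reflect the actual ip6flow data structures.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NFLOWS 8	/* must be a power of two for the mask below */

/* Hypothetical cached forwarding decision. */
struct demo_flow {
	bool     valid;
	uint32_t dst;		/* stands in for an IPv6 destination */
	int      out_if;	/* interface index chosen by the slow path */
};

static struct demo_flow demo_table[DEMO_NFLOWS];

/* Pick the table slot for a destination. */
static struct demo_flow *
demo_slot(uint32_t dst)
{
	return &demo_table[dst & (DEMO_NFLOWS - 1)];
}

/* Slow path: after a successful full forward, remember the result. */
static void
demo_flow_create(uint32_t dst, int out_if)
{
	struct demo_flow *f = demo_slot(dst);

	f->valid = true;
	f->dst = dst;
	f->out_if = out_if;
}

/*
 * Fast path: on a cache hit, set *out_if and return true. On a miss,
 * return false so the caller hands the packet to the normal input path.
 */
static bool
demo_fastforward(uint32_t dst, int *out_if)
{
	const struct demo_flow *f = demo_slot(dst);

	if (!f->valid || f->dst != dst)
		return false;
	*out_if = f->out_if;
	return true;
}
```

The first packet to a destination misses and takes the slow path; once the slow path has recorded the flow, later packets to the same destination hit the cache.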
 1.2.4.5 07-May-2007  yamt sync with head.
 1.2.4.4 15-Apr-2007  yamt sync with head.
 1.2.4.3 24-Mar-2007  yamt sync with head.
 1.2.4.2 12-Mar-2007  rmind Sync with HEAD (missed new files in previous).
 1.2.4.1 08-Mar-2007  rmind file ip6_flow.c was added on branch yamt-idlelwp on 2007-03-12 06:14:56 +0000
 1.2.2.4 09-Oct-2007  ad Sync with head.
 1.2.2.3 08-Jun-2007  ad Sync with head.
 1.2.2.2 10-Apr-2007  ad Sync with head.
 1.2.2.1 13-Mar-2007  ad Sync with head.
 1.3.4.1 29-Mar-2007  reinoud Pullup to -current
 1.3.2.1 11-Jul-2007  mjf Sync with head.
 1.8.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.8.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.9.16.3 08-Jan-2008  bouyer Sync with HEAD
 1.9.16.2 02-Jan-2008  bouyer Sync with HEAD
 1.9.16.1 13-Dec-2007  bouyer Sync with HEAD
 1.9.14.1 11-Dec-2007  yamt sync with head.
 1.9.12.1 26-Dec-2007  ad Sync with head.
 1.9.10.1 18-Feb-2008  mjf Sync with HEAD.
 1.9.4.3 21-Jan-2008  yamt sync with head
 1.9.4.2 03-Sep-2007  yamt sync with head.
 1.9.4.1 20-Aug-2007  yamt file ip6_flow.c was added on branch yamt-lazymbuf on 2007-09-03 14:43:32 +0000
 1.9.2.1 09-Jan-2008  matt sync with HEAD
 1.13.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.15.2.1 18-May-2008  yamt sync with head.
 1.16.2.2 04-May-2009  yamt sync with head.
 1.16.2.1 16-May-2008  yamt sync with head.
 1.17.14.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.17.10.1 19-Jun-2013  bouyer Pull up following revision(s) (requested by msaitoh in ticket #1864):
sys/netinet6/ip6_flow.c: revision 1.21
Clear mbuf's csum_flags in ip6flow_fastforward(). Fixes PR#47849.
 1.17.8.1 28-Apr-2009  skrll Sync with HEAD.
 1.18.16.1 18-Feb-2012  mrg merge to -current.
 1.18.12.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.18.12.2 30-Oct-2012  yamt sync with head
 1.18.12.1 17-Apr-2012  yamt sync with head
 1.19.8.1 18-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.19.6.4 03-Dec-2017  jdolecek update from HEAD
 1.19.6.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.19.6.2 23-Jun-2013  tls resync from head
 1.19.6.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.19.2.3 03-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.19.2.2 19-Jun-2013  bouyer Pull up following revision(s) (requested by msaitoh in ticket #895):
sys/netinet6/ip6_flow.c: revision 1.21
Clear mbuf's csum_flags in ip6flow_fastforward(). Fixes PR#47849.
 1.19.2.1 31-Oct-2012  riz branches: 1.19.2.1.2;
Pull up following revision(s) (requested by christos in ticket #638):
sys/net/if_ppp.c: revision 1.137
sys/netinet6/ip6_flow.c: revision 1.20
sys/net/if_fddisubr.c: revision 1.82
sys/net/if_ethersubr.c: revision 1.192
sys/netinet6/in6_var.h: revision 1.66
sys/net/if_atmsubr.c: revision 1.50
PR/47058: Antti Kantee: If the ipv6 flow code modifies the mbuf, pass the
change up to the caller.
 1.19.2.1.2.1 18-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.21.2.1 18-May-2014  rmind sync with head
 1.22.2.1 10-Aug-2014  tls Rebase.
 1.23.4.5 05-Feb-2017  skrll Sync with HEAD
 1.23.4.4 05-Dec-2016  skrll Sync with HEAD
 1.23.4.3 05-Oct-2016  skrll Sync with HEAD
 1.23.4.2 09-Jul-2016  skrll Sync with HEAD
 1.23.4.1 06-Apr-2015  skrll Sync with HEAD
 1.23.2.1 12-May-2017  snj Pull up following revision(s) (requested by skrll/ozaki-r in ticket #1402):
sys/net/route.c: revision 1.170 via patch
sys/netinet/ip_flow.c: revision 1.73 via patch
sys/netinet6/ip6_flow.c: revision 1.28 via patch
sys/netinet6/nd6.c: revision 1.203 via patch
Run timers in workqueue
Timers (such as nd6_timer) typically free/destroy some data in callout
(softint). If we apply psz/psref for such data, we cannot do free/destroy
process in there because synchronization of psz/psref cannot be used in
softint. So run timer callbacks as workqueue work (normal LWP context).
Doing workqueue_enqueue a work twice (i.e., call workqueue_enqueue before
a previous task is scheduled) isn't allowed. For nd6_timer and
rt_timer_timer, this doesn't happen because callout_reset is called only
from workqueue's work. OTOH, ip{,6}flow_slowtimo's callout can be called
before its work starts and completes because the callout is periodically
called regardless of completion of the work. To avoid such a situation,
add a flag for each protocol; the flag is set true when a work is
enqueued and set false after the work finished. workqueue_enqueue is
called only if the flag is false.
Proposed on tech-net and tech-kern.
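
The enqueue guard described above can be modeled in userland C. This is a minimal single-threaded sketch, not the kernel code: struct slowtimo_state, slowtimo_callout() and slowtimo_work() are hypothetical names, the counter stands in for a real workqueue_enqueue call, and a real implementation would also need to make the flag update safe against concurrent access.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userland model; these names are not from the NetBSD tree. */
struct slowtimo_state {
	bool work_enqueued;	/* true from enqueue until the work finishes */
	int enqueue_count;	/* how many times work was actually enqueued */
	int run_count;		/* how many times the work body ran */
};

/* Periodic callout: enqueue the work only if none is pending. */
static bool
slowtimo_callout(struct slowtimo_state *st)
{
	if (st->work_enqueued)
		return false;	/* previous work has not finished; skip */
	st->work_enqueued = true;
	st->enqueue_count++;	/* stands in for workqueue_enqueue() */
	return true;
}

/* Work body, run in thread context; clears the flag when done. */
static void
slowtimo_work(struct slowtimo_state *st)
{
	assert(st->work_enqueued);
	st->run_count++;
	st->work_enqueued = false;
}
```

A callout that fires twice before the work runs enqueues only once; the next enqueue becomes possible after slowtimo_work() has cleared the flag.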
 1.28.2.4 20-Mar-2017  pgoyette Sync with HEAD
 1.28.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.28.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.28.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.34.8.2 09-Jan-2018  snj Pull up following revision(s) (requested by maxv in ticket #481):
sys/netinet6/ip6_flow.c: revision 1.36
Fix use-after-free: if m_pullup fails the (freed) mbuf is pushed on the
ip6_pktq queue and re-processed later. Return 1 to say "processed and
freed".
 1.34.8.1 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general; however,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easy.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone gains little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.40.16.1 03-Apr-2021  thorpej Sync with HEAD.
 1.103 29-Jun-2024  riastradh netinet6: Use _NET_STAT* API instead of direct array access.

XXX Exception: ip6flow_addstats_rt _assigns_ one of the `statistics'
to the current count of ip6 flows in use, and we don't have anything
in the _NET_STAT* API for that. So for now I abuse the abstraction,
until we sort out this one exceptional case properly.

PR kern/58380
 1.102 28-Aug-2020  ozaki-r inet6: reduce silent packet discards
 1.101 28-Aug-2020  ozaki-r inet6: pass rcvif to ip6_forward to avoid extra psref_acquire
 1.100 28-Aug-2020  ozaki-r inet, inet6: count packets dropped by IPsec

The counters count packets dropped due to security policy checks.
 1.99 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.98 01-Nov-2019  knakahara Fix ipsecif(4) IPV6_MINMTU does not work correctly.
 1.97 19-Sep-2019  ozaki-r Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@
 1.96 13-May-2019  ozaki-r branches: 1.96.2;
Count packets dropped by pfil
 1.95 01-May-2018  maxv branches: 1.95.2;
Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.94 26-Apr-2018  maxv Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.
 1.93 18-Apr-2018  maxv Remove unused netipsec/xform.h includes.
 1.92 29-Jan-2018  maxv branches: 1.92.2;
style
 1.91 29-Jan-2018  maxv Fix two pretty bad mistakes. If ipsec6_check_policy fails m is not freed,
and a 'goto out' is missing after ipsec6_process_packet.
 1.90 09-Jan-2018  ozaki-r Fix use-after-free of mbuf by ip6flow_create (one more)

XXX need pullup-[678]
 1.89 09-Jan-2018  ozaki-r Fix use-after-free of mbuf by ip6flow_create

This fixes recent failures of some ATF tests such as t_ipsec_tunnel_odd.

XXX need pullup-[678]
 1.88 02-Aug-2017  ozaki-r Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
 1.87 09-May-2017  ozaki-r branches: 1.87.2;
Add missing KEY_FREESP to ip6_forward
 1.86 14-Feb-2017  ozaki-r branches: 1.86.4;
Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced
 1.85 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.84 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.83 11-Jan-2017  ozaki-r branches: 1.83.2;
Get rid of unnecessary header inclusions
 1.82 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes it difficult to debug by bisecting.
 1.81 31-Aug-2016  ozaki-r Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@
 1.80 28-Jun-2016  ozaki-r branches: 1.80.2;
Add missing NULL checks for m_get_rcvif_psref
 1.79 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
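
The index-based lookup described above can be illustrated with a small userland model. This is only a sketch under stated assumptions: if_table, struct ifnet_model, struct mbuf_model and the *_model functions are hypothetical stand-ins, and the psref(9)/pserialize(9) protection that the real m_{get,put}_rcvif* APIs provide is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IF 8	/* illustrative table size */

/* Hypothetical stand-ins for struct ifnet and struct mbuf. */
struct ifnet_model { int if_index; const char *name; };
struct mbuf_model { int rcvif_index; };	/* stores an index, not a pointer */

/* Stand-in for the interface collection indexed by if_index. */
static struct ifnet_model *if_table[MAX_IF];

static void
if_attach_model(struct ifnet_model *ifp)
{
	assert(ifp->if_index >= 0 && ifp->if_index < MAX_IF);
	if_table[ifp->if_index] = ifp;
}

static void
if_detach_model(struct ifnet_model *ifp)
{
	assert(ifp->if_index >= 0 && ifp->if_index < MAX_IF);
	if_table[ifp->if_index] = NULL;
}

/* Resolve the receiving interface; NULL if it has gone away. */
static struct ifnet_model *
get_rcvif_model(const struct mbuf_model *m)
{
	if (m->rcvif_index < 0 || m->rcvif_index >= MAX_IF)
		return NULL;
	return if_table[m->rcvif_index];
}
```

Because the lookup can return NULL once an interface is detached, every caller has to check the result before using it.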
 1.78 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.77 07-Aug-2015  ozaki-r Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
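
The conversion between the two time bases can be sketched as follows. The helper names uptime_to_wall() and wall_to_uptime() are hypothetical, and the current clock readings are passed in as parameters (an assumption made so that the sketch is self-contained); in the kernel they would come from time_uptime and time_second.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical helpers, not the kernel API.
 * now_up:   current monotonic uptime in seconds
 * now_wall: current wall-clock time in seconds
 */

/* Convert an uptime-based timestamp to wall-clock time for userland. */
static time_t
uptime_to_wall(time_t t_up, time_t now_up, time_t now_wall)
{
	return t_up - now_up + now_wall;
}

/* Convert a wall-clock timestamp from userland to an uptime-based one. */
static time_t
wall_to_uptime(time_t t_wall, time_t now_up, time_t now_wall)
{
	return t_wall - now_wall + now_up;
}
```

A timestamp kept in uptime form is unaffected by a wall-clock leap; only the value reported to userland shifts with the clock.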
 1.76 10-Dec-2014  christos call vsnprintf instead of snprintf; provide more detail
 1.75 08-Dec-2014  christos Merge some common code in the failed forwarding case, while providing better
diagnostics, and fixing leaks.
 1.74 14-Nov-2014  maxv branches: 1.74.2;
Do not uselessly include <sys/malloc.h>.
 1.73 30-May-2014  christos branches: 1.73.2;
Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.72 29-Jun-2013  rmind branches: 1.72.4;
- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.71 05-Jun-2013  christos branches: 1.71.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.70 22-Mar-2012  drochner branches: 1.70.2;
remove KAME IPSEC, replaced by FAST_IPSEC
 1.69 19-Dec-2011  drochner branches: 1.69.2; 1.69.6; 1.69.8;
rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.68 04-Feb-2010  joerg branches: 1.68.12; 1.68.16;
Explicitly include opt_gateway.h when depending on GATEWAY.
 1.67 11-Nov-2009  joerg Clear cksum flags before any further processing like ip_forward does.
Many drivers set the UDP/TCP v4 flags even for v6 traffic and if the
packet is encapsulated with gif, the IPv6 header would get corrupted by
ip_output. Patch suggested by bad@
 1.66 18-Mar-2009  cegger bzero -> memset
 1.65 23-Apr-2008  thorpej branches: 1.65.2; 1.65.10; 1.65.12; 1.65.16; 1.65.18; 1.65.20;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.64 15-Apr-2008  thorpej branches: 1.64.2;
Make ip6 and icmp6 stats per-cpu.
 1.63 08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
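
Why an array of uint64_t can stay ABI-compatible with a structure of counters is shown by the sketch below. It assumes a structure made up only of uint64_t fields; struct old_stat, the STAT_* indices and the helper functions are hypothetical, and the real ip6stat has many more counters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical three-counter subset of an "old style" stats structure. */
struct old_stat {
	uint64_t total;
	uint64_t forwarded;
	uint64_t dropped;
};

/* "New style": one array slot per counter, in the same order. */
enum { STAT_TOTAL, STAT_FORWARDED, STAT_DROPPED, STAT_NSTATS };

/* Bump one counter in the array form. */
static void
stat_inc(uint64_t *stats, int idx)
{
	assert(idx >= 0 && idx < STAT_NSTATS);
	stats[idx]++;
}

/* Read the array through the old structure, as an old binary would. */
static uint64_t
old_view_forwarded(const uint64_t *stats)
{
	struct old_stat old;

	memcpy(&old, stats, sizeof(old));
	return old.forwarded;
}
```

Since every field has the same type and the order is preserved, slot i of the array occupies the same bytes as the i-th field of the structure.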
 1.62 14-Jan-2008  dyoung branches: 1.62.2; 1.62.6;
Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in6_losing().
 1.61 12-Jan-2008  dyoung Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().
 1.60 10-Jan-2008  dyoung Save some rtcache_getrt() calls.
 1.59 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.58 23-May-2007  christos branches: 1.58.8; 1.58.14; 1.58.16; 1.58.20;
Ansify + add a few comments, from Karl Sjödahl
 1.57 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating, and
freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
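
The allocate/copy/duplicate semantics from item 1 above can be sketched in userland C. This is an illustration under stated assumptions, not the kernel implementation: struct sa_model and the sa_model_*() functions are hypothetical, calloc() stands in for the per-family pool, the 'flags' argument is omitted, and the family numbers and lengths are illustrative values.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a sockaddr with sa_len and sa_family fields. */
struct sa_model {
	unsigned char sa_len;
	unsigned char sa_family;
	char sa_data[26];
};

/* Illustrative family numbers. */
enum { AF_MODEL_INET = 2, AF_MODEL_INET6 = 24 };

/* Allocate a sockaddr of the right size for the family, or NULL. */
static struct sa_model *
sa_model_alloc(int af)
{
	size_t len = (af == AF_MODEL_INET) ? 16 :
	    (af == AF_MODEL_INET6) ? 28 : 0;
	struct sa_model *sa;

	if (len == 0)
		return NULL;		/* unknown family */
	sa = calloc(1, sizeof(*sa));	/* stands in for the family's pool */
	if (sa == NULL)
		return NULL;
	sa->sa_len = (unsigned char)len;
	sa->sa_family = (unsigned char)af;
	return sa;
}

/* Analogous to strcpy(): families must match, like the KASSERT. */
static struct sa_model *
sa_model_copy(struct sa_model *dst, const struct sa_model *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Analogous to strdup(): allocate, then copy. */
static struct sa_model *
sa_model_dup(const struct sa_model *src)
{
	struct sa_model *sa = sa_model_alloc(src->sa_family);

	return sa == NULL ? NULL : sa_model_copy(sa, src);
}
```

The caller releases each result with free(), which plays the role of sockaddr_free() in this model.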
 1.56 07-Mar-2007  liamjfoy branches: 1.56.2; 1.56.4;
Add IPv6 Fast Forward - the IPv4 counterpart:

If ip6_forward successfully forwards a packet, a cache, in this case
an ip6flow struct entry, will be created. ether_input and friends will
then be able to call ip6flow_fastforward with the packet which will then
be passed to if_output (unless an issue is found - in that case the packet
is passed back to ip6_input).

ok matt@ christos@ dyoung@ and joerg@
 1.55 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.54 10-Feb-2007  degroote branches: 1.54.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic
 1.53 26-Jan-2007  dyoung bzero -> memset
 1.52 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.51 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.50 02-Dec-2006  dyoung Use the queue(3) macros instead of open-coding them. Shorten
staircases. Remove unnecessary casts. Where appropriate, s/8/NBBY/.
De-__P(). KNF.

No functional changes intended.
 1.49 29-Jun-2006  liamjfoy branches: 1.49.4; 1.49.6; 1.49.8; 1.49.10;
Fix a minor printf found while reading the code
 1.48 07-Jun-2006  kardel branches: 1.48.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.47 21-Jan-2006  rpaulo branches: 1.47.2; 1.47.4; 1.47.6; 1.47.12;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.46 11-Dec-2005  christos branches: 1.46.2;
merge ktrace-lwp.
 1.45 29-May-2005  christos branches: 1.45.2;
- avoid shadowed variables
- sprinkle const.
 1.44 26-Feb-2005  perry nuke trailing whitespace
 1.43 16-Jul-2004  itojun branches: 1.43.4; 1.43.6;
prevent mbuf leak on IPsec tunnel mode. from iij seil team
 1.42 24-Jun-2004  itojun error could be left uninitialized when we jump into "senderr"
 1.41 16-Jan-2004  itojun when ipsec tunnel mode is applied, we are originating packet (instead of
forwarding). go to ip6_output() path for fragmentation and other processing.
from kame
 1.40 29-Oct-2003  mycroft Do a jump optimization that eliminates some uninitialized variable warnings.
 1.39 03-Oct-2003  itojun shouldn't check scope match when encapsulating packet into tunnel mode.
iij seil team
 1.38 02-Oct-2003  itojun do not deref state.ro if it is NULL
 1.37 02-Oct-2003  itojun correctly look at outer IPv6 header when forwarding packet into ipsec tunnel.
iij seil team
 1.36 07-Aug-2003  itojun make net.inet6.ip6.redirect actually work. from Tomoyuki Sahara via kame
 1.35 03-Jul-2003  itojun minor KNF
 1.34 30-Jun-2003  itojun branches: 1.34.2;
KNF
 1.33 24-Jun-2003  itojun use time.tv_sec directly
 1.32 11-Sep-2002  itojun avoid applying IPsec transport mode to packets when the kernel
forwards the packets.
sync w/kame
 1.31 08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.30 07-Jun-2002  itojun typo
 1.29 07-Jun-2002  itojun 'fall through' is not a valid LINT keyword.
 1.28 29-May-2002  itojun attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
 1.27 18-Dec-2001  itojun branches: 1.27.8; 1.27.10;
reduce white space/cosmetic diffs w/kame.
 1.26 13-Nov-2001  lukem add RCSIDs
 1.25 24-Oct-2001  itojun more whitespace sync with kame
 1.24 17-Oct-2001  itojun branches: 1.24.2;
unifdef OLDIP6OUTPUT
 1.23 18-Jul-2001  itojun sync with draft-ietf-ipngwg-p2p-pingpong-00.txt. apply special behavior
only if ip6_dst is "neighbor" within p2p prefix. sync with kame
 1.22 22-Jun-2001  itojun branches: 1.22.2;
do not forward packet back to point-to-point interface, if the packet
matches the ipv6 prefix assigned to the p2p interface (= redirect case).
this leads to pingpong, chews bandwidth. bad thing is that bad guy from
remote can chew bandwidth. (follows upcoming internet draft)
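
The ping-pong check described in this entry can be sketched as follows. This is a simplified model under stated assumptions, not the code from ip6_forward.c: the function name `should_forward`, the interface names, and the `p2p_prefixes` mapping (interface name to assigned prefix) are all hypothetical.

```python
import ipaddress


def should_forward(dst: str, in_if: str, out_if: str, p2p_prefixes: dict) -> bool:
    """Decide whether a packet may be forwarded (hypothetical model).

    The packet is refused only when it would go back out the
    point-to-point interface it arrived on AND its destination falls
    within the IPv6 prefix assigned to that interface.
    """
    if in_if != out_if:
        return True                      # not the "send it back" case
    prefix = p2p_prefixes.get(out_if)
    if prefix is None:
        return True                      # not a p2p interface in this model
    return ipaddress.IPv6Address(dst) not in ipaddress.IPv6Network(prefix)
```

Dropping such packets keeps the two ends of the link from bouncing them back and forth until the hop limit runs out.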
 1.21 12-Jun-2001  matt senderr needs only be declared when PFIL_HOOKS is defined
 1.20 12-Jun-2001  itojun run pfil_hooks for IPv6 forwarding path (note: ip6_forward() does not
call ip6_output()).
 1.19 30-Mar-2001  itojun enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.
 1.18 10-Feb-2001  itojun branches: 1.18.2;
to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.17 22-Sep-2000  itojun on ipsec policy lookup, do not try to lookup port numbers for forwarded packet.
sync with kame.
 1.16 27-Jul-2000  itojun do not forward packet with :: in the source.
this is not in the spec - we had rough consensus on it in ipngwg,
spec will get updated to include this behavior.
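
A minimal sketch of the rule in this entry, using Python's standard `ipaddress` module. The function name `forwardable_source` is hypothetical; the real check lives in the kernel forwarding path.

```python
import ipaddress


def forwardable_source(src: str) -> bool:
    """Return False when the source is the unspecified address (::)."""
    return not ipaddress.IPv6Address(src).is_unspecified
```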
 1.15 16-Jul-2000  itojun s/IPSEC_IPV6FWD/IPSEC/. this should correct strange behavior on ipv6
forwarding (even if policy asks for tunnel mode encryption, packets
go out in clear). sync with kame.
 1.14 06-Jul-2000  itojun remove unnecessary #include <netkey/key_debug.h>. from kame.
 1.13 30-Jun-2000  itojun suppress too noisy warning on forward-over-loopback case. from kame
 1.12 03-Jun-2000  itojun branches: 1.12.2;
sync with kame.
- use latest source address selection code - in6_src.c.
- correct frag header insertion.
- deep copy ip6 header portion in ip6_mloopback to avoid overwrite.
- do not bark when we forward packet to loopback.
- some cosmetics.
 1.11 19-May-2000  itojun branches: 1.11.2;
correct manipulation of link-local scoped address on loopback.
now "telnet fe80::1%lo0" should work again.
(we have another bug near here - will attack it soon)
 1.10 19-May-2000  itojun do not mistakenly forward link-local scoped packet (the bug was added
with "beyondscope" icmp6 support).
"options FAKE_LOOPBACK_IF" will honor scope on loopback outputs. rcvif will
be real interface, not the loopback, just like when multicast loopback.

(sync with kame)
 1.9 26-Feb-2000  itojun bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
 1.8 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.7 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.6 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.5 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.4 30-Jul-1999  itojun branches: 1.4.2; 1.4.8;
remove reference to in6_systm.h (file itself will be removed afterwards)
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ip6_forward.c was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ip6_forward.c was added on branch chs-ubc2 on 1999-07-01 23:48:28 +0000
 1.4.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.4.2.3 21-Apr-2001  bouyer Sync with HEAD
 1.4.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.4.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.11.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.12.2.13 07-Apr-2004  jmc Pullup rev 1.39 (requested by itojun in ticket #99)

Shouldn't check scope match when encapsulating packet into tunnel mode.
 1.12.2.12 07-Apr-2004  jmc Pullup rev 1.38 (requested by itojun in ticket #96)

Do not deref state.ro if it is NULL.
 1.12.2.11 07-Apr-2004  jmc Pullup patch (requested by itojun in ticket #94)

Correctly look at outer IPv6 header when forwarding packet into ipsec tunnel.
 1.12.2.10 11-Feb-2003  msaitoh Pull up revision 1.22 (requested by itojun):
do not forward packet back to point-to-point interface, if the packet
matches the ipv6 prefix assigned to the p2p interface (= redirect case).
this leads to pingpong, chews bandwidth. bad thing is that bad guy from
remote can chew bandwidth. (follows upcoming internet draft)
 1.12.2.9 14-Nov-2002  itojun sys/netinet6/ip6_forward.c 1.20 via patch

Need opt_pfil_hooks.h for PFIL_HOOK.

(masanobu)
 1.12.2.8 26-Feb-2002  he Apply patch (requested by martti):
Fix it so that IPFilter handles IPv6 traffic.
 1.12.2.7 09-Feb-2002  he Pull up revision 1.12.2.5 (requested by martti):
Updated IPFilter to 3.4.23.
(This one re-adds filtering support for forwarded IPv6 packets.)
 1.12.2.6 25-Oct-2001  jhawk Revert Darren Reed <darrenr@netbsd.org>'s unauthorized commit to the
netbsd-1-5 branch, rev 1.12.2.5 of 2001/10/15 13:19:15. This may
re-appear on the branch pending suitable review.
 1.12.2.5 15-Oct-2001  darrenr add ipv6 filtering hooks with pfil_hook for forwarded packet case
 1.12.2.4 29-Sep-2000  itojun pullup (approved by releng-1-5)

do not try to look up port number on forwarding case.
sys/netinet6/ip6_forward.c 1.16 -> 1.17
 1.12.2.3 28-Jul-2000  itojun pullup 1.15 -> 1.16 (approved by releng-1-5)

> do not forward packet with :: in the source.
> this is not in the spec - we had rough consensus on it in ipngwg,
> spec will get updated to include this behavior.
 1.12.2.2 17-Jul-2000  itojun pullup 1.13 -> 1.15 (approved by releng-1-5)

1.14 -> 1.15
s/IPSEC_IPV6FWD/IPSEC/. this should correct strange behavior on ipv6
forwarding (even if policy asks for tunnel mode encryption, packets
go out in clear). sync with kame.

1.13 -> 1.14
date: 2000/07/06 12:51:41; author: itojun; state: Exp; lines: +2 -3
remove unnecessary #include <netkey/key_debug.h>. from kame.
 1.12.2.1 01-Jul-2000  itojun merge 1.12 -> 1.13: (approved by: releng-1-5)
suppress too noisy warning on forward-over-loopback case. from kame
 1.18.2.8 17-Sep-2002  nathanw Catch up to -current.
 1.18.2.7 20-Jun-2002  nathanw Catch up to -current.
 1.18.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.18.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.18.2.4 22-Oct-2001  nathanw Catch up to -current.
 1.18.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.18.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.18.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.22.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.22.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.22.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.22.2.1 03-Aug-2001  lukem update to -current
 1.24.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.27.10.3 04-Oct-2003  tron Pull up revision 1.39 (requested by itojun in ticket #1504):
shouldn't check scope match when encapsulating packet into tunnel mode.
iij seil team
 1.27.10.2 02-Oct-2003  tron Pull up revision 1.38 (requested by itojun in ticket #1501):
do not deref state.ro if it is NULL
 1.27.10.1 02-Oct-2003  tron Pull up revision 1.37 via patch (requested by itojun in ticket #1500):
correctly look at outer IPv6 header when forwarding packet into ipsec tunnel.
iij seil team
 1.27.8.2 20-Jun-2002  gehenna catch up with -current.
 1.27.8.1 30-May-2002  gehenna Catch up with -current.
 1.34.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.34.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.34.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.34.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.34.2.1 03-Aug-2004  skrll Sync with HEAD
 1.43.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.43.4.1 29-Apr-2005  kent sync with -current
 1.45.2.5 21-Jan-2008  yamt sync with head
 1.45.2.4 03-Sep-2007  yamt sync with head.
 1.45.2.3 26-Feb-2007  yamt sync with head.
 1.45.2.2 30-Dec-2006  yamt sync with head.
 1.45.2.1 21-Jun-2006  yamt sync with head.
 1.46.2.1 01-Feb-2006  yamt sync with head.
 1.47.12.1 19-Jun-2006  chap Sync with head.
 1.47.6.2 11-Aug-2006  yamt sync with head
 1.47.6.1 26-Jun-2006  yamt sync with head.
 1.47.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.47.2.1 09-Sep-2006  rpaulo sync with head
 1.48.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.49.10.1 04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.49.8.2 14-Nov-2009  sborrill Pull up the following revision(s) (requested by joerg in ticket #1366):
sys/netinet6/ip6_forward.c: revision 1.67

Clear cksum flags before any further processing like ip_forward does.
Many drivers set the UDP/TCP v4 flags even for v6 traffic and if the
packet is encapsulated with gif, the IPv6 header would get corrupted by
ip_output.
 1.49.8.1 24-May-2007  pavel branches: 1.49.8.1.4;
Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the right default policy, depending on the address family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us choose the right default policy (based on the address family
requested).
While here, fix an error message

Use a dynamic array instead of a static array to decompress. It lets us
decompress any data, whatever the ratio of decompressed data to compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
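
The move from a static to a dynamic output buffer for decompression, mentioned in this entry, can be illustrated with Python's standard `zlib` module. This is only a sketch of the idea, not the sys/opencrypto/deflate.c code: the function name `inflate_growing`, the chunk size, and the use of the default zlib stream format are assumptions made for the example.

```python
import zlib


def inflate_growing(data: bytes, chunk: int = 64) -> bytes:
    """Decompress `data` into a buffer that grows on demand.

    Output is collected at most `chunk` bytes at a time, so no fixed
    ratio between compressed and decompressed size is assumed.
    """
    d = zlib.decompressobj()
    out = bytearray()
    buf = data
    while buf:
        out += d.decompress(buf, chunk)  # emit at most `chunk` bytes
        buf = d.unconsumed_tail          # input that still needs processing
    out += d.flush()                     # drain any remaining output
    return bytes(out)
```

A fixed-size output array has to guess the worst-case expansion ratio up front; growing the buffer as output arrives removes that guess.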
 1.49.8.1.4.1 14-Nov-2009  sborrill Pull up the following revision(s) (requested by joerg in ticket #1366):
sys/netinet6/ip6_forward.c: revision 1.67

Clear cksum flags before any further processing like ip_forward does.
Many drivers set the UDP/TCP v4 flags even for v6 traffic and if the
packet is encapsulated with gif, the IPv6 header would get corrupted by
ip_output.
 1.49.6.2 18-Dec-2006  yamt sync with head.
 1.49.6.1 10-Dec-2006  yamt sync with head.
 1.49.4.2 01-Feb-2007  ad Sync with head.
 1.49.4.1 12-Jan-2007  ad Sync with head.
 1.54.2.3 07-May-2007  yamt sync with head.
 1.54.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.54.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.56.4.1 11-Jul-2007  mjf Sync with head.
 1.56.2.1 08-Jun-2007  ad Sync with head.
 1.58.20.3 19-Jan-2008  bouyer Sync with HEAD
 1.58.20.2 10-Jan-2008  bouyer Sync with HEAD
 1.58.20.1 02-Jan-2008  bouyer Sync with HEAD
 1.58.16.1 26-Dec-2007  ad Sync with head.
 1.58.14.1 18-Feb-2008  mjf Sync with HEAD.
 1.58.8.2 23-Mar-2008  matt sync with HEAD
 1.58.8.1 09-Jan-2008  matt sync with HEAD
 1.62.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.62.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.64.2.1 18-May-2008  yamt sync with head.
 1.65.20.1 21-Apr-2010  matt sync to netbsd-5
 1.65.18.1 14-Nov-2009  sborrill Pull up the following revision(s) (requested by joerg in ticket #1139):
sys/netinet6/ip6_forward.c: revision 1.67

Clear cksum flags before any further processing like ip_forward does.
Many drivers set the UDP/TCP v4 flags even for v6 traffic and if the
packet is encapsulated with gif, the IPv6 header would get corrupted by
ip_output.
 1.65.16.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.65.12.1 14-Nov-2009  sborrill Pull up the following revision(s) (requested by joerg in ticket #1139):
sys/netinet6/ip6_forward.c: revision 1.67

Clear cksum flags before any further processing like ip_forward does.
Many drivers set the UDP/TCP v4 flags even for v6 traffic and if the
packet is encapsulated with gif, the IPv6 header would get corrupted by
ip_output.
 1.65.10.1 28-Apr-2009  skrll Sync with HEAD.
 1.65.2.2 11-Mar-2010  yamt sync with head
 1.65.2.1 04-May-2009  yamt sync with head.
 1.68.16.2 05-Apr-2012  mrg sync to latest -current.
 1.68.16.1 18-Feb-2012  mrg merge to -current.
 1.68.12.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.68.12.1 17-Apr-2012  yamt sync with head
 1.69.8.2 01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1540):

sys/netinet6/ip6_forward.c: revision 1.91 (via patch)

Fix two pretty bad mistakes. If ipsec6_check_policy fails m is not freed,
and a 'goto out' is missing after ipsec6_process_packet.
 1.69.8.1 13-Mar-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #1518):
sys/netinet6/ip6_forward.c: 1.89-1.90 via patch
Fix use-after-free of mbuf by ip6flow_create
This fixes recent failures of some ATF tests such as t_ipsec_tunnel_odd.
--
Fix use-after-free of mbuf by ip6flow_create (one more)
 1.69.6.2 01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1540):

sys/netinet6/ip6_forward.c: revision 1.91 (via patch)

Fix two pretty bad mistakes. If ipsec6_check_policy fails m is not freed,
and a 'goto out' is missing after ipsec6_process_packet.
 1.69.6.1 13-Mar-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #1518):
sys/netinet6/ip6_forward.c: 1.89-1.90 via patch
Fix use-after-free of mbuf by ip6flow_create
This fixes recent failures of some ATF tests such as t_ipsec_tunnel_odd.
--
Fix use-after-free of mbuf by ip6flow_create (one more)
 1.69.2.2 01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1540):

sys/netinet6/ip6_forward.c: revision 1.91 (via patch)

Fix two pretty bad mistakes. If ipsec6_check_policy fails m is not freed,
and a 'goto out' is missing after ipsec6_process_packet.
 1.69.2.1 13-Mar-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #1518):
sys/netinet6/ip6_forward.c: 1.89-1.90 via patch
Fix use-after-free of mbuf by ip6flow_create
This fixes recent failures of some ATF tests such as t_ipsec_tunnel_odd.
--
Fix use-after-free of mbuf by ip6flow_create (one more)
 1.70.2.3 03-Dec-2017  jdolecek update from HEAD
 1.70.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.70.2.1 23-Jun-2013  tls resync from head
 1.71.2.1 28-Aug-2013  rmind sync with head
 1.72.4.1 10-Aug-2014  tls Rebase.
 1.73.2.3 01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1590):

sys/netinet6/ip6_forward.c: revision 1.91 (via patch)

Fix two pretty bad mistakes. If ipsec6_check_policy fails m is not freed,
and a 'goto out' is missing after ipsec6_process_packet.
 1.73.2.2 12-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #1551):
sys/netinet6/ip6_forward.c: 1.89-1.90 via patch
Fix use-after-free of mbuf by ip6flow_create
This fixes recent failures of some ATF tests such as t_ipsec_tunnel_odd.
--
Fix use-after-free of mbuf by ip6flow_create (one more)
 1.73.2.1 17-Jan-2015  martin branches: 1.73.2.1.2; 1.73.2.1.6;
Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.73.2.1.6.2 01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1590):

sys/netinet6/ip6_forward.c: revision 1.91 (via patch)

Fix two pretty bad mistakes. If ipsec6_check_policy fails m is not freed,
and a 'goto out' is missing after ipsec6_process_packet.
 1.73.2.1.6.1 12-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #1551):
sys/netinet6/ip6_forward.c: 1.89-1.90 via patch
Fix use-after-free of mbuf by ip6flow_create
This fixes recent failures of some ATF tests such as t_ipsec_tunnel_odd.
--
Fix use-after-free of mbuf by ip6flow_create (one more)
 1.73.2.1.2.2 01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1590):

sys/netinet6/ip6_forward.c: revision 1.91 (via patch)

Fix two pretty bad mistakes. If ipsec6_check_policy fails m is not freed,
and a 'goto out' is missing after ipsec6_process_packet.
 1.73.2.1.2.1 12-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #1551):
sys/netinet6/ip6_forward.c: 1.89-1.90 via patch
Fix use-after-free of mbuf by ip6flow_create
This fixes recent failures of some ATF tests such as t_ipsec_tunnel_odd.
--
Fix use-after-free of mbuf by ip6flow_create (one more)
 1.74.2.6 28-Aug-2017  skrll Sync with HEAD
 1.74.2.5 05-Feb-2017  skrll Sync with HEAD
 1.74.2.4 05-Oct-2016  skrll Sync with HEAD
 1.74.2.3 09-Jul-2016  skrll Sync with HEAD
 1.74.2.2 22-Sep-2015  skrll Sync with HEAD
 1.74.2.1 06-Apr-2015  skrll Sync with HEAD
 1.80.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.80.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.83.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.86.4.1 11-May-2017  pgoyette Sync with HEAD
 1.87.2.4 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.87.2.3 30-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #671):

sys/netinet6/ip6_forward.c: revision 1.91

Fix two pretty bad mistakes. If ipsec6_check_policy fails m is not freed,
and a 'goto out' is missing after ipsec6_process_packet.
 1.87.2.2 09-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #484):
sys/netinet6/ip6_forward.c: 1.89-1.90
Fix use-after-free of mbuf by ip6flow_create
This fixes recent failures of some ATF tests such as t_ipsec_tunnel_odd.
--
Fix use-after-free of mbuf by ip6flow_create (one more)
 1.87.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panics of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that the process issuing
SADB_UPDATE be the same process that issued SADB_ADD (or SADB_GETSPI).
This means that the update command must be used together with the add
command in a setkey configuration. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require the newly-added update command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver has implemented such offload features for many
years and none seems likely to implement them soon. (Note that neither
FreeBSD nor Linux has such drivers.) Let's remove the related
(unused) code and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
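
A plain memset on a buffer that is about to be released may be optimized
away as a dead store, which is what explicit_memset is meant to prevent. The
sketch below is a portable stand-in under that assumption; wipe_example and
struct key_example are hypothetical names, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Portable stand-in for explicit_memset(): writing through a volatile
 * pointer keeps the compiler from discarding the stores as dead.
 */
static void
wipe_example(void *buf, size_t len)
{
	volatile unsigned char *p = buf;

	assert(buf != NULL || len == 0);
	while (len-- > 0)
		*p++ = 0;
}

/* Hypothetical holder of key material, cleared before it is released. */
struct key_example {
	unsigned char	bytes[32];
	size_t		len;
};

static void
key_example_clear(struct key_example *k)
{
	wipe_example(k->bytes, sizeof(k->bytes));
	k->len = 0;
}
```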
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list should basically never have sav->refcnt == 0. Also,
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be obtained from
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav list is already
sorted as we need.

The newly added key_validate_savlist validates that each sav list is
indeed sorted (run only if DEBUG because it's not cheap).
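
A validation of this kind can be sketched as a single pass over the list.
This is an illustration only: struct sav_example is hypothetical, and the
sort direction (oldest addtime first) is an assumption, since the message
does not state it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node; addtime stands in for lft_c->sadb_lifetime_addtime. */
struct sav_example {
	uint64_t		 addtime;
	struct sav_example	*next;
};

/*
 * Return true if the list is in non-decreasing addtime order.
 * The direction is an assumption made for this sketch.
 */
static bool
savlist_is_sorted_example(const struct sav_example *head)
{
	const struct sav_example *sav;

	for (sav = head; sav != NULL && sav->next != NULL; sav = sav->next) {
		if (sav->addtime > sav->next->addtime)
			return false;
	}
	return true;
}
```

The check is linear in the list length, which is consistent with running it
only in DEBUG kernels.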
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as temporary storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM is never set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segment size
calculation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav; however, an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
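
The problem and the shape of the fix can be sketched as follows. The struct
and function names are hypothetical, and where the hard limit comes from is
an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the lifetime fields of struct sadb_comb. */
struct comb_example {
	uint64_t soft_addtime;
	uint64_t hard_addtime;
};

/*
 * Original pattern: scales a field that was zero-cleared just before
 * the call, so the result is always 0.
 */
static void
set_lifetime_old_example(struct comb_example *c)
{
	c->soft_addtime = c->soft_addtime * 80 / 100;
}

/* Fixed pattern: derive the soft limit from the hard limit. */
static void
set_lifetime_fixed_example(struct comb_example *c, uint64_t hard)
{
	c->hard_addtime = hard;
	c->soft_addtime = c->hard_addtime * 80 / 100;
}
```

With a hard limit of 86400 seconds the fixed variant yields a soft limit of
69120 seconds, while the old variant leaves a zero-cleared comb at 0.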
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms while holding softnet_lock
causes a deadlock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain the SP while
holding key_so_mtx in, say, key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf,
which is stuck on key_so_mtx, never releases its reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

Hundreds of mbufs can easily be queued on the list under heavy
network load.
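
A bounded deferral queue of this kind can be sketched as below: producers
enqueue until a limit is reached, and the timer drains everything queued so
far. All names and the limit value are hypothetical; the real code queues
mbufs and calls key_sendup_mbuf from key_timehandler.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEFER_QUEUE_LIMIT_EXAMPLE 4	/* hypothetical limit */

struct msg_example {
	struct msg_example	*next;
	int			 id;
};

struct defer_queue_example {
	struct msg_example	 *head;
	struct msg_example	**tailp;
	size_t			  len;
};

static int sent_total_example;

/* Trivial stand-in for the routine that finally sends a message. */
static void
send_example(struct msg_example *m)
{
	sent_total_example += m->id;
}

static void
defer_queue_init_example(struct defer_queue_example *q)
{
	q->head = NULL;
	q->tailp = &q->head;
	q->len = 0;
}

/* Returns false when the queue is full; the caller then drops the message. */
static bool
defer_enqueue_example(struct defer_queue_example *q, struct msg_example *m)
{
	if (q->len >= DEFER_QUEUE_LIMIT_EXAMPLE)
		return false;
	m->next = NULL;
	*q->tailp = m;
	q->tailp = &m->next;
	q->len++;
	return true;
}

/* Called from the timer: send everything queued so far, in FIFO order. */
static size_t
defer_drain_example(struct defer_queue_example *q,
    void (*send)(struct msg_example *))
{
	struct msg_example *m;
	size_t n = 0;

	while ((m = q->head) != NULL) {
		q->head = m->next;
		send(m);
		n++;
	}
	defer_queue_init_example(q);
	return n;
}
```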
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It is never changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), which of course is useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical usage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock, for example in the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform while holding
the mutex; pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never completes until xc_broadcast of thread A finishes. On the other hand,
thread C, which is a callee of xc_broadcast of thread A, is stuck on the mutex.
As a result the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock happens only if NET_MPSAFE is enabled.
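
The shape of the fix can be sketched in userland with POSIX threads. This is
only an illustration under stated assumptions: perform_example stands in for
pserialize_perform, the mutex and condvar stand in for their kernel
counterparts, and all names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mtx_example = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t perform_cv_example = PTHREAD_COND_INITIALIZER;
static bool perform_busy_example;
static int perform_calls_example;

/* Stand-in for pserialize_perform(); called without the mutex held. */
static void
perform_example(void)
{
	perform_calls_example++;
}

/*
 * Called with the mutex held; returns with the mutex held.
 * The mutex is dropped around perform_example(), and the busy flag
 * plus condvar keep two callers from performing at the same time.
 */
static void
serialized_perform_example(void)
{
	while (perform_busy_example)
		pthread_cond_wait(&perform_cv_example, &mtx_example);
	perform_busy_example = true;
	pthread_mutex_unlock(&mtx_example);

	perform_example();

	pthread_mutex_lock(&mtx_example);
	perform_busy_example = false;
	pthread_cond_broadcast(&perform_cv_example);
}

static void
remove_item_example(void)
{
	pthread_mutex_lock(&mtx_example);
	/* item = remove_from_list(); */
	serialized_perform_example();
	/* localcount_drain(&item->localcount, &cv, &mtx) would follow here. */
	pthread_mutex_unlock(&mtx_example);
}
```

Because the mutex is no longer held across the perform step, a callback that
needs the mutex can make progress while another thread is waiting for its own
perform step to start.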
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.92.2.2 02-May-2018  pgoyette Synch with HEAD
 1.92.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.95.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.95.2.1 10-Jun-2019  christos Sync with HEAD
 1.96.2.1 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@
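
The idea can be sketched in userland as follows. The growable slot array is a
stand-in for a percpu storage area that may be moved when it is enlarged;
because only pointers live in it, the objects themselves stay where they are.
All names are hypothetical and the sketch ignores per-CPU and locking aspects.

```c
#include <assert.h>
#include <stdlib.h>

struct rtcache_example {
	int lookups;
};

/* Stand-in for a storage area that may be reallocated when it grows. */
struct area_example {
	struct rtcache_example	**slots;
	size_t			  nslots;
};

/* Grow to n slots. The slot array may move; the pointed-to objects do not. */
static int
area_grow_example(struct area_example *a, size_t n)
{
	struct rtcache_example **ns;

	assert(n >= a->nslots);
	ns = realloc(a->slots, n * sizeof(*ns));
	if (ns == NULL)
		return -1;
	a->slots = ns;
	while (a->nslots < n) {
		ns[a->nslots] = calloc(1, sizeof(**ns));
		if (ns[a->nslots] == NULL)
			return -1;
		a->nslots++;
	}
	return 0;
}

static int
area_init_example(struct area_example *a, size_t n)
{
	a->slots = NULL;
	a->nslots = 0;
	return area_grow_example(a, n);
}

static void
area_fini_example(struct area_example *a)
{
	while (a->nslots > 0)
		free(a->slots[--a->nslots]);
	free(a->slots);
	a->slots = NULL;
}
```

A caller that fetched a->slots[0] before the area grew can keep using that
object afterwards, which would not be safe if the object itself were stored
in the array.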

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing a percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine(Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing a percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.22 08-Mar-2021  christos no need for ip6_id.c...
 1.21 07-Mar-2021  christos Amend missed messages:

netinet6: Pick IPv6 fragment ids uniformly at random.

Expected number of packets before collision is ~2^16, about the same
as we get for IPv4 with alternating disjoint random cycles. Keep it
simple unless we determine we really need something much better for
IPv6 than what IPv4 can achieve anyway.
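
The order of magnitude follows from the birthday bound for the 32-bit
identification field of the IPv6 fragment header. The sketch below is a
hypothetical helper, not kernel code; it finds the number of uniform draws
after which a collision is more likely than not.

```c
#include <stdint.h>

/*
 * Smallest n such that n uniform draws from `space` possible values
 * contain a repeated value with probability of at least 1/2.
 */
static uint32_t
draws_until_half_collision_example(double space)
{
	double p_unique = 1.0;	/* probability that all draws so far differ */
	uint32_t n = 0;

	while (p_unique > 0.5) {
		p_unique *= 1.0 - (double)n / space;
		n++;
	}
	return n;
}
```

For the classic case of 365 values this returns 23, and for 2^32 values
(call it with 4294967296.0) the result lies between 2^16 and 2^17, in line
with the estimate in the message.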

netinet6: Rip out now-unused IPv6 fragment id logic.

(from riastradh)
 1.20 07-Mar-2021  christos netinet6: Mark randomid unused.

Will make merging and bisection easier if anything goes wrong with
flow label or fragment id randomization changes.

(from riastradh)
 1.19 18-Oct-2019  msaitoh branches: 1.19.8;
s/initalize/initialize/ in comment or printf message.
 1.18 07-Aug-2015  ozaki-r branches: 1.18.10; 1.18.18; 1.18.22;
Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
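
The conversion between the two time bases amounts to adding or subtracting
the current offset between wall-clock time and uptime. A minimal sketch,
with hypothetical function names and the current values passed in explicitly:

```c
#include <stdint.h>

/*
 * Convert a timestamp taken on the uptime (monotonic) scale to the
 * wall-clock scale, given the current value of both clocks.
 */
static int64_t
uptime_to_wall_example(int64_t t_uptime, int64_t now_wall, int64_t now_uptime)
{
	return t_uptime + (now_wall - now_uptime);
}

/* The reverse direction, for times received from userland. */
static int64_t
wall_to_uptime_example(int64_t t_wall, int64_t now_wall, int64_t now_uptime)
{
	return t_wall - (now_wall - now_uptime);
}
```

Note that the offset is sampled at conversion time, so a wall-clock step
between two conversions changes the converted value but not the stored
uptime-based one.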
 1.17 19-Nov-2011  tls branches: 1.17.8; 1.17.26;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.16 30-Aug-2006  christos branches: 1.16.92;
Fix initializers.
 1.15 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.14 11-Dec-2005  christos branches: 1.14.4; 1.14.6; 1.14.8; 1.14.14;
merge ktrace-lwp.
 1.13 23-Mar-2004  itojun branches: 1.13.4; 1.13.18;
typo
 1.12 26-Dec-2003  wiz Niels Provos kindly agreed to drop clauses 3 and 4 from the
license -- thanks.
Based on OpenBSD commit and hints by itojun.
 1.11 10-Dec-2003  itojun comment from niels provos;
- seed2 is necessary, but use it as "seed2 + x" not "seed2 ^ x".
- skipping number is not needed, so disable it for 16bit generator (makes
the repetition period to 30000)
 1.10 25-Nov-2003  itojun "seed2" was ruining non-repeating property, so remove it. discussed on tech-net
 1.9 16-Sep-2003  itojun exp is a reserved name under posix
 1.8 15-Sep-2003  itojun avoid overflow during multiply. David Laight
 1.7 13-Sep-2003  itojun correct ru_a/ru_b setup for 20bit case
 1.6 09-Sep-2003  itojun lint
 1.5 06-Sep-2003  itojun correct seed generation. sync w/ kame
 1.4 06-Sep-2003  itojun fix comment, from kame
 1.3 06-Sep-2003  itojun correct comment
 1.2 06-Sep-2003  itojun fix msb handling. from kame
 1.1 06-Sep-2003  itojun randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.13.18.2 30-Dec-2006  yamt sync with head.
 1.13.18.1 21-Jun-2006  yamt sync with head.
 1.13.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.13.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.13.4.2 03-Aug-2004  skrll Sync with HEAD
 1.13.4.1 23-Mar-2004  skrll file ip6_id.c was added on branch ktrace-lwp on 2004-08-03 10:55:13 +0000
 1.14.14.1 19-Jun-2006  chap Sync with head.
 1.14.8.2 03-Sep-2006  yamt sync with head.
 1.14.8.1 26-Jun-2006  yamt sync with head.
 1.14.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.14.4.1 09-Sep-2006  rpaulo sync with head
 1.16.92.1 17-Apr-2012  yamt sync with head
 1.17.26.1 22-Sep-2015  skrll Sync with HEAD
 1.17.8.1 03-Dec-2017  jdolecek update from HEAD
 1.18.22.1 07-Mar-2021  martin Pull up following revision(s) (requested by christos in ticket #1226):

sys/netinet6/ip6_id.c: revision 1.19-1.21
sys/netinet6/ip6_var.h: revision 1.88
sys/netinet/ip_input.c: revision 1.400
sys/netinet/tcp_subr.c: revision 1.285
sys/netinet/ip6.h: revision 1.30

netinet: Enable random IP fragment ids by default (from riastradh)

netinet: Enable RFC 1948 pseudorandom TCP ISS selection by default.
(from riastradh)

netinet6: Mark randomid unused.

Will make merging and bisection easier if anything goes wrong with
flow label or fragment id randomization changes.
(from riastradh)

netinet/netinet6: Add necessary includes to make these standalone.
(from riastradh)

Replace randomid() by cprng_fast32()
 1.18.18.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.18.10.1 07-Mar-2021  martin Pull up following revision(s) (requested by christos in ticket #1661):

sys/netinet6/ip6_id.c: revision 1.19-1.21
sys/netinet6/ip6_var.h: revision 1.88
sys/netinet/ip_input.c: revision 1.400
sys/netinet/tcp_subr.c: revision 1.285
sys/netinet/ip6.h: revision 1.30

netinet: Enable random IP fragment ids by default (from riastradh)

netinet: Enable RFC 1948 pseudorandom TCP ISS selection by default.
(from riastradh)

netinet6: Mark randomid unused.

Will make merging and bisection easier if anything goes wrong with
flow label or fragment id randomization changes.
(from riastradh)

netinet/netinet6: Add necessary includes to make these standalone.
(from riastradh)

Replace randomid() by cprng_fast32()
 1.19.8.1 03-Apr-2021  thorpej Sync with HEAD.
 1.228 29-Jun-2024  riastradh netinet6: Use _NET_STAT* API instead of direct array access.

XXX Exception: ip6flow_addstats_rt _assigns_ one of the `statistics'
to the current count of ip6 flows in use, and we don't have anything
in the _NET_STAT* API for that. So for now I abuse the abstraction,
until we sort out this one exceptional case properly.

PR kern/58380
 1.227 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.226 24-Oct-2022  knakahara Fix PR kern/57037

Make it possible to control whether parameter-changing routing messages
are sent.
When net.inet6.ip6.param_rt_msg=0 is set, parameter-changing routing
messages are not sent.
When net.inet6.ip6.param_rt_msg=1 (the default) is set, parameter-changing
routing messages are sent via RTM_NEWADDR.
 1.225 02-Sep-2022  thorpej pktqueue: Re-factor sysctl handling.

Provide a new pktq_sysctl_setup() function that attaches standard
pktq sysctl nodes below a specified parent node, with either a
fixed node ID or CTL_CREATE to dynamically assign node IDs. Make
all of the sysctl handlers private to pktqueue.c, and remove the
INET- and INET6-specific pktqueue sysctl code from net/if.c.
 1.224 19-Feb-2021  christos - Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
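
The difference between the two checks can be sketched as follows. This is an illustrative stand-in, not the NetBSD macro: the names SKETCH_ALIGNED_POINTER, sketch_hdr and sketch_hdr_aligned are hypothetical, and C11 _Alignof is assumed in place of __alignof.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for an alignment check based on the ABI
 * alignment of the type rather than its size.
 */
#define SKETCH_ALIGNED_POINTER(p, t) \
	((((uintptr_t)(p)) & (_Alignof(t) - 1)) == 0)

/*
 * A non-primitive type: its size is 6 bytes but its alignment is that
 * of uint16_t, so "sizeof(t) - 1" would not even be a valid bit mask.
 */
struct sketch_hdr {
	uint16_t a;
	uint16_t b;
	uint16_t c;
};

static bool
sketch_hdr_aligned(const void *p)
{
	assert(p != NULL);
	return SKETCH_ALIGNED_POINTER(p, struct sketch_hdr);
}
```

With this check, any address that is a multiple of the type's alignment is accepted, which a size-based mask cannot express for a structure whose size is not a power of two.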
 1.223 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.
 1.222 28-Aug-2020  ozaki-r branches: 1.222.2;
inet6: reduce silent packet discards
 1.221 28-Aug-2020  ozaki-r inet6: pass rcvif to ip6_forward to avoid extra psref_acquire
 1.220 28-Aug-2020  ozaki-r ipsec: rename ipsec_ip_input to ipsec_ip_input_checkpolicy

Because it just checks if a packet passes security policies.
 1.219 28-Aug-2020  ozaki-r inet, inet6: count packets dropped by IPsec

The counters count packets dropped due to security policy checks.
 1.218 27-Jul-2020  roy ip6: Remove __packed attribute from ip6 structures

They should naturally align.
Add compile-time assertions to ip6_input.c to prove this.
 1.217 19-Jun-2020  maxv localify
 1.216 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.215 12-Nov-2019  maxv Add more checks in ip6_pullexthdr, to prevent a panic in m_copydata. The
Rip6 entry point could see a garbage Hop6 option.

Not a big issue, since it's a clean panic only triggerable if the socket
has the IN6P_DSTOPTS/IN6P_RTHDR option.

Reported-by: syzbot+3b07b3511b4ceb8bf1e2@syzkaller.appspotmail.com
 1.214 18-Oct-2019  ozaki-r in6: reset the temporary address timer on a change of the interval period
 1.213 16-Oct-2019  ozaki-r Validate ip6_temp_preferred_lifetime (net.inet6.ip6.temppltime) on a change

ip6_temp_preferred_lifetime is used to calculate an interval period to
regenerate temporary addresses by
TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE - DESYNC_FACTOR
as per RFC 3041 3.5. So it must be greater than (REGEN_ADVANCE +
DESYNC_FACTOR), otherwise it will be negative and go wrong, for example
KASSERT(to_ticks >= 0) in callout_schedule_locked fails.
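
The constraint described above can be sketched as a small validity check. The function names and the constant below are hypothetical and chosen for illustration; the kernel's actual variables and units may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value for illustration (seconds); not taken from the kernel. */
#define SKETCH_REGEN_ADVANCE	5

/* A new preferred lifetime is acceptable only if the interval stays positive. */
static bool
sketch_temppltime_valid(long preferred_lifetime, long desync_factor)
{
	return preferred_lifetime > SKETCH_REGEN_ADVANCE + desync_factor;
}

/* TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE - DESYNC_FACTOR */
static long
sketch_regen_interval(long preferred_lifetime, long desync_factor)
{
	/* Mirrors the kind of assertion that fails on a negative interval. */
	assert(sketch_temppltime_valid(preferred_lifetime, desync_factor));
	return preferred_lifetime - SKETCH_REGEN_ADVANCE - desync_factor;
}
```

Rejecting the value when it is set keeps the timer code from ever seeing a negative interval.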
 1.212 16-Oct-2019  ozaki-r Reorganize in6_tmpaddrtimer stuffs

- Move the related functions to where in6_tmpaddrtimer_ch exists
- Hide global variable in6_tmpaddrtimer_ch
- Rename ip6_init2 to in6_tmpaddrtimer_init
- Reduce callers of callout_reset
- Use callout_schedule
 1.211 19-Sep-2019  ozaki-r Apply some missing changes lost on the previous commit
 1.210 19-Sep-2019  ozaki-r Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storage runs short, percpu(9) enlarges it by allocating new
larger memory areas, replacing the old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@
 1.209 15-Sep-2019  bouyer Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.
 1.208 13-May-2019  ozaki-r branches: 1.208.2;
Count packets dropped by pfil
 1.207 17-Jan-2019  knakahara Fix ipsecif(4) being unable to apply an input direction packet filter. Reviewed by ozaki-r@n.o and ryo@n.o.

Add ATF later.
 1.206 14-Jan-2019  maxv Fix bug, should be ip6_protox[].
 1.205 15-Nov-2018  maxv Remove the 't' argument from m_tag_find().
 1.204 19-May-2018  maxv branches: 1.204.2;
Remove misleading comment.
 1.203 17-May-2018  maxv Add KASSERTs, related to PR/39794.
 1.202 14-May-2018  maxv Merge ipsec4_input and ipsec6_input into ipsec_ip_input. Make the argument
a bool for clarity. Optimize the function: if M_CANFASTFWD is not there
(because already removed by the firewall) leave now.

Makes it easier to see that M_CANFASTFWD is not removed on IPv6.
 1.201 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.200 26-Apr-2018  maxv Remove unused mbuf argument from sbsavetimestamp.
 1.199 26-Apr-2018  maxv Move the address checks into one function, ip6_badaddr(). In this function,
reinstate the "IPv4-compatible IPv6 addresses" check; these addresses are
deprecated by RFC4291 (2006).
 1.198 15-Apr-2018  maxv Remove useless DIAGNOSTIC block, the caller already ensures the
assumptions, and here we're not doing anything (it should be a panic
rather than a printf).
 1.197 15-Apr-2018  maxv Introduce a m_verify_packet function, that verifies the mbuf chain of a
packet to ensure it is not malformed. Call this function in "points of
interest", that are the IPv4/IPv6/IPsec entry points. There could be more.

We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only.

This function should not be called everywhere, especially not in places
that temporarily manipulate (and clobber) the mbuf structure; once they're
done they put the mbuf back in a correct format.
 1.196 11-Apr-2018  maxv Add comment about IPsec.
 1.195 21-Mar-2018  roy Sprinkle more soroverflow().
 1.194 06-Mar-2018  maxv Perform the IP (src/dst) checks _before_ calling the packet filter, because
if the filter has a "return-icmp" rule it may call icmp6_error with an src
field that was not entirely validated.
 1.193 24-Feb-2018  ozaki-r branches: 1.193.2;
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because there is a violation of the rule of lock ordering;
softnet_lock is held while holding IFNET_LOCK, which violates the rule.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to protect the network stack surely.
This is required, for example, for PR kern/51356.

Fix PR kern/53043
 1.192 14-Feb-2018  maxv Re-make ip6_nexthdr global, it will be used in soon-to-be-added code...
 1.191 12-Feb-2018  maxv Replace bcopy -> memcpy when it is obvious that the areas don't overlap.
Rearrange ip6_splithdr() for clarity.
 1.190 09-Feb-2018  maxv Remove dead code.
 1.189 30-Jan-2018  maxv Style, localify, remove dead code, and fix typos. No functional change.
 1.188 30-Jan-2018  maxv Kick nested fragments.
 1.187 30-Jan-2018  maxv Fix a buffer overflow in ip6_get_prevhdr. Doing

mtod(m, char *) + len

is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.

The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.

But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.

However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.

As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.

Then there are the easier cases: if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.

Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.

This place is still fragile.
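
Why an offset into the chain is safer than "mtod(m, char *) + len" can be sketched with a simplified buffer chain. The structure and function below are hypothetical stand-ins, not the mbuf API or the actual fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a chain of packet buffers. */
struct sketch_buf {
	struct sketch_buf *next;
	unsigned char *data;
	size_t len;
};

/*
 * Locate the byte at offset 'off' within the whole chain. Returns NULL
 * if the offset lies beyond the end of the chain, instead of pointing
 * past the end of the first buffer.
 */
static unsigned char *
sketch_chain_byte(struct sketch_buf *b, size_t off)
{
	for (; b != NULL; b = b->next) {
		assert(b->data != NULL || b->len == 0);
		if (off < b->len)
			return b->data + off;
		off -= b->len;
	}
	return NULL;
}
```

Adding the offset to the first buffer's data pointer is only valid when the offset is smaller than that buffer's length; walking the chain handles the case where the option lives in a later buffer.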
 1.186 29-Jan-2018  maxv Start cleaning up ip6_input.c. Several pieces of code have evolved but
their neighboring comments were not updated. So update them, and remove
code that has been disabled for years (it has no use anyway).
 1.185 25-Nov-2017  kre Attempt to restore v6 networking. Not 100% certain that these
changes are all that is needed, but they're certainly a big part of it
(especially the ip6_input.c change.)
 1.184 24-Nov-2017  roy Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.
 1.183 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.182 27-Sep-2017  ozaki-r Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
 1.181 27-Jul-2017  ozaki-r Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
 1.180 06-Jul-2017  christos remove unnecessary casts; use sizeof(var) instead of sizeof(type).
 1.179 06-Jul-2017  christos Merge the two copies SO_TIMESTAMP/SO_OTIMESTAMP processing to a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.
 1.178 01-Jun-2017  chs branches: 1.178.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.177 14-Mar-2017  ozaki-r Replace DIAGNOSTIC + panic with KASSERT
 1.176 01-Mar-2017  ozaki-r Provide in6_multi_group

Use it when checking if we belong to the group, instead of in6_lookup_multi.

No functional change.
 1.175 22-Feb-2017  ozaki-r Stop using useless IN6_*_MULTI macros
 1.174 21-Feb-2017  ozaki-r Sweep unnecessary malloc.h inclusions
 1.173 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.172 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.171 08-Dec-2016  ozaki-r branches: 1.171.2;
Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for an MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.170 01-Nov-2016  ozaki-r Reduce the number of return points

No functional change.
 1.169 18-Oct-2016  ozaki-r Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@
 1.168 07-Sep-2016  roy Disallow input to detached addresses because they are not yet valid.
 1.167 31-Aug-2016  ozaki-r Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@
 1.166 02-Aug-2016  knakahara ip6flow refactor like ipflow.

- move ip6flow sysctls into ip6_flow.c like ip_flow.c:r1.64
- build ip6_flow.c only if GATEWAY kernel option is enabled
 1.165 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.164 07-Jul-2016  ozaki-r branches: 1.164.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.163 06-Jul-2016  ozaki-r Move in6_ifaddr_list to a more proper place (from ip6_input.c to in6.c)

It's a similar place as the IPv4 address list, i.e., in.c.

More variables will join together.
 1.162 04-Jul-2016  ozaki-r Use pslist(9) for the global in6_ifaddr list

psz and psref will be applied in another commit.

No functional change intended.
 1.161 22-Jun-2016  ozaki-r Remove unnecessary NULL checks of ifa->ifa_addr

If it's NULL, that is a bug. There are many IFADDR_FOREACH loops that don't do a
NULL check. If it could be NULL, they would have failed already.
 1.160 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and is meant for the
transition period, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
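
The idea of looking an interface up by index rather than keeping a pointer in the packet can be sketched as below. All names are hypothetical, and the sketch deliberately omits the psref(9)/pserialize(9) protection that the real APIs provide.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAXIF	8

/* Hypothetical stand-ins for an interface object and a packet header. */
struct sketch_ifnet {
	unsigned int index;
	const char *name;
};

struct sketch_pkt {
	unsigned int rcvif_index;	/* an index, not a pointer */
};

/* Interface collection indexed by interface index; NULL means "gone". */
static struct sketch_ifnet *sketch_iftable[SKETCH_MAXIF];

/*
 * Resolve the receiving interface at the time of use. If the interface
 * was destroyed in the meantime, the lookup yields NULL rather than a
 * dangling pointer.
 */
static struct sketch_ifnet *
sketch_get_rcvif(const struct sketch_pkt *p)
{
	assert(p != NULL);
	if (p->rcvif_index >= SKETCH_MAXIF)
		return NULL;
	return sketch_iftable[p->rcvif_index];
}
```

A caller has to handle the NULL case, which is the price of not trusting a stored pointer.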
 1.159 19-May-2016  ozaki-r Get rcvif once and reuse it

No functional change.
 1.158 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
  - RTF_CONNECTED is added
    - It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

Details of the behavior changes can be seen in the diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.157 01-Apr-2016  ozaki-r Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.
 1.156 01-Apr-2016  ozaki-r Tidy up nd6_timer initialization
 1.155 04-Feb-2016  riastradh Declare in6_tmpaddrtimer_ch in in6_var.h.

Do not declare extern variables in .c files!
 1.154 08-Jan-2016  knakahara eliminate ip_input.c and ip6_input.c dependency on gif(4)
 1.153 12-Dec-2015  christos Hook up the addrctl stuff that's already there.
 1.152 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.151 01-Apr-2015  ozaki-r Pull out ipsec routines from ip6_input

This change reduces symbol references from netinet6 to netipsec
and improves modularity of netipsec.

No functional change is intended.
 1.150 20-Jan-2015  roy Add net.inet6.ip6.prefer_tempaddr sysctl knob so that we can prefer
IPv6 temporary addresses as the source address.

Fixes PR kern/47100 based on a patch by Dieter Roelants.
 1.149 16-Jun-2014  ozaki-r branches: 1.149.2; 1.149.4;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@
 1.148 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.147 05-Jun-2014  roy Add IPV6CTL_AUTO_LINKLOCAL and ND6_IFF_AUTO_LINKLOCAL toggles which
control the automatic creation of IPv6 link-local addresses when an
interface is brought up.

Taken from FreeBSD.
 1.146 30-May-2014  christos Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.145 25-Feb-2014  pooka branches: 1.145.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.144 04-Oct-2013  christos check result of setscope, from logan.
 1.143 29-Jun-2013  rmind - Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only a few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.142 05-Jun-2013  christos branches: 1.142.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.141 29-Nov-2012  christos Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.
 1.140 25-Jun-2012  christos branches: 1.140.2;
rename rfc6056 -> portalgo, requested by yamt
 1.139 23-Jun-2012  christos 4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.138 22-Jun-2012  christos PR/46602: Move the rfc6056 port randomization to the IP layer.
 1.137 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.136 10-Jan-2012  drochner branches: 1.136.2; 1.136.6; 1.136.8;
add patch from Arnaud Degroote to handle IPv6 extended options with
(FAST_)IPSEC, tested lightly with a DSTOPTS header consisting
of PAD1
 1.135 31-Dec-2011  christos - fix offsetof usage, and redundant defines
- kill pointer casts to 0
 1.134 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.133 19-Nov-2011  tls branches: 1.133.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.132 01-Jun-2011  dyoung branches: 1.132.2;
Don't refer to extern tcbtable here, it is unused.
 1.131 24-May-2011  spz branches: 1.131.2;
RA flood mitigation via a limit on accepted routes:
- introduce a limit for the routes accepted via IPv6 Router Advertisement:
a common 2 interface client will have 6, the default limit is 100 and
can be adjusted via sysctl
- report the current number of routes installed via RA via sysctl
- count discarded route additions. Note that one RA message is two routes.
This is at present only across all interfaces even though per-interface
would be more useful, since the per-interface structure complies with RFC2466
- bump kernel version due to the previous change
- adjust netstat to use the new value (with netstat -p icmp6)
 1.130 03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
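
The classification described above can be sketched as follows. The enum and function names are hypothetical, addresses are simplified to IPv4 values with a single netmask, and only the default MSL values (2, 10 and 60 seconds) come from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the three MSLT classes. */
enum sketch_msl_class {
	SKETCH_MSL_LOOPBACK,	/* local host equals remote host */
	SKETCH_MSL_LOCAL,	/* same link/subnet */
	SKETCH_MSL_REMOTE	/* reached via one or more gateways */
};

/* Simplified classification based on one local netmask. */
static enum sketch_msl_class
sketch_classify(uint32_t laddr, uint32_t faddr, uint32_t netmask)
{
	if (laddr == faddr)
		return SKETCH_MSL_LOOPBACK;
	if ((laddr & netmask) == (faddr & netmask))
		return SKETCH_MSL_LOCAL;
	return SKETCH_MSL_REMOTE;
}

/* Default MSL in seconds for each class, as listed in the text. */
static int
sketch_default_msl(enum sketch_msl_class c)
{
	static const int msl[] = { 2, 10, 60 };

	assert(c >= SKETCH_MSL_LOOPBACK && c <= SKETCH_MSL_REMOTE);
	return msl[c];
}
```

A session to a peer on the same subnet would thus use a 10 second MSL instead of the 60 seconds used for remote peers.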

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.129 04-Feb-2010  joerg branches: 1.129.4; 1.129.6;
Explicitly include opt_gateway.h when depending on GATEWAY.
 1.128 16-Sep-2009  pooka Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.127 01-May-2009  martin Add missing parenthesis - from Kurt Lidl in PR port-vax/41316
 1.126 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.125 18-Mar-2009  cegger bcopy -> memcpy
 1.124 18-Mar-2009  cegger bzero -> memset
 1.123 19-Jan-2009  christos branches: 1.123.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.
 1.122 21-Aug-2008  matt branches: 1.122.2; 1.122.4; 1.122.8;
Change KERNEL_LOCK_ONE (wrong name) to KERNEL_LOCK (the right name).
 1.121 20-Aug-2008  simonb Fix 8-spaces-vs-tab goop.
 1.120 20-Aug-2008  matt Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.
 1.119 04-May-2008  thorpej branches: 1.119.2; 1.119.6;
Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577
 1.118 24-Apr-2008  ad branches: 1.118.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.117 23-Apr-2008  thorpej Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.116 15-Apr-2008  thorpej branches: 1.116.2;
Make ip6 and icmp6 stats per-cpu.
 1.115 08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
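
Why an array of uint64_t can stay layout-compatible with an older structure can be sketched like this. The structure, index names and helpers are hypothetical and much smaller than the real statistics set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "old" layout: a structure made only of 64-bit counters. */
struct sketch_oldstat {
	uint64_t total;
	uint64_t tooshort;
	uint64_t toosmall;
};

/* Hypothetical "new" layout: named indices into a flat array. */
enum {
	SKETCH_STAT_TOTAL = 0,
	SKETCH_STAT_TOOSHORT = 1,
	SKETCH_STAT_TOOSMALL = 2,
	SKETCH_NSTATS = 3
};

/* The two layouts match when every field sits at index * 8 bytes. */
_Static_assert(sizeof(struct sketch_oldstat) ==
    SKETCH_NSTATS * sizeof(uint64_t), "size mismatch");
_Static_assert(offsetof(struct sketch_oldstat, toosmall) ==
    SKETCH_STAT_TOOSMALL * sizeof(uint64_t), "offset mismatch");

static uint64_t sketch_stats[SKETCH_NSTATS];

static void
sketch_stat_inc(unsigned int idx)
{
	assert(idx < SKETCH_NSTATS);
	sketch_stats[idx]++;
}

/* Export the counters in the old structure layout, e.g. for an old reader. */
static void
sketch_stats_export(struct sketch_oldstat *out)
{
	memcpy(out, sketch_stats, sizeof(*out));
}
```

Because the bytes are identical, a consumer that still expects the structure can read the exported array unchanged.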
 1.114 27-Feb-2008  matt Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.
 1.113 04-Dec-2007  dyoung branches: 1.113.8; 1.113.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().
 1.112 29-Oct-2007  dyoung branches: 1.112.2; 1.112.4;
The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.
 1.111 24-Oct-2007  dyoung Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.
 1.110 11-Sep-2007  degroote branches: 1.110.4;
In some FAST_IPSEC, spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800
 1.109 19-Jul-2007  dyoung branches: 1.109.4; 1.109.6; 1.109.8;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.108 09-Jul-2007  ad branches: 1.108.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.107 23-May-2007  christos Ansify + add a few comments, from Karl Sjödahl
 1.106 17-May-2007  yamt remove net.inet6.ip6.rht0 sysctl.
it's too dangerous compared to its benefit.

strongly requested by itojun@. ok'ed by core@.
 1.105 06-May-2007  dyoung Use rtcache_lookup2(), and fix cache hit/miss accounting.

While I am here, introduce an rtentry pointer, 'rt', and set it
equal to ip6_forward.ro_rt. Replace several occurrences of
'ip6_forward.ro_rt' with 'rt'.
 1.104 05-May-2007  yamt from kame:

> Revision 1.371
> Thu May 3 22:07:39 2007 UTC (47 hours, 7 minutes ago) by itojun
>
> drop packets with more than 1 routing headers.
> from claudio@openbsd

(and increment ifs6_in_hdrerr on ip6s_toomanyhdr.)
 1.103 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating, and
freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
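
A minimal userland sketch of the sockaddr routines described in item 1,
assuming malloc-backed storage in place of a per-family pool(9). Every
demo_* name, the demo_sockaddr layout, and the per-family lengths are
hypothetical stand-ins, not the kernel's definitions:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the BSD sockaddr header (sa_len, sa_family). */
struct demo_sockaddr {
	uint8_t sa_len;		/* length the family uses */
	uint8_t sa_family;	/* address family */
	char    sa_data[30];	/* family-specific payload */
};

enum { DEMO_AF_INET = 2, DEMO_AF_INET6 = 24 };

/* Illustrative per-family lengths; 0 means "unknown family". */
uint8_t
demo_family_len(uint8_t af)
{
	switch (af) {
	case DEMO_AF_INET:	return 16;
	case DEMO_AF_INET6:	return 28;
	default:		return 0;
	}
}

/* Like sockaddr_alloc(): returns an initialized sockaddr, or NULL. */
struct demo_sockaddr *
demo_sockaddr_alloc(uint8_t af)
{
	uint8_t len = demo_family_len(af);
	struct demo_sockaddr *sa;

	if (len == 0)
		return NULL;
	sa = calloc(1, sizeof(*sa));
	if (sa == NULL)
		return NULL;
	sa->sa_len = len;
	sa->sa_family = af;
	return sa;
}

/* Like sockaddr_copy(): asserts that the families match, as strcpy() would not. */
struct demo_sockaddr *
demo_sockaddr_copy(struct demo_sockaddr *dst, const struct demo_sockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like sockaddr_dup(): allocate, then copy; NULL if allocation fails. */
struct demo_sockaddr *
demo_sockaddr_dup(const struct demo_sockaddr *src)
{
	struct demo_sockaddr *dst = demo_sockaddr_alloc(src->sa_family);

	if (dst == NULL)
		return NULL;
	return demo_sockaddr_copy(dst, src);
}

/* Like sockaddr_free(): give the storage back. */
void
demo_sockaddr_free(struct demo_sockaddr *sa)
{
	free(sa);
}
```

Callers must check the result of the alloc and dup routines for NULL,
which mirrors the NULL checks on rtcache_getdst() mentioned in item 3.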
 1.102 22-Apr-2007  christos Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).

Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
 1.101 24-Mar-2007  liamjfoy Minor change - be a little more consistent in sysctl handler names
 1.100 24-Mar-2007  liamjfoy Don't call ip*flow_reap if we're just looking up maxflows
 1.99 23-Mar-2007  liamjfoy Add a new sysctl net.inet6.ip6.hashsize to control the hash table size.

The sysctl handler will ensure this value is a power of 2

ok dyoung@
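
A minimal sketch of the power-of-2 check, assuming the handler rejects
other values rather than rounding them (the log does not say which).
The demo_* names are hypothetical and this is not the kernel's handler:

```c
#include <stdbool.h>

/* True only for 1, 2, 4, 8, ...: exactly one bit set. */
bool
demo_is_power_of_2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Store the requested hash table size if it is acceptable.
 * Returns 0 on success, or -1 (standing in for EINVAL) otherwise.
 */
int
demo_set_hashsize(unsigned int *hashsize, unsigned int requested)
{
	if (!demo_is_power_of_2(requested))
		return -1;
	*hashsize = requested;
	return 0;
}
```

A power-of-2 size lets a hash be reduced to a bucket index with a mask
(hash & (size - 1)) instead of a modulo.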
 1.98 07-Mar-2007  liamjfoy branches: 1.98.2; 1.98.4; 1.98.6;
Add IPv6 Fast Forward - the IPv4 counterpart:

If ip6_forward successfully forwards a packet, a cache entry, in this case an
ip6flow struct, will be created. ether_input and friends will
then be able to call ip6flow_fastforward with the packet which will then
be passed to if_output (unless an issue is found - in that case the packet
is passed back to ip6_input).

ok matt@ christos@ dyoung@ and joerg@
 1.97 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.96 22-Feb-2007  dyoung Cosmetic: use __arraycount. In ip6_input, move type of parameter
into parentheses.
 1.95 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.94 10-Feb-2007  degroote branches: 1.94.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic
 1.93 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.92 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
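
A minimal userland sketch of the chain described above, assuming a
hand-rolled doubly-linked list in place of the queue(3) macros and a
bare counter in place of RTFREE(). All demo_* names are hypothetical:

```c
#include <stddef.h>

struct demo_rtentry {
	int refcnt;			/* references held by route caches */
};

struct demo_route {
	struct demo_rtentry *ro_rt;	/* cached route, or NULL */
	struct demo_route   *ro_next;	/* next cache on the chain */
	struct demo_route  **ro_prevp;	/* pointer that points at us */
};

/* Chain of all route caches where ro_rt != NULL. */
struct demo_route *demo_rtcache_head;

/* Like in_rtcache(): put a cache holding a route on the chain. */
void
demo_rtcache(struct demo_route *ro)
{
	if (ro->ro_rt == NULL)
		return;
	ro->ro_next = demo_rtcache_head;
	ro->ro_prevp = &demo_rtcache_head;
	if (demo_rtcache_head != NULL)
		demo_rtcache_head->ro_prevp = &ro->ro_next;
	demo_rtcache_head = ro;
}

/* Like in_rtflush(): drop the route, set ro_rt to NULL, unchain. */
void
demo_rtflush(struct demo_route *ro)
{
	if (ro->ro_rt == NULL)
		return;
	ro->ro_rt->refcnt--;
	ro->ro_rt = NULL;
	*ro->ro_prevp = ro->ro_next;
	if (ro->ro_next != NULL)
		ro->ro_next->ro_prevp = ro->ro_prevp;
	ro->ro_next = NULL;
	ro->ro_prevp = NULL;
}

/* Like in_rtflushall(): flush from the head until the chain is empty. */
void
demo_rtflushall(void)
{
	while (demo_rtcache_head != NULL)
		demo_rtflush(demo_rtcache_head);
}
```

After a flush, users of a cache see ro_rt == NULL and must look the
route up again, which is why every caller has to tolerate that value.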
 1.91 02-Dec-2006  dyoung Use the queue(3) macros instead of open-coding them. Shorten
staircases. Remove unnecessary casts. Where appropriate, s/8/NBBY/.
De-__P(). KNF.

No functional changes intended.
 1.90 16-Nov-2006  christos branches: 1.90.2;
__unused removal on arguments; approved by core.
 1.89 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.88 25-May-2006  bouyer branches: 1.88.6; 1.88.8;
Make the mbuf writable before calling in6_clearscope(). Based on patch sent
by David Young on tech-kern.
 1.87 23-May-2006  rpaulo In ip6_savecontrol(), ignore IPv4 packets.
From JINMEI Tatuya (KAME). Should fix PR 33269.
 1.86 07-May-2006  rpaulo branches: 1.86.2;
while (1) -> for (;;)
 1.85 05-May-2006  rpaulo Add support for RFC 3542 Adv. Socket API for IPv6 (which obsoletes 2292).
* RFC 3542 isn't binary compatible with RFC 2292.
* RFC 2292 support is on by default but can be disabled.
* update ping6, telnet and traceroute6 to the new API.

From the KAME project (www.kame.net).
Reviewed by core.
 1.84 15-Apr-2006  christos Coverity CID 856: m cannot be NULL here. Remove bogus test.
 1.83 05-Mar-2006  rpaulo branches: 1.83.2; 1.83.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.82 23-Jan-2006  yamt branches: 1.82.2; 1.82.4; 1.82.6;
ip6_input: don't embed scope id before running packet filters.
 1.81 21-Jan-2006  rpaulo Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
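
A minimal userland sketch of the src=::1 case from the scope boundary
item above, assuming the check reduces to "a loopback source may only
reach a loopback destination". The real scope handling covers more
cases; the demo_* names are hypothetical:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Returns 1 if sending from src to dst would carry a node-local
 * source address off the node, 0 otherwise.
 */
int
demo_scope_violation(const struct in6_addr *src, const struct in6_addr *dst)
{
	/* ::1 is only meaningful within a single node. */
	return IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst);
}

/* Convenience wrapper taking textual addresses; -1 on parse failure. */
int
demo_scope_violation_text(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return -1;
	return demo_scope_violation(&s, &d);
}
```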
 1.80 11-Dec-2005  christos branches: 1.80.2;
merge ktrace-lwp.
 1.79 28-Aug-2005  rpaulo Implement net.inet6.ip6.stats sysctl.

Reviewed by Elad Efrat.
 1.78 29-May-2005  christos branches: 1.78.2;
- avoid shadowed variables
- sprinkle const.
 1.77 04-Dec-2004  peter branches: 1.77.10; 1.77.12;
Convert lo(4) to a clonable device.

This also removes the loif array and changes all code to use the new
lo0ifp pointer which points to the lo0 ifnet structure.

Approved by christos.
 1.76 28-Nov-2004  christos We don't need to include bpfilter.h
 1.75 01-Jun-2004  itojun there's no use to check privs on curproc in the input path. jinmei@kame
 1.74 25-May-2004  atatat Sysctl descriptions under net subtree (net.key not done)
 1.73 24-Mar-2004  atatat branches: 1.73.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.72 11-Feb-2004  itojun minor KNF
 1.71 11-Feb-2004  itojun KNF
 1.70 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.69 12-Nov-2003  itojun implement net.inet6.ifq
 1.68 30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.67 14-Oct-2003  itojun fix endian bug in fragment header scanning.
 1.66 06-Sep-2003  itojun randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.65 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.64 30-Jun-2003  itojun branches: 1.64.2;
avoid ICMPv6 redirect if the packet filter rewrites dst addr to an address
on the incoming interface. cedric@openbsd
 1.63 14-May-2003  itojun KNF
 1.62 14-May-2003  itojun do not use m_pulldown() to parse intermediate extension headers (like routing).
we don't want to drop packets due to extension header parsing. KAME rev 1.59.
(performance may suck, but it is slowpath anyways)
 1.61 14-May-2003  itojun always use PULLDOWN_TEST codepath.
 1.60 20-Jan-2003  simonb The Double-Semi-Colon Police.
 1.59 23-Sep-2002  simonb Remove breaks after returns, unreachable returns and returns after
returns(!).
 1.58 11-Sep-2002  itojun correct signedness mixup in pointer passing. sync w/kame
 1.57 30-Jun-2002  thorpej Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
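
A minimal sketch of how such an alignment macro might look, assuming
32-bit header alignment. DEMO_HDR_ALIGNED_P and demo_is_aligned are
hypothetical names, not the macros this change added:

```c
#include <stddef.h>
#include <stdint.h>

/* True if p is a multiple of align; align must be a power of 2. */
int
demo_is_aligned(const void *p, size_t align)
{
	return ((uintptr_t)p & (align - 1)) == 0;
}

#ifdef __NO_STRICT_ALIGNMENT
/* No strict alignment constraints: the test costs nothing. */
#define DEMO_HDR_ALIGNED_P(p)	1
#else
/* Strict alignment: check the pointer before dereferencing the header. */
#define DEMO_HDR_ALIGNED_P(p)	demo_is_aligned((p), sizeof(uint32_t))
#endif
```

A caller would test the header pointer with the macro and take the
m_copyup() path only when the test fails.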
 1.56 09-Jun-2002  itojun whitespace cleanup
 1.55 08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.54 28-May-2002  itojun use arc4random() where possible.
XXX is it necessary to do microtime() on tcp syn cache?
 1.53 28-May-2002  itojun limit number of IPv6 fragments (not the fragment queue size) to
fight against lots-of-frags DoS attacks. sync w/kame
 1.52 12-May-2002  wiz branches: 1.52.2; 1.52.4;
Spelling fixes, from Sergey Svishchev in kern/16650.
 1.51 22-Dec-2001  itojun make it compile even if NGIF=0
 1.50 21-Dec-2001  itojun move in6_gif_hlim decl to in6_gif.c. sync with kame
 1.49 18-Dec-2001  itojun reduce white space/cosmetic diffs w/kame.
 1.48 13-Nov-2001  lukem add RCSIDs
 1.47 02-Nov-2001  itojun check offset overrun in ip6_nexthdr.
 1.46 29-Oct-2001  simonb Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.
 1.45 24-Oct-2001  itojun more whitespace sync with kame
 1.44 16-Oct-2001  itojun branches: 1.44.2;
more whitespace/comment sync with kame
 1.43 15-Oct-2001  itojun implement IPV6_V6ONLY socket option from draft-ietf-ipngwg-rfc2553bis-03.txt.
IPV6_BINDV6ONLY (netbsd only) is deprecated, but still works just like before.
 1.42 06-Aug-2001  itojun cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.
 1.41 13-Apr-2001  thorpej branches: 1.41.2;
Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.40 30-Mar-2001  itojun enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.
 1.39 21-Mar-2001  itojun do not inject packets to ipfilter, if the packet went through IPsec tunnel.
http://www.netbsd.org/Documentation/network/ipsec/#ipf-interaction
 1.38 16-Mar-2001  itojun drop packets with link-local addresses,
if (internally-used) interface ID portion is already filled. sync with kame
 1.37 01-Mar-2001  itojun branches: 1.37.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code
 1.36 24-Feb-2001  cgd C requires that labels be followed by statements.
 1.35 10-Feb-2001  itojun to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.34 07-Feb-2001  itojun during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).
 1.33 28-Dec-2000  thorpej Back out the sledgehammer damage applied by wiz while I was out for
the holiday.
 1.32 25-Dec-2000  wiz Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.
 1.31 22-Dec-2000  thorpej Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.
 1.30 14-Dec-2000  thorpej Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.
 1.29 11-Nov-2000  thorpej Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.
 1.28 23-Oct-2000  itojun make IFA_STATS really work on IPv6.
 1.27 31-Aug-2000  itojun add missing \n on log(). sync with kame
 1.26 26-Aug-2000  itojun implement net.inet6.ip6.{anon,low}port{min,max} sysctl variable.
 1.25 06-Jul-2000  itojun - do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).
 1.24 02-Jul-2000  itojun drop packet to tentative/duplicated interface address earlier. sync w/kame
 1.23 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.22 13-Jun-2000  itojun branches: 1.22.2;
do not use cached route if the route becomes !RTF_UP.
make the validation for jumbo payload option more strict.
 1.21 19-May-2000  itojun branches: 1.21.2;
correct manipulation of link-local scoped address on loopback.
now "telnet fe80::1%lo0" should work again.
(we have another bug near here - will attack it soon)
 1.20 12-Apr-2000  itojun revisit in6_ifattach().
- be persistent on initializing interfaces, even if there's manually-
assigned linklocal, multicast/whatever initialization is necessary.
- do not cache mac addr in the kernel. grab mac addr from existing cards
(this is important when you swap ethernet cards back and forth)
now ppp6 works just fine!

call in6_ifattach() on ATM PVC interface to assign link-local, using
hardware MAC address as seed.

(the change is in sync with kame tree).
 1.19 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
 1.18 21-Mar-2000  itojun cleanup AH/policy processing.
- parse IPv6 header by using common function, ip6_{last,next}hdr.
- fix behavior in multiple AH cases.
make strict boundary checks on mbuf chasing.
(sync with latest kame)
 1.17 21-Mar-2000  itojun #if 0'ed too strong sanity check against packets with v4 compatible addresses.
we may want to re-enable it whenever mech-xx clarifies router behavior
against native IPv6 packet with IPv4 compatible addresses.
 1.16 20-Feb-2000  darrenr pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".
 1.15 17-Feb-2000  darrenr Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.
 1.14 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.13 31-Jan-2000  itojun be proactive about malicious packet on the wire. we fear that v4 mapped
address to be used as a tool to hose security filters (like bypassing
"local host only" filter by using ::ffff:127.0.0.1).
 1.12 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.11 06-Jan-2000  itojun make IPV6_BINDV6ONLY setsockopt available. it controls behavior of
AF_INET6 wildcard listening socket. heavily documented in ip6(4).
net.inet6.ip6.bindv6only defines default value. default is 1.

"options INET6_BINDV6ONLY" removes any code fragment that supports
IPV6_BINDV6ONLY == 0 case (not defopt'ed as use of this is rare).
 1.10 06-Jan-2000  itojun add missing net.inet6.ip6.rr_prune case.
 1.9 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.8 01-Oct-1999  itojun branches: 1.8.2; 1.8.8;
sanity check against truncated extension headers.
 1.7 07-Aug-1999  itojun remove invalid initialization of in6_iflladdr.
 1.6 31-Jul-1999  itojun sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.5 22-Jul-1999  itojun change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.
 1.4 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ip6_input.c was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ip6_input.c was added on branch chs-ubc2 on 1999-07-01 23:48:28 +0000
 1.8.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.8.2.7 21-Apr-2001  bouyer Sync with HEAD
 1.8.2.6 27-Mar-2001  bouyer Sync with HEAD.
 1.8.2.5 12-Mar-2001  bouyer Sync with HEAD.
 1.8.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.8.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.8.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.8.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.21.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.22.2.6 07-Apr-2004  jmc Pullup rev 1.67 (requested by itojun in ticket #103)

Fix endian bug in fragment header scanning.
 1.22.2.5 26-Feb-2002  he Apply patch (requested by martti):
Fix it so that IPFilter handles IPv6 traffic.
 1.22.2.4 06-Apr-2001  he Pull up revision 1.39 (via patch, requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.22.2.3 11-Mar-2001  he Pull up revision 1.37 (requested by itojun):
Ensure that we enforce inbound IPsec policy on all IP protocols,
not just TCP, UDP and ICMP.
 1.22.2.2 27-Aug-2000  itojun pullup (approved by releng-1-5)

> implement net.inet6.ip6.{anon,low}port{min,max} sysctl variable.

> cvs rdiff -r1.67 -r1.68 basesrc/lib/libc/gen/sysctl.3
> cvs rdiff -r1.53 -r1.54 basesrc/sbin/sysctl/sysctl.8
> cvs rdiff -r1.18 -r1.19 syssrc/sys/netinet6/in6.h
> cvs rdiff -r1.29 -r1.30 syssrc/sys/netinet6/in6_pcb.c
> cvs rdiff -r1.3 -r1.4 syssrc/sys/netinet6/in6_src.c
> cvs rdiff -r1.25 -r1.26 syssrc/sys/netinet6/ip6_input.c
> cvs rdiff -r1.14 -r1.15 syssrc/sys/netinet6/ip6_var.h
 1.22.2.1 03-Jul-2000  thorpej Pull up rev. 1.24:
drop packet to tentative/duplicated interface address earlier. sync w/kame
 1.37.2.14 18-Oct-2002  nathanw Catch up to -current.
 1.37.2.13 17-Sep-2002  nathanw Catch up to -current.
 1.37.2.12 01-Aug-2002  nathanw Catch up to -current.
 1.37.2.11 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.37.2.10 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
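A minimal user-space sketch of the null-safe macro shape described in the message above. The struct layouts, the static curlwp variable, and the helper current_pid() are hypothetical stand-ins for illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct proc and struct lwp. */
struct proc { int p_pid; };
struct lwp  { struct proc *l_proc; };

/* Stand-in for the kernel's "current LWP" pointer (an assumption here). */
static struct lwp *curlwp = NULL;

/* Evaluates to NULL when there is no current LWP, so merely referencing
 * curproc is always safe; dereferencing it still needs a NULL check. */
#define curproc ((curlwp) ? (curlwp)->l_proc : NULL)

/* Hypothetical helper: pid of the current process, or -1 if none. */
static int
current_pid(void)
{
    struct proc *p = curproc;

    return (p != NULL) ? p->p_pid : -1;
}

/* Usage: exercise the macro with and without a process context. */
static void
curproc_selftest(void)
{
    struct proc p = { 42 };
    struct lwp  l = { &p };

    curlwp = NULL;
    assert(curproc == NULL);        /* no context: reference is safe */
    assert(current_pid() == -1);

    curlwp = &l;
    assert(curproc == &p);
    assert(current_pid() == 42);
    curlwp = NULL;
}
```

Callers that may run without a process context would test the result against NULL before using it, as current_pid() does.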
 1.37.2.9 20-Jun-2002  nathanw Catch up to -current.
 1.37.2.8 08-Jan-2002  nathanw Catch up to -current.
 1.37.2.7 14-Nov-2001  nathanw Catch up to -current.
 1.37.2.6 22-Oct-2001  nathanw Catch up to -current.
 1.37.2.5 24-Aug-2001  nathanw Catch up with -current.
 1.37.2.4 21-Jun-2001  nathanw Catch up to -current.
 1.37.2.3 09-Apr-2001  nathanw Catch up with -current.
 1.37.2.2 13-Mar-2001  nathanw Be more careful not to dereference curproc when there might not be
a process context.
 1.37.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.41.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.41.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.41.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.41.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.41.2.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.44.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.52.4.1 19-Oct-2003  tron Pull up revision 1.67 (requested by itojun in ticket #1525):
fix endian bug in fragment header scanning.
 1.52.2.3 15-Jul-2002  gehenna catch up with -current.
 1.52.2.2 20-Jun-2002  gehenna catch up with -current.
 1.52.2.1 30-May-2002  gehenna Catch up with -current.
 1.64.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.64.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.64.2.4 29-Nov-2004  skrll Sync with HEAD.
 1.64.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.64.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.64.2.1 03-Aug-2004  skrll Sync with HEAD
 1.73.2.3 04-Jun-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11330):
sys/netinet6/ip6_input.c: revision 1.102 via patch
sys/netinet6/route6.c: revision 1.18 via patch
sys/netinet6/ip6_var.h: revisions 1.41-1.42 via patch
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
 1.73.2.2 24-May-2006  riz Pull up following revision(s) (requested by rpaulo in ticket #10626):
sys/netinet6/ip6_input.c: revision 1.87
In ip6_savecontrol(), ignore IPv4 packets.
From JINMEI Tatuya (KAME). Should fix PR 33269.
 1.73.2.1 28-May-2004  tron branches: 1.73.2.1.2; 1.73.2.1.4;
Pull up revision 1.74 (requested by atatat in ticket #391):
Sysctl descriptions under net subtree (net.key not done)
 1.73.2.1.4.3 04-Jun-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11330):
sys/netinet6/ip6_input.c: revision 1.102 via patch
sys/netinet6/route6.c: revision 1.18 via patch
sys/netinet6/ip6_var.h: revisions 1.41-1.42 via patch
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
 1.73.2.1.4.2 28-May-2006  riz Repair a patching error from previous revision (ticket #10626)
 1.73.2.1.4.1 24-May-2006  riz Pull up following revision(s) (requested by rpaulo in ticket #10626):
sys/netinet6/ip6_input.c: revision 1.87
In ip6_savecontrol(), ignore IPv4 packets.
From JINMEI Tatuya (KAME). Should fix PR 33269.
 1.73.2.1.2.3 04-Jun-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11330):
sys/netinet6/ip6_input.c: revision 1.102 via patch
sys/netinet6/route6.c: revision 1.18 via patch
sys/netinet6/ip6_var.h: revisions 1.41-1.42 via patch
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
 1.73.2.1.2.2 28-May-2006  riz Repair a patching error from previous revision (ticket #10626)
 1.73.2.1.2.1 24-May-2006  riz Pull up following revision(s) (requested by rpaulo in ticket #10626):
sys/netinet6/ip6_input.c: revision 1.87
In ip6_savecontrol(), ignore IPv4 packets.
From JINMEI Tatuya (KAME). Should fix PR 33269.
 1.77.12.2 26-Apr-2007  ghen Pull up following revision(s) (requested by christos in ticket #1766):
sys/netinet6/ip6_input.c: revision 1.102 via patch
sys/netinet6/route6.c: revision 1.18 via patch
sys/netinet6/ip6_var.h: revision 1.41 via patch
sys/netinet6/ip6_var.h: revision 1.42 via patch
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
fix typo.
 1.77.12.1 24-May-2006  riz Pull up following revision(s) (requested by rpaulo in ticket #1338):
sys/netinet6/ip6_input.c: revision 1.87 via patch
In ip6_savecontrol(), ignore IPv4 packets.
From JINMEI Tatuya (KAME). Should fix PR 33269.
 1.77.10.2 26-Apr-2007  ghen Pull up following revision(s) (requested by christos in ticket #1766):
sys/netinet6/ip6_input.c: revision 1.102 via patch
sys/netinet6/route6.c: revision 1.18 via patch
sys/netinet6/ip6_var.h: revision 1.41 via patch
sys/netinet6/ip6_var.h: revision 1.42 via patch
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
fix typo.
 1.77.10.1 24-May-2006  riz branches: 1.77.10.1.2;
Pull up following revision(s) (requested by rpaulo in ticket #1338):
sys/netinet6/ip6_input.c: revision 1.87 via patch
In ip6_savecontrol(), ignore IPv4 packets.
From JINMEI Tatuya (KAME). Should fix PR 33269.
 1.77.10.1.2.1 26-Apr-2007  ghen Pull up following revision(s) (requested by christos in ticket #1766):
sys/netinet6/ip6_input.c: revision 1.102 via patch
sys/netinet6/route6.c: revision 1.18 via patch
sys/netinet6/ip6_var.h: revision 1.41 via patch
sys/netinet6/ip6_var.h: revision 1.42 via patch
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
fix typo.
 1.78.2.8 17-Mar-2008  yamt sync with head.
 1.78.2.7 07-Dec-2007  yamt sync with head
 1.78.2.6 15-Nov-2007  yamt sync with head.
 1.78.2.5 27-Oct-2007  yamt sync with head.
 1.78.2.4 03-Sep-2007  yamt sync with head.
 1.78.2.3 26-Feb-2007  yamt sync with head.
 1.78.2.2 30-Dec-2006  yamt sync with head.
 1.78.2.1 21-Jun-2006  yamt sync with head.
 1.80.2.1 01-Feb-2006  yamt sync with head.
 1.82.6.3 26-Jun-2006  yamt sync with head.
 1.82.6.2 24-May-2006  yamt sync with head.
 1.82.6.1 13-Mar-2006  yamt sync with head.
 1.82.4.2 01-Jun-2006  kardel Sync with head.
 1.82.4.1 22-Apr-2006  simonb Sync with head.
 1.82.2.3 09-Sep-2006  rpaulo sync with head
 1.82.2.2 23-Feb-2006  rpaulo ip6_savecontrol(): remove references to in6pcb.
 1.82.2.1 07-Feb-2006  rpaulo remove in6_pcb.h and include in_pcb.h.
 1.83.4.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.83.2.2 11-May-2006  elad sync with head
 1.83.2.1 19-Apr-2006  elad sync with head.
 1.86.2.1 19-Jun-2006  chap Sync with head.
 1.88.8.3 18-Dec-2006  yamt sync with head.
 1.88.8.2 10-Dec-2006  yamt sync with head.
 1.88.8.1 22-Oct-2006  yamt sync with head
 1.88.6.2 12-Jan-2007  ad Sync with head.
 1.88.6.1 18-Nov-2006  ad Sync with head.
 1.90.2.3 16-Sep-2007  xtraeme Pull up following revision(s) (requested by degroote in ticket #881):
sys/netinet/ip_input.c: revision 1.253
sys/netinet6/ip6_input.c: revision 1.110

In some FAST_IPSEC code paths, the spl level is not restored correctly. Fix that.
Spotted by Wolfgang Stukenbrock in pr/36800
 1.90.2.2 24-May-2007  pavel Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the right default policy, depending on the address family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us choose the right default policy (based on the address family
requested).
While here, fix an error message

Use a dynamic array instead of a static array to decompress. It lets us
decompress any data, whatever the ratio of decompressed data to compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
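The switch from a fixed-size to a dynamically sized decompression buffer mentioned above can be sketched roughly as follows. This is a user-space illustration only: struct outbuf and outbuf_append() are hypothetical names, and the doubling growth policy is an assumption, not what sys/opencrypto/deflate.c necessarily does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer for decompressed data. */
struct outbuf {
    unsigned char *data;
    size_t         len;   /* bytes used */
    size_t         cap;   /* bytes allocated */
};

/* Append n bytes, doubling the capacity as needed. Returns 0 on success,
 * -1 on overflow or allocation failure (buffer left unchanged). */
static int
outbuf_append(struct outbuf *b, const unsigned char *src, size_t n)
{
    if (n == 0)
        return 0;
    if (n > b->cap - b->len) {
        size_t ncap = b->cap ? b->cap : 16;
        unsigned char *p;

        while (ncap - b->len < n) {
            if (ncap > (size_t)-1 / 2)
                return -1;
            ncap *= 2;
        }
        p = realloc(b->data, ncap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = ncap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Usage: the output can exceed any size guessed in advance. */
static void
outbuf_selftest(void)
{
    struct outbuf b = { NULL, 0, 0 };
    int i;

    for (i = 0; i < 4; i++)
        assert(outbuf_append(&b, (const unsigned char *)"0123456789", 10) == 0);
    assert(b.len == 40);
    assert(b.cap >= b.len);
    assert(b.data[39] == '9');
    free(b.data);
}
```

Because the buffer grows with the output, no assumption about the compression ratio is built into its size.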
 1.90.2.1 28-Apr-2007  bouyer branches: 1.90.2.1.2;
Pull up following revision(s) (requested by christos in ticket #587):
sys/netinet6/ip6_input.c: revision 1.102
sys/netinet6/route6.c: revision 1.18
sys/netinet6/ip6_var.h: revision 1.41
sys/netinet6/ip6_var.h: revision 1.42
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
fix typo.
 1.90.2.1.2.2 23-Sep-2007  wrstuden Sync with somewhat-recent netbsd-4.
 1.90.2.1.2.1 04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.94.2.5 17-May-2007  yamt sync with head.
 1.94.2.4 07-May-2007  yamt sync with head.
 1.94.2.3 24-Mar-2007  yamt sync with head.
 1.94.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.94.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.98.6.1 29-Mar-2007  reinoud Pullup to -current
 1.98.4.1 11-Jul-2007  mjf Sync with head.
 1.98.2.6 09-Oct-2007  ad Sync with head.
 1.98.2.5 20-Aug-2007  ad Sync with HEAD.
 1.98.2.4 02-Jul-2007  yamt - ip6_init: fix a mistake in rev.1.98.2.3 which makes
callout_softclock jump to NULL.
- s/struct callout/callout_t/
 1.98.2.3 01-Jul-2007  ad Adapt to callout API change.
 1.98.2.2 08-Jun-2007  ad Sync with head.
 1.98.2.1 10-Apr-2007  ad Sync with head.
 1.108.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.109.8.2 19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
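The idea behind rt_walktree() in the message above, applying a function to every entry so that callers need not know how the table is stored, can be sketched in user space as below. struct entry, table_walk(), and count_entry() are hypothetical names for illustration; the kernel's rtentry and the real rt_walktree() signature are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route entry; the real rtentry is far richer. */
struct entry {
    int           dst;    /* stand-in for the destination key */
    struct entry *next;   /* private linkage: callers never touch this */
};

/* Visitor type: return nonzero to stop the walk early. */
typedef int (*entry_fn)(struct entry *, void *);

/* Apply fn to every entry; returns the first nonzero result, else 0. */
static int
table_walk(struct entry *head, entry_fn fn, void *arg)
{
    struct entry *e, *nxt;
    int rc;

    for (e = head; e != NULL; e = nxt) {
        nxt = e->next;    /* fetched first so fn may unlink e */
        if ((rc = fn(e, arg)) != 0)
            return rc;
    }
    return 0;
}

/* Example visitor: count the entries seen. */
static int
count_entry(struct entry *e, void *arg)
{
    (void)e;
    ++*(int *)arg;
    return 0;
}

/* Usage: count three entries without touching the linkage directly. */
static void
table_selftest(void)
{
    struct entry c = { 3, NULL };
    struct entry b = { 2, &c };
    struct entry a = { 1, &b };
    int n = 0;

    assert(table_walk(&a, count_entry, &n) == 0);
    assert(n == 3);
}
```

Swapping the linked list for a tree would change only table_walk(), which is the kind of decoupling the message aims for.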
 1.109.8.1 19-Jul-2007  dyoung file ip6_input.c was added on branch matt-mips64 on 2007-07-19 20:48:57 +0000
 1.109.6.3 23-Mar-2008  matt sync with HEAD
 1.109.6.2 09-Jan-2008  matt sync with HEAD
 1.109.6.1 06-Nov-2007  matt sync with HEAD
 1.109.4.4 09-Dec-2007  jmcneill Sync with HEAD.
 1.109.4.3 31-Oct-2007  joerg Sync with HEAD.
 1.109.4.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.109.4.1 02-Oct-2007  joerg Sync with HEAD.
 1.110.4.1 13-Nov-2007  bouyer Sync with HEAD
 1.112.4.1 08-Dec-2007  ad Sync with head.
 1.112.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.113.12.3 28-Sep-2008  mjf Sync with HEAD.
 1.113.12.2 02-Jun-2008  mjf Sync with HEAD.
 1.113.12.1 03-Apr-2008  mjf Sync with HEAD.
 1.113.8.2 24-Mar-2008  keiichi sync with head.
 1.113.8.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.116.2.1 18-May-2008  yamt sync with head.
 1.118.2.3 11-Mar-2010  yamt sync with head
 1.118.2.2 04-May-2009  yamt sync with head.
 1.118.2.1 16-May-2008  yamt sync with head.
 1.119.6.1 19-Oct-2008  haad Sync with HEAD.
 1.119.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.122.8.1 03-May-2009  bouyer branches: 1.122.8.1.2;
Pull up following revision(s) (requested by martin in ticket #733):
sys/netinet6/ip6_input.c: revision 1.127
Add missing parenthesis - from Kurt Lidl in PR port-vax/41316
 1.122.8.1.2.1 21-Apr-2010  matt sync to netbsd-5
 1.122.4.1 03-May-2009  bouyer Pull up following revision(s) (requested by martin in ticket #733):
sys/netinet6/ip6_input.c: revision 1.127
Add missing parenthesis - from Kurt Lidl in PR port-vax/41316
 1.122.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.122.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.123.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.129.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.129.4.2 12-Jun-2011  rmind sync with head
 1.129.4.1 31-May-2011  rmind sync with head
 1.131.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.132.2.4 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.132.2.3 16-Jan-2013  yamt sync with (a bit old) head
 1.132.2.2 30-Oct-2012  yamt sync with head
 1.132.2.1 17-Apr-2012  yamt sync with head
 1.133.2.2 05-Apr-2012  mrg sync to latest -current.
 1.133.2.1 18-Feb-2012  mrg merge to -current.
 1.136.8.2 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1523):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
sys/netinet6/ah_input.c: adjust other callers (patch)
sys/netinet6/esp_input.c: adjust other callers (patch)
sys/netinet6/ipcomp_input.c: adjust other callers (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
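Why "mtod(m, char *) + len" is unsafe on a chain can be shown with a small user-space model. struct seg and chain_byte_at() below are hypothetical simplifications of an mbuf chain, written for illustration; they are not the kernel's mbuf API or the actual fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an mbuf: one packet segment. */
struct seg {
    unsigned char *data;   /* start of this segment's bytes */
    size_t         len;    /* number of valid bytes in this segment */
    struct seg    *next;   /* next segment in the chain, or NULL */
};

/* Return a pointer to the byte at chain offset off, or NULL if the
 * offset lies beyond the end of the chain. Unlike first->data + off,
 * this never steps past the end of a segment. */
static unsigned char *
chain_byte_at(struct seg *m, size_t off)
{
    for (; m != NULL; m = m->next) {
        if (off < m->len)
            return m->data + off;
        off -= m->len;
    }
    return NULL;
}

/* Usage: offset 5 lives in the second segment of a 4 + 4 byte chain. */
static void
chain_selftest(void)
{
    unsigned char a[4] = { 0, 1, 2, 3 };
    unsigned char b[4] = { 4, 5, 6, 7 };
    struct seg second = { b, sizeof(b), NULL };
    struct seg first  = { a, sizeof(a), &second };

    /* first.data + 5 would point outside a[]; walking the chain
     * finds the byte in b[] instead. */
    assert(chain_byte_at(&first, 5) == &b[1]);
    assert(*chain_byte_at(&first, 3) == 3);
    assert(chain_byte_at(&first, 8) == NULL);
}
```

Returning an offset within the chain, as the message describes, leaves it to a chain-aware accessor to locate the right segment.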
 1.136.8.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.136.6.2 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1523):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
sys/netinet6/ah_input.c: adjust other callers (patch)
sys/netinet6/esp_input.c: adjust other callers (patch)
sys/netinet6/ipcomp_input.c: adjust other callers (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.136.6.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.136.2.2 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1523):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
sys/netinet6/ah_input.c: adjust other callers (patch)
sys/netinet6/esp_input.c: adjust other callers (patch)
sys/netinet6/ipcomp_input.c: adjust other callers (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.136.2.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.140.2.4 03-Dec-2017  jdolecek update from HEAD
 1.140.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.140.2.2 23-Jun-2013  tls resync from head
 1.140.2.1 25-Feb-2013  tls resync with head
 1.142.2.2 18-May-2014  rmind sync with head
 1.142.2.1 28-Aug-2013  rmind sync with head
 1.145.2.1 10-Aug-2014  tls Rebase.
 1.149.4.11 28-Aug-2017  skrll Sync with HEAD
 1.149.4.10 05-Feb-2017  skrll Sync with HEAD
 1.149.4.9 05-Dec-2016  skrll Sync with HEAD
 1.149.4.8 05-Oct-2016  skrll Sync with HEAD
 1.149.4.7 09-Jul-2016  skrll Sync with HEAD
 1.149.4.6 29-May-2016  skrll Sync with HEAD
 1.149.4.5 22-Apr-2016  skrll Sync with HEAD
 1.149.4.4 19-Mar-2016  skrll Sync with HEAD
 1.149.4.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.149.4.2 22-Sep-2015  skrll Sync with HEAD
 1.149.4.1 06-Apr-2015  skrll Sync with HEAD
 1.149.2.4 17-Sep-2019  martin Pull up following revision(s) (requested by bouyer in ticket #1708):

sys/netinet6/ip6_input.c: revision 1.209 via patch
sys/netinet/ip_input.c: revision 1.390 via patch

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.
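What "fragmented headers" means for the code that follows can be sketched in user space. The example gathers the first n bytes of a segmented packet into one contiguous buffer before parsing; struct chunk and gather_header() are hypothetical names, and copying into a caller buffer is only an analogy for m_pullup(), which rearranges the chain itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one mbuf in a chain. */
struct chunk {
    const unsigned char *data;
    size_t               len;
    const struct chunk  *next;
};

/* Copy the first n bytes of the chain into buf so a header can be read
 * contiguously. Returns 0 on success, -1 if the chain is shorter than n. */
static int
gather_header(const struct chunk *c, unsigned char *buf, size_t n)
{
    size_t done = 0;

    for (; c != NULL && done < n; c = c->next) {
        size_t take = (c->len < n - done) ? c->len : n - done;

        memcpy(buf + done, c->data, take);
        done += take;
    }
    return (done == n) ? 0 : -1;
}

/* Usage: a 6-byte header split 3 + 5 across two chunks. */
static void
gather_selftest(void)
{
    static const unsigned char p1[3] = { 1, 2, 3 };
    static const unsigned char p2[5] = { 4, 5, 6, 7, 8 };
    struct chunk second = { p2, sizeof(p2), NULL };
    struct chunk first  = { p1, sizeof(p1), &second };
    unsigned char hdr[6];
    unsigned char big[16];

    assert(gather_header(&first, hdr, sizeof(hdr)) == 0);
    assert(hdr[0] == 1 && hdr[2] == 3 && hdr[3] == 4 && hdr[5] == 6);
    /* Asking for more bytes than the chain holds fails cleanly. */
    assert(gather_header(&first, big, 9) == -1);
}
```

Handling the short-chain case with an error return, rather than asserting that it cannot happen, mirrors the move away from KASSERT()s described above.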
 1.149.2.3 25-Feb-2018  snj Pull up following revision(s) (requested by maxv in ticket #1572):
sys/netinet6/ip6_input.c: 1.188 via patch
Kick nested fragments.
 1.149.2.2 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1560):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.149.2.1 23-Jan-2015  martin branches: 1.149.2.1.2; 1.149.2.1.6;
Pull up following revision(s) (requested by pettai in ticket #441):
sys/netinet6/ip6_var.h: revision 1.64
sys/netinet6/in6.h: revision 1.82
sys/netinet6/in6_src.c: revision 1.56
sys/netinet6/mld6.c: revision 1.62
sys/netinet6/ip6_input.c: revision 1.150
sys/netinet6/ip6_output.c: revision 1.161
Add net.inet6.ip6.prefer_tempaddr sysctl knob so that we can prefer
IPv6 temporary addresses as the source address.
Fixes PR kern/47100 based on a patch by Dieter Roelants.
 1.149.2.1.6.3 17-Sep-2019  martin Pull up following revision(s) (requested by bouyer in ticket #1708):

sys/netinet6/ip6_input.c: revision 1.209 via patch
sys/netinet/ip_input.c: revision 1.390 via patch

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.
 1.149.2.1.6.2 25-Feb-2018  snj Pull up following revision(s) (requested by maxv in ticket #1572):
sys/netinet6/ip6_input.c: 1.188 via patch
Kick nested fragments.
 1.149.2.1.6.1 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1560):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.149.2.1.2.3 17-Sep-2019  martin Pull up following revision(s) (requested by bouyer in ticket #1708):

sys/netinet6/ip6_input.c: revision 1.209 via patch
sys/netinet/ip_input.c: revision 1.390 via patch

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.
 1.149.2.1.2.2 25-Feb-2018  snj Pull up following revision(s) (requested by maxv in ticket #1572):
sys/netinet6/ip6_input.c: 1.188 via patch
Kick nested fragments.
 1.149.2.1.2.1 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1560):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
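
The difference between pointer arithmetic on the first buffer and an offset resolved against the whole chain can be sketched as follows. This is a minimal illustration, not the kernel code: struct toy_mbuf and toy_chain_byte are hypothetical names, and a real mbuf carries far more state.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an mbuf; NOT the kernel's struct mbuf. */
struct toy_mbuf {
	struct toy_mbuf *next;	/* next buffer in the chain */
	unsigned char   *data;	/* start of this buffer's bytes */
	size_t           len;	/* number of valid bytes in this buffer */
};

/*
 * Return a pointer to the byte at chain offset `off`, or NULL if the
 * chain is shorter than that. Unlike `first->data + off`, the result
 * always lies inside the buffer that actually holds the byte.
 */
unsigned char *
toy_chain_byte(struct toy_mbuf *m, size_t off)
{
	for (; m != NULL; m = m->next) {
		/* A buffer that claims to hold bytes must have storage. */
		assert(m->data != NULL || m->len == 0);
		if (off < m->len)
			return m->data + off;
		off -= m->len;	/* skip this buffer entirely */
	}
	return NULL;
}
```

With a two-byte first buffer, offset 3 resolves into the second buffer, whereas adding 3 to the first buffer's data pointer would land past its end.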
 1.164.2.4 20-Mar-2017  pgoyette Sync with HEAD
 1.164.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.164.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.164.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.171.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.178.2.9 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.178.2.8 17-Sep-2019  martin Pull up following revision(s) (requested by bouyer in ticket #1378):

sys/netinet6/ip6_input.c: revision 1.209 (patch)
sys/netinet/ip_input.c: revision 1.390 (patch)

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.
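
As a rough illustration of why a header split across buffers has to be made contiguous before it is read as a struct, the sketch below gathers the first n bytes of a chain into flat storage. It only imitates the effect that m_pullup() provides; struct toy_buf and toy_gather are hypothetical names and the real routine rearranges the chain itself rather than copying out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy buffer chain; NOT the kernel's struct mbuf. */
struct toy_buf {
	struct toy_buf      *next;
	const unsigned char *data;
	size_t               len;
};

/*
 * Copy the first n bytes of the chain into out.
 * Returns 0 on success, -1 if the chain holds fewer than n bytes.
 * out must have room for n bytes.
 */
int
toy_gather(const struct toy_buf *m, size_t n, unsigned char *out)
{
	size_t done = 0;

	assert(out != NULL);
	while (done < n && m != NULL) {
		size_t want = n - done;
		size_t take = m->len < want ? m->len : want;

		if (take > 0)
			memcpy(out + done, m->data, take);
		done += take;
		m = m->next;
	}
	return done == n ? 0 : -1;
}
```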
 1.178.2.7 09-Apr-2018  bouyer Pull up following revision(s) (requested by roy in ticket #724):
tests/net/icmp/t_ping.c: revision 1.19
sys/netinet6/raw_ip6.c: revision 1.166
sys/netinet6/ip6_input.c: revision 1.195
sys/net/raw_usrreq.c: revision 1.59
sys/sys/socketvar.h: revision 1.151
sys/kern/uipc_socket2.c: revision 1.128
tests/lib/libc/sys/t_recvmmsg.c: revision 1.2
lib/libc/sys/recv.2: revision 1.38
sys/net/rtsock.c: revision 1.239
sys/netinet/udp_usrreq.c: revision 1.246
sys/netinet6/icmp6.c: revision 1.224
tests/net/icmp/t_ping.c: revision 1.20
sys/netipsec/keysock.c: revision 1.63
sys/netinet/raw_ip.c: revision 1.172
sys/kern/uipc_socket.c: revision 1.260
tests/net/icmp/t_ping.c: revision 1.22
sys/kern/uipc_socket.c: revision 1.261
tests/net/icmp/t_ping.c: revision 1.23
sys/netinet/ip_mroute.c: revision 1.155
sbin/route/route.c: revision 1.159
sys/netinet6/ip6_mroute.c: revision 1.123
sys/netatalk/ddp_input.c: revision 1.31
sys/netcan/can.c: revision 1.3
sys/kern/uipc_usrreq.c: revision 1.184
sys/netinet6/udp6_usrreq.c: revision 1.138
tests/net/icmp/t_ping.c: revision 1.18
socket: report receive buffer overflows
Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().
This allows userland to detect route(4) overflows so it can re-sync
with the current state.
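
The three effects listed for soroverflow() can be modelled in a few lines. This is a simplified sketch under assumptions: struct toy_socket and both functions are hypothetical, and the wakeup is reduced to a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified socket state; not the kernel's struct socket. */
struct toy_socket {
	unsigned long overflowed;	/* number of recorded overflows */
	int           error;		/* pending error for the next receive */
	bool          woken;		/* stands in for waking the receiver */
};

/* Record one receive-buffer overflow. */
void
toy_soroverflow(struct toy_socket *so)
{
	assert(so != NULL);
	so->overflowed++;
	so->error = ENOBUFS;
	so->woken = true;
}

/* Report the pending error and clear it; returns 0 if there is none. */
int
toy_take_error(struct toy_socket *so)
{
	int e;

	assert(so != NULL);
	e = so->error;
	so->error = 0;
	return e;
}
```

A reader that sees ENOBUFS from a receive call knows messages were dropped and can re-read the full state instead of trusting the incremental stream.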
socket: clear error even when peeking
The error has already been reported and it's pointless requiring another
recv(2) call just to clear it.
socket: remove now incorrect comment that so_error is only udp
As it can be affected by route(4) sockets which are raw.
rtsock: log dropped messages that we cannot report to userland
Handle ENOBUFS when receiving messages.
Don't send messages if the receiver has died.
Sprinkle more soroverflow().
Handle ENOBUFS in recv
Handle ENOBUFS in sendto
Note value received. Harden another sendto for ENOBUFS.
Handle the routing socket overflowing gracefully.
Allow a valid sendto .... duh
Handle errors better.
Fix test for checking we sent all the data we asked to.
 1.178.2.6 26-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #588):
sys/netinet6/in6.c: revision 1.260
sys/netinet/in.c: revision 1.219
sys/netinet/wqinput.c: revision 1.4
sys/rump/net/lib/libnetinet/netinet_component.c: revision 1.11
sys/netinet/ip_input.c: revision 1.376
sys/netinet6/ip6_input.c: revision 1.193
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because there is a violation of the rule of lock ordering;
softnet_lock is taken while holding IFNET_LOCK, which violates the rule.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to reliably protect the network stack.
This is required, for example, for PR kern/51356.

Fix PR kern/53043
 1.178.2.5 26-Feb-2018  snj Pull up following revision(s) (requested by maxv in ticket #568):
sys/netinet6/ip6_input.c: 1.188
Kick nested fragments.
 1.178.2.4 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #527):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.178.2.3 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
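
The idea of hiding the compile-time switch behind a macro pair can be sketched like this. The TOY_ names are hypothetical and the lock is reduced to a depth counter; only the NET_MPSAFE symbol comes from the log.

```c
#include <assert.h>

/* Depth counter standing in for the big kernel lock. */
int toy_kernel_lock_depth;

#define TOY_KERNEL_LOCK()	(toy_kernel_lock_depth++)
#define TOY_KERNEL_UNLOCK()	(toy_kernel_lock_depth--)

/* The #ifndef lives here once instead of at every call site. */
#ifndef NET_MPSAFE
#define TOY_KERNEL_LOCK_UNLESS_NET_MPSAFE()	TOY_KERNEL_LOCK()
#define TOY_KERNEL_UNLOCK_UNLESS_NET_MPSAFE()	TOY_KERNEL_UNLOCK()
#else
#define TOY_KERNEL_LOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#define TOY_KERNEL_UNLOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#endif

/* Returns the lock depth observed inside the protected region. */
int
toy_protected_work(void)
{
	int depth;

	TOY_KERNEL_LOCK_UNLESS_NET_MPSAFE();
	depth = toy_kernel_lock_depth;
	TOY_KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
	assert(toy_kernel_lock_depth >= 0);
	return depth;
}
```

Searching for the macro name then finds every place that still depends on the global lock, which is the stated benefit.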
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general; however,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easy.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.178.2.2 10-Dec-2017  snj Pull up following revision(s) (requested by roy in ticket #390):
sys/netinet/ip_input.c: 1.363
sys/netinet6/ip6_input.c: 1.184-1.185
sys/netinet6/ip6_output.c: 1.194-1.195
sys/netinet6/in6_src.c: 1.83-1.84
Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.
--
Attempt to restore v6 networking. Not 100% certain that these
changes are all that is needed, but they're certainly a big part of it
(especially the ip6_input.c change.)
--
Treat unvalidated addresses as deprecated in rule 3.
 1.178.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panics of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
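
For readers unfamiliar with the style, the sketch below contrasts positional and C99 designated initialization. struct toy_xformsw and its fields are hypothetical and only loosely modelled on a transform switch entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transform switch entry; field names are illustrative. */
struct toy_xformsw {
	int         type;
	int         flags;
	const char *name;
	int       (*init)(void);
};

int
toy_ah_init(void)
{
	return 0;
}

/* Positional style: every field must appear, in declaration order. */
const struct toy_xformsw toy_old = { 1, 0, "toy AH", toy_ah_init };

/* C99 designated style: fields are named; omitted ones are zeroed. */
const struct toy_xformsw toy_new = {
	.type = 1,
	.name = "toy AH",
	.init = toy_ah_init,
};

/* Both forms describe the same entry. */
void
toy_xformsw_check(void)
{
	assert(toy_old.type == toy_new.type);
	assert(toy_old.flags == toy_new.flags);
}
```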
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that the process issuing
SADB_UPDATE is the same as the process that issued SADB_ADD (or SADB_GETSPI).
This means that the update command must be used together with the add command
in a setkey configuration. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require the newly-added update command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver has implemented such offload features for many
years, and none seems likely to implement them soon. (Note that neither
FreeBSD nor Linux has such drivers.) Let's remove the related
(unused) code and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
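
The allocate, copy, replace sequence can be sketched as below. This is an illustration under assumptions: struct toy_sav and toy_sav_update are hypothetical, and the synchronization needed before the old object can be freed is left to the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical SA record; not the kernel's struct secasvar. */
struct toy_sav {
	unsigned int spi;
	unsigned int lifetime;
};

/*
 * Replace *slot with an updated copy instead of modifying it in place.
 * Returns the old object, which the caller releases once no reader can
 * still hold it, or NULL if allocation failed and nothing changed.
 */
struct toy_sav *
toy_sav_update(struct toy_sav **slot, unsigned int new_lifetime)
{
	struct toy_sav *old, *fresh;

	assert(slot != NULL && *slot != NULL);
	old = *slot;
	fresh = malloc(sizeof(*fresh));
	if (fresh == NULL)
		return NULL;
	*fresh = *old;			/* copy variables of the old sav */
	fresh->lifetime = new_lifetime;	/* apply the update to the copy */
	*slot = fresh;			/* publish the finished copy */
	return old;
}
```

A reader that fetched the old pointer before the swap keeps seeing a consistent object, never a half-updated one.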
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
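
A plain memset() on a buffer that is never read again may be removed by the optimizer. One common way to prevent that, shown here as a portable sketch and not claimed to be how explicit_memset itself is implemented, is to call memset through a volatile function pointer. toy_explicit_memset and struct toy_key are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * The compiler cannot assume what a volatile pointer points to at the
 * time of the call, so it cannot prove the store is dead and drop it.
 */
static void *(*volatile toy_memset_impl)(void *, int, size_t) = memset;

void *
toy_explicit_memset(void *buf, int c, size_t len)
{
	return (*toy_memset_impl)(buf, c, len);
}

/* Hypothetical key holder that is wiped before it goes out of use. */
struct toy_key {
	unsigned char bytes[16];
	size_t        len;
};

void
toy_key_clear(struct toy_key *k)
{
	assert(k != NULL);
	toy_explicit_memset(k->bytes, 0, sizeof(k->bytes));
	k->len = 0;
}
```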
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list should basically never have sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be obtained from
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav list is sorted
as we need.

The newly added key_validate_savlist validates that each sav list is indeed sorted
(run only if DEBUG because it's not cheap).
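
A debug-only sortedness check of this kind might look like the sketch below. The names are hypothetical, and ascending order by add time is an assumption made for the example; the log does not state the direction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list node; addtime is the sort key. */
struct toy_sa_node {
	struct toy_sa_node *next;
	long                addtime;
};

/* True if addtime never decreases along the list (assumed order). */
bool
toy_savlist_sorted(const struct toy_sa_node *head)
{
	for (; head != NULL && head->next != NULL; head = head->next) {
		if (head->addtime > head->next->addtime)
			return false;
	}
	return true;
}

/* The O(n) walk is paid only in DEBUG builds. */
void
toy_savlist_check(const struct toy_sa_node *head)
{
#ifdef DEBUG
	assert(toy_savlist_sorted(head));
#else
	(void)head;
#endif
}
```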
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as temporary storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

The M_AUTHIPDGM flag is set on an mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication, and M_AUTHIPDGM is never set on an mbuf. So
checking M_AUTHIPDGM of an mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segment size
calculation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
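
The arithmetic behind this fix is easy to check in isolation. In the sketch below, struct toy_comb and both functions are hypothetical; the first reproduces the shape of the quoted line on a zeroed structure, the second derives the soft limit from the hard limit as the message describes.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical pair of lifetime limits, in seconds. */
struct toy_comb {
	uint64_t soft_addtime;
	uint64_t hard_addtime;
};

/* Old shape: scales a field that was just zeroed, so it stays 0. */
void
toy_setlifetime_old(struct toy_comb *c)
{
	assert(c != NULL);
	c->soft_addtime = c->soft_addtime * 80 / 100;
}

/* Fixed shape: the soft limit is 80% of the hard limit. */
void
toy_setlifetime_new(struct toy_comb *c, uint64_t hard)
{
	assert(c != NULL);
	c->hard_addtime = hard;
	c->soft_addtime = c->hard_addtime * 80 / 100;
}
```

For a hard limit of 86400 seconds the fixed version yields a soft limit of 69120 seconds, while the old version leaves it at 0.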
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a deadlock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf,
which is stuck on key_so_mtx, never releases its reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

Hundreds of mbufs can easily be queued on the list under heavy
network load.
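
Deferring work to a timer and capping how much may pile up can be sketched with a small bounded queue. Everything here is hypothetical: the names, the cap of 4, and the choice to reject new entries when full (the log does not say what happens to excess messages).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_DEFER_MAX 4	/* arbitrary cap chosen for this sketch */

struct toy_defer_queue {
	const void *items[TOY_DEFER_MAX];
	size_t      count;
};

size_t toy_sent;	/* messages delivered by toy_count_send */

/* Example sender that only counts deliveries. */
void
toy_count_send(const void *msg)
{
	(void)msg;
	toy_sent++;
}

/* Packet path: queue instead of sending; false means the queue is full. */
bool
toy_defer(struct toy_defer_queue *q, const void *msg)
{
	assert(q != NULL);
	if (q->count >= TOY_DEFER_MAX)
		return false;
	q->items[q->count++] = msg;
	return true;
}

/* Timer path: deliver everything queued, in order; returns the count. */
size_t
toy_defer_flush(struct toy_defer_queue *q, void (*send)(const void *))
{
	size_t n = q->count;

	for (size_t i = 0; i < n; i++)
		send(q->items[i]);
	q->count = 0;
	return n;
}
```

The packet path never touches the lock that the sender needs, which is the property that removes the deadlock; the cap bounds memory use under load.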
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It is never changed, so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), which is of course useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical usage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock, for example in the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform while holding
the mutex, and pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never completes until xc_broadcast of thread A finishes. On the other hand,
thread C, which is a callee of xc_broadcast of thread A, is stuck on the mutex.
In the end the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock happens only if NET_MPSAFE is enabled.
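
The shape of the fix can be sketched in userland with POSIX threads. This is only an illustration under assumptions, not the kernel change itself: sk_ctx, sk_slow_step and sk_remove_and_drain are hypothetical names, and sk_slow_step merely stands in for pserialize_perform.

```c
#include <pthread.h>
#include <stdbool.h>

struct sk_ctx {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	bool busy;	/* true while one thread is in the slow step */
	int performed;	/* how many times the slow step ran */
};

#define SK_CTX_INITIALIZER \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, 0 }

/* Stand-in for pserialize_perform(): must not run concurrently. */
static void
sk_slow_step(struct sk_ctx *c)
{
	c->performed++;
}

static void
sk_remove_and_drain(struct sk_ctx *c)
{
	pthread_mutex_lock(&c->mtx);
	/* item = remove_from_list(); would go here, under the mutex */
	while (c->busy)
		pthread_cond_wait(&c->cv, &c->mtx);
	c->busy = true;
	pthread_mutex_unlock(&c->mtx);	/* drop the mutex before the slow step */

	sk_slow_step(c);

	pthread_mutex_lock(&c->mtx);
	c->busy = false;
	pthread_cond_broadcast(&c->cv);
	/* a localcount_drain-like wait would go here, with the mutex held */
	pthread_mutex_unlock(&c->mtx);
}
```

The busy flag plus the condvar serializes the slow step without holding the mutex across it, so a third party that needs the mutex is never blocked behind it.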
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected, for example calling key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.193.2.7 18-Jan-2019  pgoyette Synch with HEAD
 1.193.2.6 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.193.2.5 21-May-2018  pgoyette Sync with HEAD
 1.193.2.4 02-May-2018  pgoyette Synch with HEAD
 1.193.2.3 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.193.2.2 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.193.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.204.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.204.2.1 10-Jun-2019  christos Sync with HEAD
 1.208.2.4 16-Nov-2019  martin Pull up following revision(s) (requested by maxv in ticket #432):

sys/netinet6/ip6_input.c: revision 1.215

Add more checks in ip6_pullexthdr, to prevent a panic in m_copydata. The
Rip6 entry point could see a garbage Hop6 option.

Not a big issue, since it's a clean panic only triggerable if the socket
has the IN6P_DSTOPTS/IN6P_RTHDR option.
 1.208.2.3 23-Oct-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #368):

sys/netinet6/in6_ifattach.h: revision 1.14
sys/netinet6/ip6_input.c: revision 1.212
sys/netinet6/ip6_input.c: revision 1.213
sys/netinet6/ip6_input.c: revision 1.214
sys/netinet6/in6_var.h: revision 1.101
sys/netinet6/in6_var.h: revision 1.102
sys/netinet6/in6_ifattach.c: revision 1.116
sys/netinet6/in6_ifattach.c: revision 1.117
tests/net/ndp/t_ra.sh: revision 1.33

Reorganize in6_tmpaddrtimer stuffs
- Move the related functions to where in6_tmpaddrtimer_ch exists
- Hide global variable in6_tmpaddrtimer_ch
- Rename ip6_init2 to in6_tmpaddrtimer_init
- Reduce callers of callout_reset
- Use callout_schedule

Validate ip6_temp_preferred_lifetime (net.inet6.ip6.temppltime) on a change
ip6_temp_preferred_lifetime is used to calculate an interval period to
regenerate temporary addresses by
TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE - DESYNC_FACTOR
as per RFC 3041 3.5. So it must be greater than (REGEN_ADVANCE +
DESYNC_FACTOR), otherwise it will be negative and go wrong, for example
KASSERT(to_ticks >= 0) in callout_schedule_locked fails.
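
The constraint above amounts to a simple range check. A minimal sketch, assuming all three quantities are in seconds; the function names are hypothetical and this is not the sysctl handler itself:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if TEMP_PREFERRED_LIFETIME leaves a strictly positive interval. */
static bool
sk_temppltime_valid(uint32_t pltime, uint32_t regen_advance,
    uint32_t desync_factor)
{
	/* widen before adding so the sum cannot wrap */
	return (uint64_t)pltime >
	    (uint64_t)regen_advance + (uint64_t)desync_factor;
}

/* Regeneration interval; only meaningful if the value was validated. */
static uint32_t
sk_regen_interval(uint32_t pltime, uint32_t regen_advance,
    uint32_t desync_factor)
{
	return pltime - regen_advance - desync_factor;
}
```

For instance, with a one-day lifetime (86400), a REGEN_ADVANCE of 5 and a DESYNC_FACTOR of 600, the interval is 85795 seconds, while a lifetime of 605 or less is rejected.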

tests: add tests for the validation of net.inet6.ip6.temppltime

in6: reset the temporary address timer on a change of the interval period
 1.208.2.2 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@
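
The indirection can be illustrated in userland, loosely modelling the description above rather than percpu(9) itself. In this sketch (all sk_ names are hypothetical) the relocatable area holds only a pointer, so a reference taken before the area is replaced stays valid afterwards:

```c
#include <stdlib.h>
#include <string.h>

struct sk_cache { int hits; };	/* stands in for a rtcache */

/* A storage area that is replaced by a larger one when it grows. */
struct sk_area { unsigned char *buf; size_t size; };

static int
sk_area_grow(struct sk_area *a, size_t newsize)
{
	unsigned char *nb;

	if (newsize < a->size)
		return -1;
	nb = calloc(1, newsize);
	if (nb == NULL)
		return -1;
	if (a->size > 0)
		memcpy(nb, a->buf, a->size);
	free(a->buf);		/* the old area is destroyed */
	a->buf = nb;
	a->size = newsize;
	return 0;
}

/* Store only a pointer in the area; the object itself lives elsewhere. */
static void
sk_area_set(struct sk_area *a, struct sk_cache *c)
{
	memcpy(a->buf, &c, sizeof(c));
}

static struct sk_cache *
sk_area_get(const struct sk_area *a)
{
	struct sk_cache *c;

	memcpy(&c, a->buf, sizeof(c));
	return c;
}
```

Had the struct sk_cache been stored directly in buf, any pointer into it would dangle after sk_area_grow; with the extra pointer, only the slot moves and the object does not.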

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine(Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.208.2.1 17-Sep-2019  martin Pull up following revision(s) (requested by bouyer in ticket #208):

sys/netinet6/ip6_input.c: revision 1.209
sys/netinet/ip_input.c: revision 1.390

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.
 1.222.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.133 02-Jan-2025  andvar s/muliticasted/multicasted/ in comment.
 1.132 12-Jun-2020  roy branches: 1.132.26;
Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.131 03-Jan-2020  maxv Don't forget to initialize 'sin6_len'. With kASan, from time to time the
value will be bigger than the size of the source, and we get a read
overflow. With kMSan the uninitialized access is detected immediately.

Reported-by: syzbot+841ca14baccec37b4f8f@syzkaller.appspotmail.com
 1.130 24-Jul-2019  msaitoh branches: 1.130.2;
Fix typo in comment (s/alreay/already/).
 1.129 21-Jun-2018  knakahara branches: 1.129.2;
sbappendaddr() requires a lock to be held. Currently, softnet_lock is appropriate.

When rip_input() is called as inetsw[].pr_input, it is always called
with softnet_lock held: if NET_MPSAFE is not defined, the lock is
acquired in ipintr(); otherwise it is acquired in the
PR_WRAP_INPUT macro.
However, some functions call rip_input() directly without holding softnet_lock.
That causes an assertion failure in sbappendaddr().
rip6_input() and icmp6_rip6_input() also require softnet_lock for the same
reason.
 1.128 20-May-2018  maxv Remove notyet, we've never had this.
 1.127 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.126 29-Apr-2018  maxv Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is obvious that 'm' has M_PKTHDR set.
 1.125 26-Apr-2018  maxv Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.
 1.124 26-Apr-2018  maxv Use M_UNWRITABLE, no functional change.
 1.123 21-Mar-2018  roy Sprinkle more soroverflow().
 1.122 06-Feb-2018  maxv branches: 1.122.2;
Remove dead code.
 1.121 02-Feb-2018  maxv Style, no functional change.
 1.120 02-Feb-2018  maxv Fix a pretty simple, yet pretty tragic typo: we should return IPPROTO_DONE,
not IPPROTO_NONE. With IPPROTO_NONE we will keep parsing the header chain
on an mbuf that was already freed.
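
Why the wrong return value matters can be shown with a toy header-chain loop. This is a hedged sketch, not the ip6_input code: the sk_ names, the handler table and the numeric value used for the "done" sentinel are assumptions made for illustration (59 is the IANA "no next header" protocol number).

```c
#include <stddef.h>

enum { SK_NONE = 59, SK_DONE = 257, SK_MAX = 258 };

struct sk_pkt { int freed; int use_after_free; };

typedef int (*sk_input_fn)(struct sk_pkt *);

/* Consumes the packet and correctly reports that processing is over. */
static int
sk_consume_ok(struct sk_pkt *p)
{
	p->freed = 1;
	return SK_DONE;
}

/* Consumes the packet but returns the wrong constant (the typo). */
static int
sk_consume_bad(struct sk_pkt *p)
{
	p->freed = 1;
	return SK_NONE;
}

/* Handler for "no next header"; records if it sees a freed packet. */
static int
sk_none_input(struct sk_pkt *p)
{
	if (p->freed)
		p->use_after_free = 1;
	p->freed = 1;
	return SK_DONE;
}

/* Keep dispatching on the next-header value until a handler says done. */
static void
sk_run_chain(struct sk_pkt *p, int nxt, sk_input_fn table[SK_MAX])
{
	while (nxt != SK_DONE) {
		if (nxt < 0 || nxt >= SK_MAX || table[nxt] == NULL)
			break;
		nxt = table[nxt](p);
	}
}
```

With sk_consume_bad the loop does not stop; it dispatches once more on an already freed packet, which is the class of bug the commit describes.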
 1.119 01-Mar-2017  ozaki-r branches: 1.119.6;
Provide in6_multi_group

Use it when checking if we belong to the group, instead of in6_lookup_multi.

No functional change.
 1.118 22-Feb-2017  ozaki-r Stop using useless IN6_*_MULTI macros
 1.117 14-Feb-2017  ozaki-r Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced
 1.116 24-Jan-2017  ozaki-r Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work
 1.115 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.114 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.113 11-Jan-2017  ozaki-r branches: 1.113.2;
Get rid of unnecessary header inclusions
 1.112 15-Jul-2016  ozaki-r Use sin6tosa and sin6tocsa macros

No functional change.
 1.111 21-Jun-2016  ozaki-r branches: 1.111.2;
Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime differs from that of the ifnet, so an ifnet object obtained
via them can disappear at any time. Thus, to be safe, we need to look up
the ifnet object by if_index every time.
 1.110 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in an mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and exists for the
transition period, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
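
The index-instead-of-pointer idea can be sketched in plain C. All sk_ names below are hypothetical, and the sketch deliberately omits the psref(9)/pserialize(9) protection that keeps the object alive while it is in use:

```c
#include <stddef.h>

#define SK_MAXIF 8

struct sk_ifnet { int index; const char *name; };

/* Interface collection: NULL means "no such interface (any more)". */
static struct sk_ifnet *sk_index2ifnet[SK_MAXIF];

struct sk_mbuf { int rcvif_index; };	/* an index, not a pointer */

static struct sk_ifnet *
sk_get_rcvif(const struct sk_mbuf *m)
{
	int i = m->rcvif_index;

	if (i <= 0 || i >= SK_MAXIF)
		return NULL;
	return sk_index2ifnet[i];
}
```

A packet that outlives its interface then resolves to NULL on the next lookup instead of dereferencing freed memory.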
 1.109 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.108 07-Aug-2015  ozaki-r Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap, say via settimeofday(2), according to time_second(9). We
should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
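
The conversion between the two time bases is an offset calculation. A minimal sketch, assuming both clocks are sampled at the same instant; the function names are hypothetical and are not the kernel's helpers:

```c
#include <stdint.h>

/*
 * Convert a timestamp kept on the monotonic (uptime) clock to wall-clock
 * seconds, given both clocks sampled "now".
 */
static int64_t
sk_uptime_to_wall(int64_t t_uptime, int64_t now_uptime, int64_t now_wall)
{
	return t_uptime - now_uptime + now_wall;
}

/* The reverse direction, for values supplied by userland. */
static int64_t
sk_wall_to_uptime(int64_t t_wall, int64_t now_uptime, int64_t now_wall)
{
	return t_wall - now_wall + now_uptime;
}
```

An expiry 30 seconds in the future on the uptime clock stays 30 seconds in the future after conversion, whatever the wall clock currently reads.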
 1.107 17-May-2014  rmind branches: 1.107.2; 1.107.4; 1.107.6; 1.107.10;
Replace open-coded access (and boundary checking) of ifindex2ifnet with
if_byindex() function.
 1.106 25-Feb-2014  pooka branches: 1.106.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.105 21-Nov-2013  riz Revert previous and solve in a different way, using __unused. Fixes
building with MRT6DEBUG.

ok martin.
 1.104 14-Sep-2013  martin Remove unused variable and ifdef some others like their use
 1.103 31-Dec-2011  christos branches: 1.103.2; 1.103.6; 1.103.8; 1.103.10; 1.103.16;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0
 1.102 19-Oct-2011  dyoung branches: 1.102.2; 1.102.6;
Use if_addr_init() and if_mcast_op() instead of ifp->if_ioctl().
 1.101 31-Aug-2011  plunky NULL does not need a cast
 1.100 14-Oct-2010  oki Fixed mbuf leak possibility.
 1.99 27-Jul-2010  jakllsch Make MRT6DEBUG compile on LP64 by using ptrdiff_t printf() format specifier.
 1.98 16-Sep-2009  pooka branches: 1.98.2; 1.98.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.97 18-Mar-2009  cegger bzero -> memset
 1.96 06-Aug-2008  plunky branches: 1.96.2; 1.96.8;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.95 24-Jun-2008  gmcgarry branches: 1.95.2;
ioctl commands are unsigned long. ABI change to mrt6_ioctl() will affect 64-bit platforms.
 1.94 22-May-2008  dyoung branches: 1.94.2;
Don't cast to void * unnecessarily.
 1.93 04-May-2008  thorpej branches: 1.93.2;
Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577
 1.92 24-Apr-2008  ad branches: 1.92.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.91 23-Apr-2008  thorpej Use <net/net_stats.h> / netstat_sysctl().
 1.90 15-Apr-2008  thorpej branches: 1.90.2;
Make pim6 stats per-cpu.
 1.89 15-Apr-2008  thorpej Make ip6 and icmp6 stats per-cpu.
 1.88 08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
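
A rough userland sketch of counters kept as an array of uint64_t indexed by name, with one array per CPU summed on read (the per-CPU part corresponds to revisions 1.89 and 1.90 listed above). The index names, SK_NCPU and the functions are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

enum { SK_STAT_TOTAL, SK_STAT_TOOSHORT, SK_STAT_FORWARD, SK_NSTATS };

#define SK_NCPU 4

static uint64_t sk_stats[SK_NCPU][SK_NSTATS];

/* Bump one counter on one CPU; no cross-CPU sharing on the hot path. */
static void
sk_stat_inc(size_t cpu, size_t idx)
{
	if (cpu < SK_NCPU && idx < SK_NSTATS)
		sk_stats[cpu][idx]++;
}

/* Collate the per-CPU counters into a single array for export. */
static void
sk_stats_collate(uint64_t out[SK_NSTATS])
{
	for (size_t i = 0; i < SK_NSTATS; i++) {
		out[i] = 0;
		for (size_t cpu = 0; cpu < SK_NCPU; cpu++)
			out[i] += sk_stats[cpu][i];
	}
}
```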
 1.87 27-Feb-2008  matt Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.
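
For illustration only (the function names are made up): an old-style definition lists parameter names first and declares their types separately, while the ANSI form puts the types in the parameter list and spells "no parameters" as (void).

```c
/*
 * Old-style (K&R) definition, shown in a comment because newer C
 * standards no longer accept it:
 *
 *	int sk_add_old(a, b)
 *		int a;
 *		int b;
 *	{
 *		return a + b;
 *	}
 */

/* ANSI definition: types appear in the parameter list. */
static int
sk_add(int a, int b)
{
	return a + b;
}

/* ANSI way to say "takes no arguments": (void), not (). */
static int
sk_answer(void)
{
	return 42;
}
```

Before C23, a declaration written as sk_answer() leaves the parameters unspecified rather than stating that there are none, which is why (void) is required.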
 1.86 27-Nov-2007  christos branches: 1.86.10; 1.86.14;
require that the options argument is the right size, not that it is greater
or equal to the requested size. Suggested by Matt Thomas.
 1.85 10-Nov-2007  dyoung Use sockaddr_in6_init().
 1.84 01-Nov-2007  dyoung branches: 1.84.2;
De-__P().
 1.83 09-Jul-2007  ad branches: 1.83.6; 1.83.8; 1.83.12;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.82 23-May-2007  christos fix typos in previous
 1.81 23-May-2007  christos Ansify + add a few comments, from Karl Sjödahl
 1.80 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
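
The strdup()/strcpy() analogy from item 1 can be sketched in userland. This is not the kernel API: the sk_ functions are hypothetical, use malloc-family allocation instead of a per-family pool, and take the length as an explicit argument because the sketch has no per-family size table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A length-prefixed address, in the style of a BSD sockaddr. */
struct sk_sockaddr {
	uint8_t sa_len;
	uint8_t sa_family;
	unsigned char sa_data[];
};

/* Allocate a zeroed address of the given family and total length. */
static struct sk_sockaddr *
sk_sockaddr_alloc(uint8_t af, uint8_t len)
{
	struct sk_sockaddr *sa;

	if (len < sizeof(*sa))
		return NULL;
	sa = calloc(1, len);
	if (sa == NULL)
		return NULL;
	sa->sa_len = len;
	sa->sa_family = af;
	return sa;
}

/* Like strcpy(): copy into existing storage of the same family. */
static struct sk_sockaddr *
sk_sockaddr_copy(struct sk_sockaddr *dst, const struct sk_sockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	assert(dst->sa_len >= src->sa_len);
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like strdup(): allocate new storage and copy; NULL on failure. */
static struct sk_sockaddr *
sk_sockaddr_dup(const struct sk_sockaddr *src)
{
	struct sk_sockaddr *sa = sk_sockaddr_alloc(src->sa_family, src->sa_len);

	if (sa != NULL)
		memcpy(sa, src, src->sa_len);
	return sa;
}
```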
 1.79 04-Mar-2007  christos branches: 1.79.2; 1.79.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.78 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.77 29-Jan-2007  dyoung branches: 1.77.2;
Cosmetic: move an splsoftnet() call out of the variable declarations,
get rid of a gratuitous cast, change (struct socket *)0 to NULL.
 1.76 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.75 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.74 30-Aug-2006  christos branches: 1.74.2; 1.74.4;
fix initializers
 1.73 17-Aug-2006  christos Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!
 1.72 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.71 05-Mar-2006  rpaulo branches: 1.71.6;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.70 03-Mar-2006  rpaulo branches: 1.70.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.
 1.69 21-Jan-2006  rpaulo branches: 1.69.2; 1.69.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
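
The loopback case described above can be checked in userland with the standard IN6_IS_ADDR_LOOPBACK macro. This sketch covers only that one case, not the general scope-zone check the commit introduces; sk_src_scope_ok and sk_check are hypothetical helpers.

```c
#include <stdbool.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Reject a loopback source when the destination is not loopback. */
static bool
sk_src_scope_ok(const struct in6_addr *src, const struct in6_addr *dst)
{
	if (IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst))
		return false;
	return true;
}

/* Convenience wrapper taking textual addresses; false on parse errors. */
static bool
sk_check(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return false;
	return sk_src_scope_ok(&s, &d);
}
```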
 1.68 11-Dec-2005  christos branches: 1.68.2;
merge ktrace-lwp.
 1.67 21-Oct-2005  bouyer mif6table is used by netstat, so don't declare it static. Fix netstat -g
on Xen, whose ELF loader doesn't load local symbols in the symbol table.
 1.66 17-Oct-2005  rpaulo branches: 1.66.2;
If we receive a PIM register message when IPv6 PIM-SM routing is
enabled, avoid a crash when forwarding the packet to outgoing interfaces.

Taken from FreeBSD which obtained it from KAME.
 1.65 28-Aug-2005  rpaulo Implement net.inet6.pim6.stats sysctl.

Reviewed by Elad Efrat.
 1.64 29-May-2005  christos branches: 1.64.2;
avoid silly static variables that even caused nesting issues, not to mention
reentrancy concerns.
 1.63 26-Feb-2005  perry branches: 1.63.2;
nuke trailing whitespace
 1.62 21-Dec-2004  drochner branches: 1.62.2; 1.62.4;
remove a redundant check for ifindex2ifnet[idx] != 0
 1.61 04-Sep-2004  manu IPv4 PIM support, based on a submission from Pavlin Radoslavov posted on
tech-net@
 1.60 10-Dec-2003  itojun use if_indexlim (instead of if_index) and ifindex2ifnet[x] != NULL
to check if interface exists, as (1) if_index has different meaning
(2) ifindex2ifnet could become NULL when interface gets destroyed,
since when we have introduced dynamically-created interfaces. from kame
 1.59 10-Dec-2003  itojun validate set/getsockopt arg more strictly. with previous code privileged
user can cause kernel crash.
 1.58 30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.57 15-Oct-2003  itojun back out previous (ENETRESET special handling)
 1.56 15-Oct-2003  itojun ignore ENETRESET on ADDMULTI
 1.55 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.54 22-Aug-2003  jonathan Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.
 1.53 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.52 12-Jul-2003  itojun KNF
 1.51 12-Jul-2003  itojun no longer needed (#define _KERNEL)
 1.50 08-Jul-2003  itojun on interface detach, clear multicast forwarding table. from kame
 1.49 24-Jun-2003  itojun branches: 1.49.2;
use time.tv_sec directly
 1.48 23-Jun-2003  martin Make sure to include opt_foo.h if a defflag option FOO is used.
 1.47 06-Jun-2003  itojun don't try to forward multicast packet to mif that went away; kame
 1.46 16-May-2003  itojun use strlcpy
 1.45 15-May-2003  itojun check version before computing checksum. checksum is more expensive operation.
 1.44 14-May-2003  itojun KNF
 1.43 14-May-2003  itojun KNF
 1.42 14-May-2003  itojun always use PULLDOWN_TEST codepath.
 1.41 27-Nov-2002  itojun recover original stanford copyright. sync w/kame
 1.40 09-Nov-2002  itojun need icmp6.h for MULTICAST_PMTUD case. sync w/kame
 1.39 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.38 23-Sep-2002  simonb Remove breaks after returns, unreachable returns and returns after
returns(!).
 1.37 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.36 25-Jul-2002  itojun correct multicast packet MTU check. sync w/kame
 1.35 29-Jun-2002  itojun typo in name
 1.34 08-Jun-2002  itojun whitespace cleanup
 1.33 07-Jun-2002  itojun typo
 1.32 07-Jun-2002  itojun typo
 1.31 07-Jun-2002  itojun 'fall through' is not a valid LINT keyword.
 1.30 30-May-2002  itojun use M_READONLY where possible. minor cleanup/sync with kame.
 1.29 29-May-2002  itojun attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
 1.28 14-May-2002  matt branches: 1.28.2; 1.28.4;
Eliminate more commons or redundant declarations.
 1.27 24-Mar-2002  itojun double m_free() - niklas@openbsd
 1.26 04-Mar-2002  sommerfeld Nuke out-of-synch comment.
 1.25 18-Dec-2001  itojun reduce white space/cosmetic diffs w/kame.
 1.24 13-Nov-2001  lukem add RCSIDs
 1.23 16-Oct-2001  itojun more whitespace/comment sync with kame
 1.22 25-Jul-2001  itojun ifindex2ifnet could return NULL if if_detach() is used (pcmcia card
removal and such).
 1.21 22-Jul-2001  wiz seperate -> separate
 1.20 25-Mar-2001  itojun branches: 1.20.2;
couple of missing splx. sync with kame.
From: csapuntz@play-doh.stanford.edu (Constantine Sapuntzakis)
 1.19 08-Mar-2001  itojun more missing splx. from kame
 1.18 07-Mar-2001  itojun missing splx. from aaron@openbsd. sync with kame
 1.17 11-Feb-2001  itojun branches: 1.17.2;
remove #ifdef __FreeBSD__.
 1.16 10-Feb-2001  itojun initialize "mbz" member. kame 1.35 -> 1.36
 1.15 10-Feb-2001  itojun cosmetic changes to sync with kame. tabify and minor local variable renames
 1.14 19-Oct-2000  itojun kame 1.32 -> 1.33
in add_m6fc(), set interface list for all cases.
in response to a report from Hoerdt Mickael.

kame 1.31 -> 1.32
discard PIM register if the version of the inner packet is incorrect (i.e. IPv6)
(according to clarification of recent discussion in the IETF pim ML)
 1.13 29-Aug-2000  itojun do not forward packets with unspecified source address (::).
this is clarification recently made to RFC2460. sync with kame.
 1.12 19-May-2000  itojun branches: 1.12.4;
correct MLD API. (binary backward compatibility is kept)
commit to usr.sbin/pim6* will follow.
 1.11 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
 1.10 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.9 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.8 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.7 22-Jul-1999  itojun branches: 1.7.2; 1.7.8;
avoid u_long and hardcoded numbers.
 1.6 06-Jul-1999  itojun sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups
 1.5 06-Jul-1999  itojun checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour
 1.4 04-Jul-1999  itojun s/splnet/splsoftnet/ in IPv6/IPsec part.
hope I made no mistake (the kernel works fine but I need a regress test)

Suggested by: thorpej
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ip6_mroute.c was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ip6_mroute.c was added on branch chs-ubc2 on 1999-07-01 23:48:28 +0000
 1.7.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.7.2.4 27-Mar-2001  bouyer Sync with HEAD.
 1.7.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.7.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.7.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.4.5 05-Aug-2003  msaitoh Pull up revision 1.50 via patch (requested by itojun in ticket #57):
on interface detach, clear multicast forwarding table.
 1.12.4.4 17-Jun-2003  msaitoh Pull up revisions 1.47 (requested by itojun in ticket #47):
don't try to forward multicast packet to mif that went away.
 1.12.4.3 04-Sep-2002  itojun pullup 1.36 (itojun)

correct multicast packet MTU check. sync w/kame
 1.12.4.2 19-Oct-2000  he Pull up revision 1.14 (requested by itojun):
KAME 1.32: discard PIM register with wrong version#.
KAME 1.33: in add_m6fc(), always set interface list.
 1.12.4.1 29-Aug-2000  itojun pullup 1.12 -> 1.13 (approved by releng-1-5)

> do not forward packets with unspecified source address (::).
> this is clarification recently made to RFC2460. sync with kame.
 1.17.2.12 11-Dec-2002  thorpej Sync with HEAD.
 1.17.2.11 11-Nov-2002  nathanw Catch up to -current
 1.17.2.10 18-Oct-2002  nathanw Catch up to -current.
 1.17.2.9 17-Sep-2002  nathanw Catch up to -current.
 1.17.2.8 01-Aug-2002  nathanw Catch up to -current.
 1.17.2.7 20-Jun-2002  nathanw Catch up to -current.
 1.17.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.17.2.5 08-Jan-2002  nathanw Catch up to -current.
 1.17.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.17.2.3 22-Oct-2001  nathanw Catch up to -current.
 1.17.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.17.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.20.2.6 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.20.2.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.20.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.20.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.20.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.20.2.1 03-Aug-2001  lukem update to -current
 1.28.4.4 01-Sep-2003  tron Fix build problem caused by ticket #1375.
 1.28.4.3 01-Sep-2003  tron Pull up revision 1.50 (requested by itojun in ticket #1375):
on interface detach, clear multicast forwarding table. from kame
 1.28.4.2 17-Jun-2003  msaitoh Pull up revisions 1.47 (requested by itojun in ticket #1317):
don't try to forward multicast packet to mif that went away.
 1.28.4.1 29-Jul-2002  lukem Pull up revision 1.36 via patch (requested by itojun in ticket #543):
correct multicast packet MTU check. sync w/kame
 1.28.2.4 29-Aug-2002  gehenna catch up with -current.
 1.28.2.3 15-Jul-2002  gehenna catch up with -current.
 1.28.2.2 20-Jun-2002  gehenna catch up with -current.
 1.28.2.1 30-May-2002  gehenna Catch up with -current.
 1.49.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.49.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.49.2.4 17-Jan-2005  skrll Sync with HEAD.
 1.49.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.49.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.49.2.1 03-Aug-2004  skrll Sync with HEAD
 1.62.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.62.2.1 29-Apr-2005  kent sync with -current
 1.63.2.1 22-Oct-2005  riz Pull up following revision(s) (requested by bouyer in ticket #910):
sys/netinet6/ip6_mroute.c: revision 1.67
mif6table is used by netstat, so don't declare it static. Fix netstat -g
on Xen, whose ELF loader doesn't load local symbols in the symbol table.
 1.64.2.7 17-Mar-2008  yamt sync with head.
 1.64.2.6 07-Dec-2007  yamt sync with head
 1.64.2.5 15-Nov-2007  yamt sync with head.
 1.64.2.4 03-Sep-2007  yamt sync with head.
 1.64.2.3 26-Feb-2007  yamt sync with head.
 1.64.2.2 30-Dec-2006  yamt sync with head.
 1.64.2.1 21-Jun-2006  yamt sync with head.
 1.66.2.1 26-Oct-2005  yamt sync with head
 1.68.2.1 01-Feb-2006  yamt sync with head.
 1.69.4.2 22-Apr-2006  simonb Sync with head.
 1.69.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.69.2.1 09-Sep-2006  rpaulo sync with head
 1.70.2.3 03-Sep-2006  yamt sync with head.
 1.70.2.2 26-Jun-2006  yamt sync with head.
 1.70.2.1 13-Mar-2006  yamt sync with head.
 1.71.6.1 19-Jun-2006  chap Sync with head.
 1.74.4.2 10-Dec-2006  yamt sync with head.
 1.74.4.1 22-Oct-2006  yamt sync with head
 1.74.2.2 01-Feb-2007  ad Sync with head.
 1.74.2.1 18-Nov-2006  ad Sync with head.
 1.77.2.3 07-May-2007  yamt sync with head.
 1.77.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.77.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.79.4.1 11-Jul-2007  mjf Sync with head.
 1.79.2.2 01-Jul-2007  ad Adapt to callout API change.
 1.79.2.1 08-Jun-2007  ad Sync with head.
 1.83.12.1 13-Nov-2007  bouyer Sync with HEAD
 1.83.8.3 23-Mar-2008  matt sync with HEAD
 1.83.8.2 09-Jan-2008  matt sync with HEAD
 1.83.8.1 06-Nov-2007  matt sync with HEAD
 1.83.6.3 03-Dec-2007  joerg Sync with HEAD.
 1.83.6.2 11-Nov-2007  joerg Sync with HEAD.
 1.83.6.1 04-Nov-2007  jmcneill Sync with HEAD.
 1.84.2.2 08-Dec-2007  mjf Sync with HEAD.
 1.84.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.86.14.4 28-Sep-2008  mjf Sync with HEAD.
 1.86.14.3 29-Jun-2008  mjf Sync with HEAD.
 1.86.14.2 02-Jun-2008  mjf Sync with HEAD.
 1.86.14.1 03-Apr-2008  mjf Sync with HEAD.
 1.86.10.1 24-Mar-2008  keiichi sync with head.
 1.90.2.2 04-Jun-2008  yamt sync with head
 1.90.2.1 18-May-2008  yamt sync with head.
 1.92.2.4 11-Aug-2010  yamt sync with head.
 1.92.2.3 11-Mar-2010  yamt sync with head
 1.92.2.2 04-May-2009  yamt sync with head.
 1.92.2.1 16-May-2008  yamt sync with head.
 1.93.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.93.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.94.2.1 27-Jun-2008  simonb Sync with head.
 1.95.2.1 19-Oct-2008  haad Sync with HEAD.
 1.96.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.96.2.1 28-Apr-2009  skrll Sync with HEAD.
 1.98.4.1 05-Mar-2011  rmind sync with head
 1.98.2.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.98.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.102.6.1 18-Feb-2012  mrg merge to -current.
 1.102.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.102.2.1 17-Apr-2012  yamt sync with head
 1.103.16.1 02-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1524):
sys/netinet6/ip6_mroute.c: revision 1.120
Fix a pretty simple, yet pretty tragic typo: we should return IPPROTO_DONE,
not IPPROTO_NONE. With IPPROTO_NONE we will keep parsing the header chain
on an mbuf that was already freed.
 1.103.10.2 18-May-2014  rmind sync with head
 1.103.10.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.103.8.1 02-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1524):
sys/netinet6/ip6_mroute.c: revision 1.120
Fix a pretty simple, yet pretty tragic typo: we should return IPPROTO_DONE,
not IPPROTO_NONE. With IPPROTO_NONE we will keep parsing the header chain
on an mbuf that was already freed.
 1.103.6.2 03-Dec-2017  jdolecek update from HEAD
 1.103.6.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.103.2.1 02-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1524):
sys/netinet6/ip6_mroute.c: revision 1.120
Fix a pretty simple, yet pretty tragic typo: we should return IPPROTO_DONE,
not IPPROTO_NONE. With IPPROTO_NONE we will keep parsing the header chain
on an mbuf that was already freed.
 1.106.2.1 10-Aug-2014  tls Rebase.
 1.107.10.1 02-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1561):
sys/netinet6/ip6_mroute.c: revision 1.120
Fix a pretty simple, yet pretty tragic typo: we should return IPPROTO_DONE,
not IPPROTO_NONE. With IPPROTO_NONE we will keep parsing the header chain
on an mbuf that was already freed.
 1.107.6.1 02-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1561):
sys/netinet6/ip6_mroute.c: revision 1.120
Fix a pretty simple, yet pretty tragic typo: we should return IPPROTO_DONE,
not IPPROTO_NONE. With IPPROTO_NONE we will keep parsing the header chain
on an mbuf that was already freed.
 1.107.4.5 28-Aug-2017  skrll Sync with HEAD
 1.107.4.4 05-Feb-2017  skrll Sync with HEAD
 1.107.4.3 05-Oct-2016  skrll Sync with HEAD
 1.107.4.2 09-Jul-2016  skrll Sync with HEAD
 1.107.4.1 22-Sep-2015  skrll Sync with HEAD
 1.107.2.1 02-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1561):
sys/netinet6/ip6_mroute.c: revision 1.120
Fix a pretty simple, yet pretty tragic typo: we should return IPPROTO_DONE,
not IPPROTO_NONE. With IPPROTO_NONE we will keep parsing the header chain
on an mbuf that was already freed.
 1.111.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.111.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.113.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.119.6.3 13-Jul-2018  martin Pull up following revision(s) via patch (requested by knakahara in ticket #905):

sys/netinet/ip_mroute.c: revision 1.160
sys/netinet6/in6_l2tp.c: revision 1.16
sys/net/if.h: revision 1.263
sys/netinet/in_l2tp.c: revision 1.15
sys/netinet/ip_icmp.c: revision 1.172
sys/netinet/igmp.c: revision 1.68
sys/netinet/ip_encap.c: revision 1.69
sys/netinet6/ip6_mroute.c: revision 1.129

sbappendaddr() requires a lock to be held. Currently, softnet_lock is appropriate.

When rip_input() is called as inetsw[].pr_input, rip_input() is always called
while holding softnet_lock: in the !defined(NET_MPSAFE) case it is
acquired in ipintr(), otherwise (defined(NET_MPSAFE)) it is acquired in the
PR_WRAP_INPUT macro.

However, some functions call rip_input() directly without holding softnet_lock.
That causes an assertion failure in sbappendaddr().
rip6_input() and icmp6_rip6_input() also require softnet_lock for the same
reason.
 1.119.6.2 09-Apr-2018  bouyer Pull up following revision(s) (requested by roy in ticket #724):
tests/net/icmp/t_ping.c: revision 1.19
sys/netinet6/raw_ip6.c: revision 1.166
sys/netinet6/ip6_input.c: revision 1.195
sys/net/raw_usrreq.c: revision 1.59
sys/sys/socketvar.h: revision 1.151
sys/kern/uipc_socket2.c: revision 1.128
tests/lib/libc/sys/t_recvmmsg.c: revision 1.2
lib/libc/sys/recv.2: revision 1.38
sys/net/rtsock.c: revision 1.239
sys/netinet/udp_usrreq.c: revision 1.246
sys/netinet6/icmp6.c: revision 1.224
tests/net/icmp/t_ping.c: revision 1.20
sys/netipsec/keysock.c: revision 1.63
sys/netinet/raw_ip.c: revision 1.172
sys/kern/uipc_socket.c: revision 1.260
tests/net/icmp/t_ping.c: revision 1.22
sys/kern/uipc_socket.c: revision 1.261
tests/net/icmp/t_ping.c: revision 1.23
sys/netinet/ip_mroute.c: revision 1.155
sbin/route/route.c: revision 1.159
sys/netinet6/ip6_mroute.c: revision 1.123
sys/netatalk/ddp_input.c: revision 1.31
sys/netcan/can.c: revision 1.3
sys/kern/uipc_usrreq.c: revision 1.184
sys/netinet6/udp6_usrreq.c: revision 1.138
tests/net/icmp/t_ping.c: revision 1.18
socket: report receive buffer overflows
Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().
This allows userland to detect route(4) overflows so it can re-sync
with the current state.
socket: clear error even when peeking
The error has already been reported and it's pointless requiring another
recv(2) call just to clear it.
socket: remove now incorrect comment that so_error is only udp
As it can be affected by route(4) sockets which are raw.
rtsock: log dropped messages that we cannot report to userland
Handle ENOBUFS when receiving messages.
Don't send messages if the receiver has died.
Sprinkle more soroverflow().
Handle ENOBUFS in recv
Handle ENOBUFS in sendto
Note value received. Harden another sendto for ENOBUFS.
Handle the routing socket overflowing gracefully.
Allow a valid sendto .... duh
Handle errors better.
Fix test for checking we sent all the data we asked to.
 1.119.6.1 02-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #530):
sys/netinet6/ip6_mroute.c: revision 1.120
Fix a pretty simple, yet pretty tragic typo: we should return IPPROTO_DONE,
not IPPROTO_NONE. With IPPROTO_NONE we will keep parsing the header chain
on an mbuf that was already freed.
 1.122.2.4 25-Jun-2018  pgoyette Sync with HEAD
 1.122.2.3 21-May-2018  pgoyette Sync with HEAD
 1.122.2.2 02-May-2018  pgoyette Synch with HEAD
 1.122.2.1 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.129.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.129.2.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.130.2.1 05-Jan-2020  martin Pull up following revision(s) (requested by maxv in ticket #606):

sys/netinet6/ip6_mroute.c: revision 1.131

Don't forget to initialize 'sin6_len'. With kASan, from time to time the
value will be bigger than the size of the source, and we get a read
overflow. With kMSan the uninitialized access is detected immediately.
 1.132.26.1 02-Aug-2025  perseant Sync with HEAD
 1.20 07-Aug-2022  tsutsui Remove extra whitespaces added by an ancient stupid script.
 1.19 20-May-2018  maxv Remove notyet, we've never had this.
 1.18 06-Feb-2018  maxv branches: 1.18.2;
Remove dead code.
 1.17 18-Mar-2009  cegger bcopy -> memcpy
 1.16 18-Mar-2009  cegger bzero -> memset
 1.15 06-Aug-2008  plunky branches: 1.15.2; 1.15.8;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.14 24-Jun-2008  gmcgarry branches: 1.14.2;
ioctl commands are unsigned long. ABI change to mrt6_ioctl() will affect 64-bit platforms.
 1.13 01-Nov-2007  dyoung branches: 1.13.16; 1.13.20; 1.13.22; 1.13.24;
De-__P().
 1.12 25-Apr-2007  dyoung branches: 1.12.6; 1.12.8; 1.12.12;
Remove unused member 'm6_route' from struct mif6.
 1.11 04-Mar-2007  christos branches: 1.11.2; 1.11.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.10 11-Dec-2005  christos branches: 1.10.26;
merge ktrace-lwp.
 1.9 08-Jul-2003  itojun branches: 1.9.16;
on interface detach, clear multicast forwarding table. from kame
 1.8 10-Feb-2001  itojun branches: 1.8.18; 1.8.24;
fix if_set for architectures with sizeof(long) != 4. IF_xxx behaved badly.
(no fear of overrun, since index was mistakenly computed to too small value)
 1.7 19-May-2000  itojun branches: 1.7.4;
correct MLD API. (binary backward compatibility is kept)
commit to usr.sbin/pim6* will follow.
 1.6 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.5 02-Dec-1999  itojun use _KERNEL instead of KERNEL. (sync from KAME)
 1.4 19-Nov-1999  bouyer Update protocol and interface stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
 1.3 03-Jul-1999  thorpej branches: 1.3.2; 1.3.8;
RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ip6_mroute.h was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ip6_mroute.h was added on branch chs-ubc2 on 1999-07-01 23:48:28 +0000
 1.3.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.3.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.3.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.7.4.1 05-Aug-2003  msaitoh Pull up revision 1.9 (requested by itojun in ticket #57):
on interface detach, clear multicast forwarding table.
 1.8.24.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.8.24.2 18-Sep-2004  skrll Sync with HEAD.
 1.8.24.1 03-Aug-2004  skrll Sync with HEAD
 1.8.18.1 01-Sep-2003  tron Pull up revision 1.9 (requested by itojun in ticket #1375):
on interface detach, clear multicast forwarding table. from kame
 1.9.16.2 15-Nov-2007  yamt sync with head.
 1.9.16.1 03-Sep-2007  yamt sync with head.
 1.10.26.2 07-May-2007  yamt sync with head.
 1.10.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.11.4.1 11-Jul-2007  mjf Sync with head.
 1.11.2.1 08-Jun-2007  ad Sync with head.
 1.12.12.1 13-Nov-2007  bouyer Sync with HEAD
 1.12.8.1 06-Nov-2007  matt sync with HEAD
 1.12.6.1 04-Nov-2007  jmcneill Sync with HEAD.
 1.13.24.1 27-Jun-2008  simonb Sync with head.
 1.13.22.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.13.20.1 04-May-2009  yamt sync with head.
 1.13.16.2 28-Sep-2008  mjf Sync with HEAD.
 1.13.16.1 29-Jun-2008  mjf Sync with HEAD.
 1.14.2.1 19-Oct-2008  haad Sync with HEAD.
 1.15.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.15.2.1 28-Apr-2009  skrll Sync with HEAD.
 1.18.2.1 21-May-2018  pgoyette Sync with HEAD
 1.235 19-Apr-2024  riastradh ip6_output: Initialize plen for ip6_hopopts_input.

This funny little block in ip6_process_hopopts assumes it is
initialized, and behaves differently depending on whether it's zero
or not:

https://nxr.netbsd.org/xref/src/sys/netinet6/ip6_input.c?r=1.227#976

In the other call site, it is initialized to ip6->ip6_plen:

https://nxr.netbsd.org/xref/src/sys/netinet6/ip6_input.c?r=1.227#561

Reported-by: syzbot+587e3b707bdfe533283f@syzkaller.appspotmail.com
https://syzkaller.appspot.com/bug?extid=587e3b707bdfe533283f
 1.234 03-Aug-2023  ozaki-r in6: don't send any IPv6 packets over a disabled interface
 1.233 20-Mar-2023  ozaki-r in6: reject setting negative values but -1 via setsockopt(IPV6_CHECKSUM)

Same as OpenBSD.
 1.232 27-Jan-2023  ozaki-r ipsec: remove unnecessary splsoftnet

Because the code of IPsec itself is already MP-safe.
 1.231 28-Oct-2022  ozaki-r branches: 1.231.2;
inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to be aware of it
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
 1.230 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.229 21-Sep-2021  christos don't opencode kauth_cred_get()
 1.228 17-Aug-2021  andvar fix multiple repetitive typos in comments, messages and documentation. a large number of files are affected, mainly because of copy-pasted code.
 1.227 10-Mar-2021  christos byte-flipping a random number is not very useful.
 1.226 08-Sep-2020  christos branches: 1.226.2;
Add IP_BINDANY, IPV6_BINDANY which can be used to bind to any address in
order to implement transparent proxies.
 1.225 28-Aug-2020  ozaki-r inet6: reduce silent packet discards
 1.224 28-Aug-2020  ozaki-r inet, inet6: count packets dropped by IPsec

The counters count packets dropped due to security policy checks.
 1.223 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.222 13-Nov-2019  ozaki-r Get rid of unnecessary NULL checks for rt_ifa and ifa_ifp

They are always non-NULL nowadays.
 1.221 01-Nov-2019  knakahara Fix ipsecif(4): IPV6_MINMTU did not work correctly.
 1.220 15-May-2019  ozaki-r branches: 1.220.2;
Get rid of IFNET_LOCK for if_mcast_op to avoid a deadlock

The IFNET_LOCK was added to avoid data races on if_flags for IFF_ALLMULTI.
Unfortunately it caused a deadlock instead. A known scenario causing a
deadlock is when the following two operations occur concurrently: (a) removal of
an IP address assigned to an interface and (b) manipulation of multicast
groups on the interface. The resource dependency graph is like this:
softnet_lock => IFNET_LOCK => psref_target_destroy => softint => softnet_lock

Thanks to the previous commit that avoids data races on if_flags for
IFF_ALLMULTI by another approach, we can remove IFNET_LOCK and defuse the
deadlock.

PR kern/54189
 1.219 13-May-2019  ozaki-r Count packets dropped by pfil
 1.218 03-Apr-2019  maxv Fix small read overflow; harmless, because since I removed RH0, the memory
access on IPV6_RTHDR that would normally be illegal is not needed, and GCC
automatically removes it.
 1.217 04-Feb-2019  mrg rework the #ifdef IPSEC code to not use fallthru.
same number of lines with more local context.
 1.216 22-Dec-2018  maxv Replace M_ALIGN and MH_ALIGN by m_align.
 1.215 22-Dec-2018  maxv Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.
 1.214 12-Dec-2018  rin Simplify logic in ip{,6}_output().

Now, we have M_CSUM_TSOv[46] bit in ifp->if_csum_flags_tx when
TSO[46] is enabled for the interface. So we can simply check
whether TSO[46] is required in a packet but missing in the
interface by (sw_csum & M_CSUM_TSOv[46]).

Note that this is a very rare case where TSO[46] is suddenly
turned off during a packet passing b/w TCP and IP.

part of PR kern/53562
OK msaitoh
 1.213 29-Nov-2018  ozaki-r Don't validate the source address of forwarding IPv6 packets (same as IPv4)
 1.212 10-Aug-2018  maxv Rename

ip6_undefer_csum -> in6_undefer_cksum
in6_delayed_cksum -> in6_undefer_cksum_tcpudp

The two previous names were inconsistent and misleading.

Put the two functions into in6_offload.c. Add comments to explain what
we're doing.

Same as IPv4.
 1.211 01-Jun-2018  maxv branches: 1.211.2;
Rename

M_CSUM_DATA_IPv6_HL -> M_CSUM_DATA_IPv6_IPHL
M_CSUM_DATA_IPv6_HL_SET -> M_CSUM_DATA_IPv6_SET

Reduces the diff against IPv4. Also, clarify the definitions.
 1.210 29-May-2018  maxv Remove dead code, we don't care.
 1.209 09-May-2018  maxv Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is clear that we are copying a packet (that has M_PKTHDR) and not
a raw mbuf chain.
 1.208 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.207 29-Apr-2018  maxv Remove unused and misleading argument from ipsec_set_policy.
 1.206 26-Apr-2018  maxv Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.
 1.205 23-Apr-2018  maxv Remove the kernel RH0 code. RH0 is deprecated by RFC5095, for security
reasons. RH0 was already removed in the kernel's input path, but some
parts were still present in the output path: they are now removed.

Sent on tech-net@ a few days ago.
 1.204 18-Apr-2018  maxv Remove unused netipsec/xform.h includes.
 1.203 27-Feb-2018  maxv branches: 1.203.2;
Dedup: merge ipsec4_set_policy and ipsec6_set_policy. The content of the
original ipsec_set_policy function is inlined into the new one.
 1.202 27-Feb-2018  maxv Dedup: merge

ipsec4_get_policy and ipsec6_get_policy
ipsec4_delete_pcbpolicy and ipsec6_delete_pcbpolicy

The already-existing ipsec_get_policy() function is inlined in the new
one.
 1.201 12-Feb-2018  maxv Replace bcopy -> memcpy when it is obvious that the areas don't overlap.
Rearrange ip6_splithdr() for clarity.
 1.200 31-Jan-2018  maxv Correct the check; we want to find IPPROTO_HOPOPTS, not IPV6_HOPOPTS. This
just couldn't work.

By the way, I'm wondering what is the point of this block. Calling
ip6_hopopts_input() won't achieve anything useful, and it could actually
be a problem, because there are several paths in it that call icmp6_error,
which calls ip6_output, and then we're back in the same function. Besides
it is possible to reach icmp6_error with a packet we emitted (as opposed
to a packet we are forwarding), and in that case we are sending an ICMP
error back to ourselves.
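
The mix-up fixed in revision 1.200 is between two different namespaces: IPPROTO_HOPOPTS is a protocol number (the value found in an IPv6 Next Header field), while IPV6_HOPOPTS is a socket option name whose numeric value is platform-specific. A minimal userland sketch of the corrected comparison follows; the helper name ex_is_hopopts is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <netinet/in.h>

/* Returns nonzero when an IPv6 Next Header value names a hop-by-hop
 * options header. The comparison must use the protocol number
 * IPPROTO_HOPOPTS, not the socket option name IPV6_HOPOPTS. */
static int
ex_is_hopopts(uint8_t next_header)
{
	return next_header == IPPROTO_HOPOPTS;
}
```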
 1.199 31-Jan-2018  maxv Remove a misleading instruction. We don't care about increasing
m_pkthdr.len in ip6_insertfraghdr(), it gets recomputed after calling
this function.

If we cared there would be a bug, since we don't increase it in the
other branches.
 1.198 31-Jan-2018  maxv Try to sound a little less pessimistic, there is nothing wrong here.
 1.197 31-Jan-2018  maxv Style, localify, constify, and reorder a bit. No real functional change.
 1.196 15-Dec-2017  ozaki-r Ensure if_mcast_op is called with IFNET_LOCK held

Note that CARP doesn't deal with IFNET_LOCK yet.
 1.195 25-Nov-2017  kre Attempt to restore v6 networking. Not 100% certain that these
changes are all that is needed, but they're certainly a big part of it
(especially the ip6_input.c change.)
 1.194 24-Nov-2017  roy Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.
 1.193 02-Aug-2017  ozaki-r Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
 1.192 26-Jun-2017  ozaki-r Fix usage of ip6_get_membership

It may set nothing to ifp even if returning 0. So we need to NULL-clear
ifp before calling it.

Fix PR kern/52324
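
The pattern behind revision 1.192 is a callee that can return success without writing its out-parameter. A minimal sketch, using hypothetical ex_* names rather than the real ip6_get_membership signature, shows why the caller clears the pointer first:

```c
#include <assert.h>
#include <stddef.h>

struct ex_ifnet { int index; };

/* Hypothetical lookup: returns 0 on success but only writes *ifpp when a
 * nonzero index is given, mimicking a conditionally set out-parameter. */
static int
ex_get_membership(struct ex_ifnet *table, size_t n, size_t idx,
    struct ex_ifnet **ifpp)
{
	if (idx == 0)
		return 0;	/* success, *ifpp left untouched */
	if (idx > n)
		return -1;
	*ifpp = &table[idx - 1];
	return 0;
}

static struct ex_ifnet *
ex_lookup(struct ex_ifnet *table, size_t n, size_t idx)
{
	struct ex_ifnet *ifp = NULL;	/* clear first: callee may not set it */

	if (ex_get_membership(table, n, idx, &ifp) != 0)
		return NULL;
	return ifp;
}
```

Without the initialization, the idx == 0 path would hand back whatever happened to be on the stack.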
 1.191 03-Mar-2017  ozaki-r branches: 1.191.6;
Pass inpcb/in6pcb instead of socket to ip_output/ip6_output

- Passing a socket to Layer 3 is a layer violation and even unnecessary
- The change makes the code of callers and IPsec a bit simpler
 1.190 02-Mar-2017  ozaki-r Make sure im6o_memberships is protected by in6p's lock (solock)
 1.189 02-Mar-2017  ozaki-r Make usages of ifp MP-safe in some functions of IP multicast
 1.188 02-Mar-2017  ozaki-r Use LIST_* macros

No functional change.
 1.187 01-Mar-2017  ozaki-r Provide in6_multi_group

Use it when checking if we belong to the group, instead of in6_lookup_multi.

No functional change.
 1.186 22-Feb-2017  ozaki-r Stop using useless IN6_*_MULTI macros
 1.185 22-Feb-2017  ozaki-r Add assertions and comments for lock states of socket and pcb
 1.184 17-Feb-2017  ozaki-r Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.
 1.183 14-Feb-2017  ozaki-r Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
  - old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
  - new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced
 1.182 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.181 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.180 11-Jan-2017  ozaki-r branches: 1.180.2;
Get rid of unnecessary header inclusions
 1.179 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for an MP-safe routing table. It is separated
out to avoid one big change that is difficult to debug by bisecting.
 1.178 10-Nov-2016  ozaki-r Tidy up in6_select*

This change tidies up in6_select* functions, especially
selectroute.

selectroute is annoying because:
- It returns both/either of a rtentry and/or an ifp
  - Yes, it may return only an ifp!
  - It is valid but selectroute shouldn't handle the case
  - Such conditional behavior makes it difficult
    to apply locking/psref thingy
- It may return a rtentry even if error
- It may use opt->ip6po_nextroute rtcache implicitly
  - The caller can know if it is used
    by rtcache_validate(&opt->ip6po_nextroute)
    but it's racy in MP-safe world
  - Even if it uses opt->ip6po_nextroute, it may
    return a rtentry that isn't derived from the rtcache

The change includes:
- Rename selectroute to in6_selectroute
  - Let a remaining caller of selectroute, in6_selectif,
    use in6_selectroute instead
- Let in6_selectroute return only an rtentry
  - If error, it doesn't return an rtentry
  - A caller gets an ifp from a returned rtentry
- Allow in6_selectroute to modify a passed rtcache
  and a caller can know if opt->ip6po_nextroute is
  used via the rtcache
- Let callers (ip6_output and in6_selectif) handle
  the case that only an ifp is required

Inspired by OpenBSD
Proposed on tech-kern and tech-net
LGTM by roy@
 1.177 07-Nov-2016  ozaki-r Pull routing header handling out of ip6_output

No functional change.
 1.176 07-Nov-2016  ozaki-r Tidy up ip6_getpmtu

Pull the rtcache handling out of ip6_getpmtu; it isn't an essential
part of the function. Add comments inspired by FreeBSD.

No functional change.
 1.175 20-Sep-2016  roy Drop UDP packets as well as TCP without error when sending from detached or
tentative addresses.
 1.174 15-Sep-2016  roy Ensure that packets are sent from a valid address.
If the packet is TCP and the address is detached or tentative then
it's just dropped, otherwise an error is returned.

This is needed because you can bind to a valid address and it can then
become invalid.

This satisfies RFC 4862 section 5.5.4.
 1.173 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.172 29-Jul-2016  ozaki-r Avoid memset and rtcache_free if unnecessary

It's the same as ip_output.
 1.171 27-Jun-2016  christos branches: 1.171.2;
CID 1362905: Initialize ifp early, so that we don't if_put garbage in the
IPSEC case.
 1.170 21-Jun-2016  ozaki-r Make sure the ifp returned from in6_select* functions is psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.
 1.169 21-Jun-2016  ozaki-r Protect if_byindex with pserialize
 1.168 21-Jun-2016  ozaki-r Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime differs from that of an ifnet, so an ifnet object reached
through them can disappear at any time. Thus we need to look up an ifnet
object by if_index every time to be safe.
 1.167 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.166 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.165 27-Apr-2015  ozaki-r Add missing error checks on rtcache_setdst

It can fail with ENOMEM.
 1.164 24-Apr-2015  ozaki-r Avoid NULL checks for a variable that is definitely NULL
 1.163 02-Feb-2015  christos CID/1267860: Missing break in switch
 1.162 20-Jan-2015  roy Fix IPV6_USE_MIN_MTU set by setsockopt(2) being ignored when
IPV6_PKTINFO is set as a control with sendmsg(2).
 1.161 20-Jan-2015  roy Add net.inet6.ip6.prefer_tempaddr sysctl knob so that we can prefer
IPv6 temporary addresses as the source address.

Fixes PR kern/47100 based on a patch by Dieter Roelants.
 1.160 12-Oct-2014  christos branches: 1.160.2;
Refactor the multicast membership code so that we can handle v4 mapped
addresses using the v6 membership ioctls.
 1.159 11-Oct-2014  christos Make IPV4 mapped addresses able to do IPV4 multicast. Fixes needed:

- allow binding to mapped v4 multicast addresses
- define v4moptions, allow setting it via ioctl, pass it to ip_output,
free it when killing the pcb.

Ideally we would allow the IPV6 multicast setsockopts to work on mapped addresses
too, but this is a lot more work and linux does not do it either.
 1.158 16-Aug-2014  maxv http://m00nbsd.net/ae123a9bae03f7dde5c6d654412daf5a.html#Report-2

#03-0x02: Memory leak

ok ozaki-r@
 1.157 30-May-2014  christos branches: 1.157.2;
Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on IPsec being enabled and
rules existing.
 1.156 17-May-2014  rmind Replace open-coded access (and boundary checking) of ifindex2ifnet with
if_byindex() function.
 1.155 03-Oct-2013  christos branches: 1.155.2;
check sockopt_get() error, from logan.
 1.154 29-Jun-2013  rmind - Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.153 05-Jun-2013  christos branches: 1.153.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.152 18-Mar-2013  gdt Initialize variable used as (conditional) result parameter.

ip6_insertfraghdr either sets a result parameter or returns an error.
While the caller only uses the result parameter in the non-error case,
knowing that requires cross-module static analysis, and that's not
robust against distant code changes. Therefore, set ip6f to NULL
before the function call that maybe sets it, avoiding a spurious
warning and changing the future possible bug from an uninitialized
dereference to a NULL dereference.
 1.151 25-Jan-2013  kefren don't return hlim when asked for multicast loop flag
 1.150 21-Jul-2012  gdt branches: 1.150.2;
Add comments describing parameter handling for ip6_insertfraghdr.

Depending on compiler options, this code can be involved in an
(apparently) spurious compiler warning. However, it was not
immediately obvious that the compiler was wrong.
 1.149 25-Jun-2012  christos rename rfc6056 -> portalgo, requested by yamt
 1.148 22-Jun-2012  christos PR/46602: Move the rfc6056 port randomization to the IP layer.
 1.147 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.146 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.145 05-Feb-2012  rmind branches: 1.145.2; 1.145.6; 1.145.8;
ip6_output: check for rtcache_setdst() error, which may happen if running
out of memory.
 1.144 10-Jan-2012  drochner remove conditionals which can't succeed, and also shouldn't because
one would get a kernel NULL dereference immediately
 1.143 10-Jan-2012  drochner add patch from Arnaud Degroote to handle IPv6 extended options with
(FAST_)IPSEC, tested lightly with a DSTOPTS header consisting
of PAD1
 1.142 31-Dec-2011  christos - fix offsetof usage, and redundant defines
- kill pointer casts to 0
 1.141 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.140 25-Apr-2011  yamt branches: 1.140.4; 1.140.8;
undefer csum in looutput.
looutput is used by various code (ether_output, mcast) to loopback packets.
 1.139 07-May-2009  elad branches: 1.139.4; 1.139.6;
Remove some more "priv" variable usage in favor of kauth(9) calls.
 1.138 06-May-2009  elad Remove some usage of "priv" and "privileged" variables and instead pass
around credentials. Also push down kauth(9) calls closer to where the
operation is done.

Mailing list reference:

http://mail-index.netbsd.org/tech-net/2009/04/30/msg001270.html
 1.137 18-Apr-2009  drochner fix traversing of a control mbuf in the case that a message len
is not aligned wrt CMSG_ALIGN - the length counter drops below 0
in this case which was not checked for,
fixes crashes (with isc_dhcrelay4) reported by Uwe in tech-net
(subject: netbsd5-rc3 crash caused by isc_dhcrelay)
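
The failure mode in revision 1.137 (a remaining-length counter going negative because the last message's length is not a multiple of the alignment) can be sketched in userland. The record layout, EX_ALIGN, and the ex_* functions below are hypothetical stand-ins for the control-message structures, not the kernel code; the sketch only shows a walk that bounds every step against the bytes actually left.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record header, standing in for a control-message header. */
struct ex_hdr {
	size_t len;	/* header plus payload, not rounded up */
	int type;
};

#define EX_ALIGN(n) (((n) + sizeof(size_t) - 1) & ~(sizeof(size_t) - 1))

/* Write a header with the given length at offset off. */
static void
ex_put(unsigned char *buf, size_t off, size_t len)
{
	struct ex_hdr h;

	memset(&h, 0, sizeof(h));
	h.len = len;
	memcpy(buf + off, &h, sizeof(h));
}

/* Count records; returns -1 on a malformed length. */
static int
ex_count(const unsigned char *buf, size_t buflen)
{
	size_t off = 0;
	int n = 0;

	while (buflen - off >= sizeof(struct ex_hdr)) {
		struct ex_hdr h;
		size_t step;

		memcpy(&h, buf + off, sizeof(h));
		if (h.len < sizeof(h) || h.len > buflen - off)
			return -1;
		n++;
		step = EX_ALIGN(h.len);
		/* The final record may end before its aligned size does. */
		if (step >= buflen - off)
			break;
		off += step;
	}
	return n;
}
```

Comparing the aligned step against the remaining bytes before advancing keeps the offset inside the buffer, so no signed counter is needed.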
 1.136 18-Mar-2009  cegger bzero -> memset
 1.135 27-Oct-2008  plunky branches: 1.135.2; 1.135.6;
sockopt_getmbuf() may fail, handle that possibility
 1.134 12-Oct-2008  plunky branches: 1.134.2;
ip6_pcbopts() is called with the socket lock held, use M_NOWAIT
 1.133 12-Oct-2008  plunky ip6_pcbopt() is in the ctloutput path, we should not
sleep here because socket lock is held. use M_NOWAIT
 1.132 12-Oct-2008  plunky convert ip6_[sg]etmoptions() to use sockopt(9) API
should be no functional change
 1.131 12-Oct-2008  plunky do not sleep while allocating memory, socket lock is held
(use ENOBUFS for failure)
 1.130 06-Aug-2008  plunky Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.129 23-Apr-2008  thorpej branches: 1.129.2; 1.129.4; 1.129.8;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.128 15-Apr-2008  thorpej branches: 1.128.2;
Make ip6 and icmp6 stats per-cpu.
 1.127 08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
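
The ABI claim in revision 1.127 rests on a structure made only of uint64_t members having the same layout as an array of uint64_t. A minimal sketch with a hypothetical three-counter structure (the real ip6stat has many more fields, and the ex_* names are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical three-counter layout standing in for the old structure. */
struct ex_oldstat {
	uint64_t total;
	uint64_t tooshort;
	uint64_t toosmall;
};

/* Array indices that replace the structure members. */
enum { EX_STAT_TOTAL, EX_STAT_TOOSHORT, EX_STAT_TOOSMALL, EX_NSTATS };

/* Read one counter out of the array form by index. */
static uint64_t
ex_stat_get(const uint64_t *stats, int idx)
{
	assert(idx >= 0 && idx < EX_NSTATS);
	return stats[idx];
}

/* Reinterpret the array form as the old structure, as an old consumer would. */
static struct ex_oldstat
ex_as_old(const uint64_t *stats)
{
	struct ex_oldstat o;

	memcpy(&o, stats, sizeof(o));
	return o;
}
```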
 1.126 14-Jan-2008  dyoung branches: 1.126.2; 1.126.6;
Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in6_losing().
 1.125 10-Jan-2008  dyoung Save some rtcache_getrt() calls.
 1.124 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.123 06-Nov-2007  dyoung branches: 1.123.2; 1.123.6;
Use sockaddr_in6_init().
 1.122 01-Nov-2007  dyoung branches: 1.122.2;
De-__P().
 1.121 19-Sep-2007  dyoung branches: 1.121.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.
 1.120 02-Jun-2007  alc branches: 1.120.6; 1.120.8;
don't increment `ip6stat.ip6s_noroute' here, it has already been done in
in6_src:in6_selectroute().

ok dyoung@
 1.119 23-May-2007  christos Ansify + add a few comments, from Karl Sjödahl
 1.118 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.117 04-Mar-2007  christos branches: 1.117.2; 1.117.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.116 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.115 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.114 10-Feb-2007  degroote branches: 1.114.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic
 1.113 29-Jan-2007  dyoung Cosmetic: bzero -> memset, remove gratuitous cast, compare pointer
with NULL instead of 0.
 1.112 29-Jan-2007  dyoung In ip6_setmoptions(), don't leave a route cache (struct route_in6)
on the stack if we exit with EADDRNOTAVAIL.

(I already fixed this bug once tonight. Clearly, ip6_setmoptions
was cut-and-pasted from ip_setmoptions.)
 1.111 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.110 27-Dec-2006  alc CID-3317: check for 'm != NULL' before using it (rework the code path to
explicitly return `EINVAL'. Before, it was done but later in
ip6_setpktopt() when checking for 'len < ...')
CID-3316: check for 'm != NULL' before using it

ok christos@
 1.109 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.108 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.107 02-Dec-2006  dyoung Use the queue(3) macros instead of open-coding them. Shorten
staircases. Remove unnecessary casts. Where appropriate, s/8/NBBY/.
De-__P(). KNF.

No functional changes intended.
 1.106 25-Nov-2006  yamt branches: 1.106.2; 1.106.4;
move tso-by-software code to their own files. no functional changes.
 1.105 23-Nov-2006  yamt implement ipv6 TSO.
partly from Matthias Scheler. tested by him.
 1.104 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.103 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.102 30-Aug-2006  christos branches: 1.102.2; 1.102.4;
remove impossible comparisons.
 1.101 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.100 12-Jul-2006  tron Add diagnostic checks for hardware-assisted checksum related flags in
the mbuf which is supposed to get sent out:
- Complain in ip_output() if any of the IPv6 related flags are set.
- Complain in ip6_output() if any of the IPv4 related flags are set.
- Complain in both functions if the flags indicate that both a TCP and
UDP checksum should be calculated by the hardware.
 1.99 08-Jul-2006  rpaulo Add a missing piece from RFC 3542. KAME-NetBSD-current branch
revision 1.1.1.2.2.5:
do not call pfctlinput2(PRC_MSGSIZE) on fragmentation to avoid
notification storm

From Keiichi SHIMA:
"In the current NetBSD code, the PRC_MSGSIZE message will be generated
for every fragmented packets when a node is trying to send a big
packet. That was the intermediate behavior while RFC3542 was under
discussion."

By (obviously) the KAME project.
 1.98 14-May-2006  elad branches: 1.98.4;
integrate kauth.
 1.97 05-May-2006  rpaulo Add support for RFC 3542 Adv. Socket API for IPv6 (which obsoletes 2292).
* RFC 3542 isn't binary compatible with RFC 2292.
* RFC 2292 support is on by default but can be disabled.
* update ping6, telnet and traceroute6 to the new API.

From the KAME project (www.kame.net).
Reviewed by core.
 1.96 15-Apr-2006  christos Coverity CID 608: #ifdef out dead code.
 1.95 05-Mar-2006  rpaulo branches: 1.95.2; 1.95.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.94 21-Jan-2006  rpaulo branches: 1.94.2; 1.94.4; 1.94.6;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.93 11-Dec-2005  christos branches: 1.93.2;
merge ktrace-lwp.
 1.92 23-Sep-2005  christos change bcopy to memmove since this was supposed to be an ovbcopy (from kre)
 1.91 18-Aug-2005  yamt - introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.
 1.90 10-Aug-2005  yamt re-implement ipv6 tx loopback checksum omission.
 1.89 10-Aug-2005  yamt ipv6 tx checksum offloading. reviewed by Jason Thorpe.
 1.88 28-Feb-2005  itojun branches: 1.88.4;
make ip6_getpmtu back to static
 1.87 21-Dec-2004  drochner branches: 1.87.2; 1.87.4;
fix ifindex argument checks for IPV6_JOIN_GROUP,
IPV6_LEAVE_GROUP and IPV6_MULTICAST_IF -
0 is always legal
 1.86 04-Dec-2004  peter Convert lo(4) to a clonable device.

This also removes the loif array and changes all code to use the new
lo0ifp pointer which points to the lo0 ifnet structure.

Approved by christos.
 1.85 14-Jul-2004  itojun - update ro_pmtu on IPsec tunnel encapsulation. ro != ro_pmtu is used as the
sign for the existence of routing header.
- fragment to 1280 on IPv6-over-IPv6 encapsulation, as ICMPv6 too big may not
give you enough information to update pmtu cache.

from iij seil team, via kame.
 1.84 06-Jul-2004  minoura Remove broken code for now: getsockopt(s, IPPROTO_IP, IP_IPSEC_POLICY,...).
It returned EINVAL, now returns ENOPROTOOPT.
Ok'd by itojun.
 1.83 11-Jun-2004  itojun implement IPV6_USE_MIN_MTU sockopt. needed by bind9 + EDNS0 + big receive buffer.
 1.82 23-Mar-2004  martti branches: 1.82.2;
Make ip6_getpmtu() globally visible. This is needed by IPFilter 4.x.
 1.81 02-Mar-2004  thorpej Use the new IPSEC_PCB_SKIP_IPSEC() to bypass a socket policy lookup
when possible. This shaves several cycles from the output path for
non-IPsec connections, even if the policy is cached in the PCB.
 1.80 01-Mar-2004  itojun knf
 1.79 06-Feb-2004  itojun remove unneeded #ifdef
 1.78 04-Feb-2004  itojun strictly follow RFC2460 section 5 last paragraph
(sending rule when PMTU < 1280). pointed out by guninski at guninski.com
 1.77 24-Jan-2004  darrenr make ip6_getpmtu() externally visible
 1.76 19-Jan-2004  itojun do not lookup security policy if IPV6_FORWARDING.
avoids possible infinite ipsec encapsulation on
ip6_input -> ip6_forward -(tunnel mode)-> ip6_output
case. from kame
 1.75 10-Dec-2003  itojun fix cases where pktinfo specifies outgoing interface of "0".
 1.74 10-Dec-2003  itojun use if_indexlim (instead of if_index) and ifindex2ifnet[x] != NULL
to check if interface exists, as (1) if_index has different meaning
(2) ifindex2ifnet could become NULL when interface gets destroyed,
now that we have introduced dynamically-created interfaces. from kame
 1.73 06-Nov-2003  itojun correct behavior when ipv6mr_interface is 0. Matthias Drochner
 1.72 30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.71 03-Oct-2003  itojun when dropping M_PKTHDR, need to free m_tag associated with it.
 1.70 06-Sep-2003  itojun randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.69 05-Sep-2003  itojun u_short -> u_int16_t. sync w/ kame.
don't set ip6_plen where unneeded (i.e. before calling ip6_output)
 1.68 04-Sep-2003  itojun don't use m_cat to mbuf of different types. KAME-PR-495
 1.67 25-Aug-2003  itojun don't commit value into ip6_pktopts until the validation is done.
(note: the code will be updated with 2292bis definition soon, hopefully)
 1.66 22-Aug-2003  itojun remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.
 1.65 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.64 22-Aug-2003  jonathan Change KAME code for ip_output()/ip6_output() to obtain struct socket*
from the explicit inpcb*/in6pcb* argument. set_socket() becomes redundant.
 1.63 22-Aug-2003  jonathan Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.
 1.62 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.61 06-Jun-2003  itojun branches: 1.61.2;
- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).
 1.60 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.59 31-Oct-2002  itojun plug a memory leak. from sam leffler. sync w/kame
 1.58 23-Sep-2002  itojun length field on PADN option, before jumbo payload option was wrong.
sync w/kame
 1.57 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.56 11-Sep-2002  itojun correct signedness mixup in pointer passing. sync w/kame
 1.55 09-Jun-2002  itojun whitespace cleanup
 1.54 08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.53 07-Jun-2002  itojun sync IPV6_CHECKSUM handling with kame.
 1.52 07-Jun-2002  itojun comment
 1.51 07-Jun-2002  itojun whitespace
 1.50 07-Jun-2002  itojun remove #if 0'ed portion
 1.49 07-Jun-2002  itojun KNF a bit
 1.48 07-Jun-2002  itojun typo
 1.47 07-Jun-2002  itojun 'fall through' is not a valid LINT keyword.
 1.46 31-May-2002  itojun do not try to update rmx_mtu if rmx_mtu == 0 (obey ifmtu)
 1.45 29-May-2002  itojun attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
 1.44 28-Mar-2002  itojun branches: 1.44.2; 1.44.4;
make sure to check address family in route cache
(I really hate IPv4 mapped address...)
 1.43 20-Dec-2001  itojun centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame
 1.42 18-Dec-2001  itojun reduce white space/cosmetic diffs w/kame.
 1.41 13-Nov-2001  lukem add RCSIDs
 1.40 24-Oct-2001  itojun more whitespace sync with kame
 1.39 18-Oct-2001  itojun branches: 1.39.2;
reduce diffs with kame (mostly cosmetic).
move IPV6_CHECKSUM processing to sys/netinet6/raw_ip6.c.
constify a couple of places.
 1.38 17-Oct-2001  itojun unifdef OLDIP6OUTPUT
 1.37 15-Oct-2001  itojun implement IPV6_V6ONLY socket option from draft-ietf-ipngwg-rfc2553bis-03.txt.
IPV6_BINDV6ONLY (netbsd only) is deprecated, but still works just like before.
 1.36 11-Jun-2001  itojun branches: 1.36.2;
remove IPV6FIREWALL case, which is never used
 1.35 11-Apr-2001  itojun disallow userland programs from specifying addresses with IPV6_PKTINFO
setsockopt, if:
- the address is not verified by DAD (= not ready)
- the address is an anycast address (= not permitted as source)
sync with kame
 1.34 30-Mar-2001  itojun enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.
 1.33 25-Mar-2001  itojun re-initialize mopt in ip6_insert_jumboopt(). sync with kame
From: csapuntz@stanford.edu
 1.32 21-Mar-2001  itojun set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame
 1.31 10-Feb-2001  itojun branches: 1.31.2;
to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.30 06-Feb-2001  itojun bad semicolon after "if" conditional. sync with kame
 1.29 02-Feb-2001  itojun avoid panic when a packet with nonexistent link-local address is issued.
kame 1.151 -> 1.152.
 1.28 24-Jan-2001  itojun - record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.27 11-Nov-2000  thorpej Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, an mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.
 1.26 23-Oct-2000  itojun make IFA_STATS really work on IPv6.
 1.25 19-Aug-2000  itojun - icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)
 1.24 06-Jul-2000  itojun remove unnecessary #include <netkey/key_debug.h>. from kame.
 1.23 20-Jun-2000  itojun branches: 1.23.2;
avoid possible mbuf leaks on ipsec policy violation.(sync with kame)
 1.22 03-Jun-2000  itojun sync with kame.
- use latest source address selection code - in6_src.c.
- correct frag header insertion.
- deep copy ip6 header portion in ip6_mloopback to avoid overwrite.
- do not bark when we forward packet to loopback.
- some cosmetics.
 1.21 19-May-2000  itojun branches: 1.21.2;
correct manipulation of link-local scoped address on loopback.
now "telnet fe80::1%lo0" should work again.
(we have another bug near here - will attack it soon)
 1.20 19-May-2000  thorpej NULL != 0
 1.19 19-May-2000  itojun do not mistakenly forward link-local scoped packet (the bug was added
with "beyondscope" icmp6 support).
"options FAKE_LOOPBACK_IF" will honor scope on loopback outputs. rcvif will
be real interface, not the loopback, just like when multicast loopback.

(sync with kame)
 1.18 29-Mar-2000  simonb Remove duplicate declaration of ifindex2ifnet - it's in <net/if.h>.
 1.17 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usages.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.16 20-Feb-2000  darrenr pass "struct pfil_head *" to pfil_add_hook and pfil_remove_hook rather
than "struct protosw *".
 1.15 17-Feb-2000  darrenr Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.
 1.14 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.13 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.12 26-Jan-2000  itojun make setsockopt(IPV6_PORTRANGE) work. obeys IPNOPRIVPORTS.
 1.11 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.10 06-Jan-2000  itojun make IPV6_BINDV6ONLY setsockopt available. it controls behavior of
AF_INET6 wildcard listening socket. heavily documented in ip6(4).
net.inet6.ip6.bindv6only defines default value. default is 1.

"options INET6_BINDV6ONLY" removes any code fragment that supports
IPV6_BINDV6ONLY == 0 case (not defopt'ed as use of this is rare).
 1.9 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.8 31-Jul-1999  itojun branches: 1.8.2; 1.8.8;
sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.7 30-Jul-1999  itojun remove reference to in6_systm.h (file itself will be removed afterwards)
 1.6 22-Jul-1999  itojun - implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.
 1.5 22-Jul-1999  itojun change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.
 1.4 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ip6_output.c was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ip6_output.c was added on branch chs-ubc2 on 1999-07-01 23:48:28 +0000
 1.8.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.8.2.5 21-Apr-2001  bouyer Sync with HEAD
 1.8.2.4 27-Mar-2001  bouyer Sync with HEAD.
 1.8.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.8.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.8.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.21.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.23.2.7 23-Sep-2002  itojun Correct length field on PADN option, before jumbo payload option.
 1.23.2.6 26-Feb-2002  he Apply patch (requested by martti):
Fix it so that IPFilter handles IPv6 traffic.
 1.23.2.5 22-Apr-2001  he Pull up revision 1.35 (requested by itojun):
Disallow addresses that are not supposed to be put into IPv6
source, on IPV6_PKTINFO setsockopt.
 1.23.2.4 06-Apr-2001  he Pull up revision 1.28 (requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.23.2.3 26-Feb-2001  he Pull up revision 1.30 (requested by itojun):
Remove a misplaced semicolon after ``if'' conditional.
 1.23.2.2 04-Feb-2001  he Pull up revision 1.29 (via patch, requested by itojun):
Avoid panic when a packet with nonexistent link-local address is
issued.
 1.23.2.1 20-Jun-2000  he file ip6_output.c was added on branch netbsd-1-5 on 2001-02-04 19:22:08 +0000
 1.31.2.14 11-Nov-2002  nathanw Catch up to -current
 1.31.2.13 18-Oct-2002  nathanw Catch up to -current.
 1.31.2.12 17-Sep-2002  nathanw Catch up to -current.
 1.31.2.11 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.31.2.10 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.31.2.9 20-Jun-2002  nathanw Catch up to -current.
 1.31.2.8 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.31.2.7 08-Jan-2002  nathanw Catch up to -current.
 1.31.2.6 14-Nov-2001  nathanw Catch up to -current.
 1.31.2.5 22-Oct-2001  nathanw Catch up to -current.
 1.31.2.4 21-Jun-2001  nathanw Catch up to -current.
 1.31.2.3 09-Apr-2001  nathanw Catch up with -current.
 1.31.2.2 13-Mar-2001  nathanw Be more careful not to dereference curproc when there might not be
a process context.
 1.31.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.36.2.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.36.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.36.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.39.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.44.4.5 14-Jun-2004  jmc Pullup rev 1.83 (requested by itojun in ticket #1709)

Implement IPV6_USE_MIN_MTU sockopt.
 1.44.4.4 07-Feb-2004  jmc Pullup rev 1.78 (requested by itojun in ticket #1605)

Strictly follow RFC2460 section 5 last paragraph (sending rule when
PMTU < 1280).
 1.44.4.3 02-Oct-2003  tron Pull up revision 1.54 via patch (requested by itojun in ticket #1491):
sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.44.4.2 13-Oct-2002  lukem Pull up revision 1.58 (requested by itojun in ticket #855):
length field on PADN option, before jumbo payload option was wrong.
sync w/kame
 1.44.4.1 05-Jun-2002  lukem Pull up revision 1.46 (via patch) (requested by itojun in ticket #123):
do not try to update rmx_mtu if rmx_mtu == 0 (obey ifmtu)
 1.44.2.2 20-Jun-2002  gehenna catch up with -current.
 1.44.2.1 30-May-2002  gehenna Catch up with -current.
 1.61.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.61.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.61.2.5 17-Jan-2005  skrll Sync with HEAD.
 1.61.2.4 18-Dec-2004  skrll Sync with HEAD.
 1.61.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.61.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.61.2.1 03-Aug-2004  skrll Sync with HEAD
 1.82.2.1 14-Jun-2004  tron Pull up revision 1.83 (requested by itojun in ticket #468):
implement IPV6_USE_MIN_MTU sockopt. needed by bind9 + EDNS0 + big receive buffer.
 1.87.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.87.2.1 29-Apr-2005  kent sync with -current
 1.88.4.7 21-Jan-2008  yamt sync with head
 1.88.4.6 15-Nov-2007  yamt sync with head.
 1.88.4.5 27-Oct-2007  yamt sync with head.
 1.88.4.4 03-Sep-2007  yamt sync with head.
 1.88.4.3 26-Feb-2007  yamt sync with head.
 1.88.4.2 30-Dec-2006  yamt sync with head.
 1.88.4.1 21-Jun-2006  yamt sync with head.
 1.93.2.1 01-Feb-2006  yamt sync with head.
 1.94.6.4 03-Sep-2006  yamt sync with head.
 1.94.6.3 11-Aug-2006  yamt sync with head
 1.94.6.2 24-May-2006  yamt sync with head.
 1.94.6.1 13-Mar-2006  yamt sync with head.
 1.94.4.2 01-Jun-2006  kardel Sync with head.
 1.94.4.1 22-Apr-2006  simonb Sync with head.
 1.94.2.5 09-Sep-2006  rpaulo sync with head
 1.94.2.4 23-Feb-2006  rpaulo ip6_raw_ctloutput(): s/in6pcb/inpcb
ip6_optlen(): convert to inpcb
 1.94.2.3 14-Feb-2006  rpaulo in6pcb -> inpcb.
 1.94.2.2 07-Feb-2006  rpaulo sotoinpcb_hdr -> sotoinpcb.
 1.94.2.1 07-Feb-2006  rpaulo remove in6_pcb.h and include in_pcb.h.
 1.95.4.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.95.2.6 12-May-2006  elad adapt to kauth kpi, include sys/kauth.h where needed..
 1.95.2.5 11-May-2006  elad sync with head
 1.95.2.4 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.95.2.3 19-Apr-2006  elad sync with head.
 1.95.2.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.95.2.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.98.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.102.4.3 18-Dec-2006  yamt sync with head.
 1.102.4.2 10-Dec-2006  yamt sync with head.
 1.102.4.1 22-Oct-2006  yamt sync with head
 1.102.2.3 01-Feb-2007  ad Sync with head.
 1.102.2.2 12-Jan-2007  ad Sync with head.
 1.102.2.1 18-Nov-2006  ad Sync with head.
 1.106.4.1 04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.106.2.1 24-May-2007  pavel Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the correct default policy, depending on the address family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us choose the right default policy (based on the address family
requested).
While here, fix an error message

Use a dynamic array instead of a static array to decompress. It lets us
decompress any data, whatever the ratio of decompressed data to compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.114.2.3 07-May-2007  yamt sync with head.
 1.114.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.114.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.117.4.1 11-Jul-2007  mjf Sync with head.
 1.117.2.3 09-Oct-2007  ad Sync with head.
 1.117.2.2 09-Jun-2007  ad Sync with head.
 1.117.2.1 08-Jun-2007  ad Sync with head.
 1.120.8.4 23-Mar-2008  matt sync with HEAD
 1.120.8.3 09-Jan-2008  matt sync with HEAD
 1.120.8.2 08-Nov-2007  matt sync with -HEAD
 1.120.8.1 06-Nov-2007  matt sync with HEAD
 1.120.6.3 11-Nov-2007  joerg Sync with HEAD.
 1.120.6.2 04-Nov-2007  jmcneill Sync with HEAD.
 1.120.6.1 02-Oct-2007  joerg Sync with HEAD.
 1.121.4.1 13-Nov-2007  bouyer Sync with HEAD
 1.122.2.3 18-Feb-2008  mjf Sync with HEAD.
 1.122.2.2 27-Dec-2007  mjf Sync with HEAD.
 1.122.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.123.6.3 19-Jan-2008  bouyer Sync with HEAD
 1.123.6.2 10-Jan-2008  bouyer Sync with HEAD
 1.123.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.123.2.1 26-Dec-2007  ad Sync with head.
 1.126.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.126.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.126.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.126.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.128.2.1 18-May-2008  yamt sync with head.
 1.129.8.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.129.8.1 19-Oct-2008  haad Sync with HEAD.
 1.129.4.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.129.2.2 16-May-2009  yamt sync with head
 1.129.2.1 04-May-2009  yamt sync with head.
 1.134.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.134.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.135.6.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.135.2.2 27-Aug-2014  msaitoh Pull up following revision(s) (requested by maxv in ticket #1920):
sys/netinet6/ip6_output.c 1.158 via patch

Fix a memory leak in calling setsockopt() on an INET6 socket.
 1.135.2.1 20-Apr-2009  snj branches: 1.135.2.1.6; 1.135.2.1.10;
Pull up following revision(s) (requested by drochner in ticket #713):
sys/netinet6/ip6_output.c: revision 1.137
fix traversing of a control mbuf in the case that a message len
is not aligned wrt CMSG_ALIGN - the length counter drops below 0
in this case which was not checked for,
fixes crashes (with isc_dhcrelay4) reported by Uwe in tech-net
(subject: netbsd5-rc3 crash caused by isc_dhcrelay)
 1.135.2.1.10.1 27-Aug-2014  msaitoh Pull up following revision(s) (requested by maxv in ticket #1920):
sys/netinet6/ip6_output.c 1.158 via patch

Fix a memory leak in calling setsockopt() on an INET6 socket.
 1.135.2.1.6.1 27-Aug-2014  msaitoh Pull up following revision(s) (requested by maxv in ticket #1920):
sys/netinet6/ip6_output.c 1.158 via patch

Fix a memory leak in calling setsockopt() on an INET6 socket.
 1.139.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.139.4.1 31-May-2011  rmind sync with head
 1.140.8.2 05-Apr-2012  mrg sync to latest -current.
 1.140.8.1 18-Feb-2012  mrg merge to -current.
 1.140.4.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.140.4.2 30-Oct-2012  yamt sync with head
 1.140.4.1 17-Apr-2012  yamt sync with head
 1.145.8.1 27-Aug-2014  msaitoh Pull up following revision(s) (requested by maxv in ticket #1114):
sys/netinet6/ip6_output.c 1.158 via patch

Fix a memory leak in calling setsockopt() on an INET6 socket.
 1.145.6.1 27-Aug-2014  msaitoh Pull up following revision(s) (requested by maxv in ticket #1114):
sys/netinet6/ip6_output.c 1.158 via patch

Fix a memory leak in calling setsockopt() on an INET6 socket.
 1.145.2.1 27-Aug-2014  msaitoh Pull up following revision(s) (requested by maxv in ticket #1114):
sys/netinet6/ip6_output.c 1.158 via patch

Fix a memory leak in calling setsockopt() on an INET6 socket.
 1.150.2.4 03-Dec-2017  jdolecek update from HEAD
 1.150.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.150.2.2 23-Jun-2013  tls resync from head
 1.150.2.1 25-Feb-2013  tls resync with head
 1.153.2.3 18-May-2014  rmind sync with head
 1.153.2.2 28-Aug-2013  rmind sync with head
 1.153.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.155.2.1 10-Aug-2014  tls Rebase.
 1.157.2.3 14-Feb-2015  snj Pull up following revision(s) (requested by roy in ticket #509):
sys/netinet6/ip6_output.c: revision 1.163
CID/1267860: Missing break in switch
 1.157.2.2 23-Jan-2015  martin Pull up following revision(s) (requested by pettai in ticket #441):
sys/netinet6/ip6_var.h: revision 1.64
sys/netinet6/in6.h: revision 1.82
sys/netinet6/in6_src.c: revision 1.56
sys/netinet6/mld6.c: revision 1.62
sys/netinet6/ip6_input.c: revision 1.150
sys/netinet6/ip6_output.c: revision 1.161
Add net.inet6.ip6.prefer_tempaddr sysctl knob so that we can prefer
IPv6 temporary addresses as the source address.
Fixes PR kern/47100 based on a patch by Dieter Roelants.
 1.157.2.1 24-Aug-2014  martin Pull up following revision(s) (requested by maxv in ticket #51):
sys/netinet6/ip6_output.c: revision 1.158
sys/rump/librump/rumpvfs/rumpfs.c: revision 1.130
Fix memory leaks in error cases
 1.160.2.8 28-Aug-2017  skrll Sync with HEAD
 1.160.2.7 05-Feb-2017  skrll Sync with HEAD
 1.160.2.6 05-Dec-2016  skrll Sync with HEAD
 1.160.2.5 05-Oct-2016  skrll Sync with HEAD
 1.160.2.4 09-Jul-2016  skrll Sync with HEAD
 1.160.2.3 22-Sep-2015  skrll Sync with HEAD
 1.160.2.2 06-Jun-2015  skrll Sync with HEAD
 1.160.2.1 06-Apr-2015  skrll Sync with HEAD
 1.171.2.4 20-Mar-2017  pgoyette Sync with HEAD
 1.171.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.171.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.171.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.180.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.191.6.6 04-Aug-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #1884):

sys/netinet6/in6.c: revision 1.289
sys/netinet6/ip6_output.c: revision 1.234

in6: clear ND6_IFF_IFDISABLED to allow DAD again on link-up

in6: don't send any IPv6 packets over a disabled interface
 1.191.6.5 23-Mar-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #1808):

sys/netinet6/raw_ip6.c: revision 1.183 (via patch)
sys/netinet6/ip6_output.c: revision 1.233

in6: reject setting negative values but -1 via setsockopt(IPV6_CHECKSUM)
Same as OpenBSD.

in6: make sure a user-specified checksum field is within a packet
From OpenBSD
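
The two checks described in this pull-up can be sketched as follows. This is a minimal illustration, not the code in raw_ip6.c: the helper names `checksum_opt_valid` and `checksum_field_in_packet` are hypothetical, and the sketch assumes a 16-bit checksum field and that -1 means "checksum offload disabled" (as in RFC 3542).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: accept -1 ("disabled") or any non-negative offset. */
static bool
checksum_opt_valid(int optval)
{
	return optval >= -1;
}

/*
 * Hypothetical helper: the 16-bit checksum field at byte offset `off`
 * must lie entirely within a payload of `plen` bytes.
 */
static bool
checksum_field_in_packet(int off, size_t plen)
{
	if (off < 0)
		return false;
	return (size_t)off + sizeof(uint16_t) <= plen;
}
```

The first check belongs at setsockopt time, the second at output time, because the packet length is only known when a packet is actually sent.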
 1.191.6.4 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general, but
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easy.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.191.6.3 10-Dec-2017  snj Pull up following revision(s) (requested by roy in ticket #390):
sys/netinet/ip_input.c: 1.363
sys/netinet6/ip6_input.c: 1.184-1.185
sys/netinet6/ip6_output.c: 1.194-1.195
sys/netinet6/in6_src.c: 1.83-1.84
Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.
--
Attempt to restore v6 networking. Not 100% certain that these
changes are all that is needed, but they're certainly a big part of it
(especially the ip6_input.c change.)
--
Treat unvalidated addresses as deprecated in rule 3.
 1.191.6.2 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panics of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that the process issuing
SADB_UPDATE be the same process that issued SADB_ADD (or SADB_GETSPI).
This means that the update command must be used with the add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require the newly-added update command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver has implemented such offload features for many
years and none seems likely to implement them soon. (Note that neither
FreeBSD nor Linux has such drivers.) Let's remove the related
(unused) code and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list should basically never have sav->refcnt == 0. Also,
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be obtained from
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav list is sorted
as we need.

The newly added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
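
A sorted-list validation of the kind described above could look like the sketch below. It is not the key.c implementation: `struct sa_entry` and `sa_list_is_sorted` are hypothetical stand-ins, and the sort direction (newest add time first) is an assumption made only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an SA entry; only the add time matters here. */
struct sa_entry {
	uint64_t addtime;
	struct sa_entry *next;
};

/*
 * Return true if add times never increase from head to tail.
 * The direction (newest first) is an assumption for illustration.
 */
static bool
sa_list_is_sorted(const struct sa_entry *head)
{
	const struct sa_entry *e;

	for (e = head; e != NULL && e->next != NULL; e = e->next) {
		if (e->addtime < e->next->addtime)
			return false;
	}
	return true;
}
```

The check is linear in the list length, which is why running it on every packet would be too expensive outside a debug build.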
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as temporary storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM is never set on an mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know the exact size of headers (e.g., TCP segment size
calculation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav; however, an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
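
The arithmetic behind this fix can be sketched as below. The struct and function names are hypothetical and only mirror the idea in the message: deriving the soft limit from the (already zeroed) soft field always yields 0, whereas deriving it from the hard limit yields 80% of that limit.

```c
#include <stdint.h>

/* Hypothetical pair of lifetime limits, in seconds. */
struct lifetime_limits {
	uint64_t soft_addtime;
	uint64_t hard_addtime;
};

/* Broken variant: scales the soft field itself, which is 0 after zero-clearing. */
static void
set_soft_from_soft(struct lifetime_limits *l)
{
	l->soft_addtime = l->soft_addtime * 80 / 100;
}

/* Fixed variant: the soft limit becomes 80% of the hard limit. */
static void
set_soft_from_hard(struct lifetime_limits *l)
{
	l->soft_addtime = l->hard_addtime * 80 / 100;
}
```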
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms while holding softnet_lock
causes a deadlock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf,
which is stuck on key_so_mtx, never releases its reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

Hundreds of mbufs can easily be queued on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It is never changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), which is of course useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical usage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never completes until xc_broadcast of thread A finishes. On the other hand,
thread C, which is a callee of xc_broadcast of thread A, is stuck on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock happens only if NET_MPSAFE is enabled.
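
The shape of the fix (drop the mutex around the barrier, and use a condvar so that only one thread is inside the barrier at a time) can be sketched in userland with POSIX threads. This is an analogy under stated assumptions, not the kernel code: `slow_barrier` is a hypothetical stand-in for pserialize_perform, and `perform_running`, `perform_cv` and `remove_item_safely` are invented names.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t perform_cv = PTHREAD_COND_INITIALIZER;
static bool perform_running = false;
static int barriers_done = 0;

/* Hypothetical stand-in for a blocking barrier such as pserialize_perform. */
static void
slow_barrier(void)
{
}

static void
remove_item_safely(void)
{
	pthread_mutex_lock(&mtx);
	/* ... unlink the item from the list here ... */

	/* Wait until no other thread is inside the barrier. */
	while (perform_running)
		pthread_cond_wait(&perform_cv, &mtx);
	perform_running = true;

	/* Drop the mutex so nothing can block on it during the barrier. */
	pthread_mutex_unlock(&mtx);
	slow_barrier();
	pthread_mutex_lock(&mtx);

	perform_running = false;
	barriers_done++;
	pthread_cond_broadcast(&perform_cv);

	/* ... drain references to the item here, as in the original sequence ... */
	pthread_mutex_unlock(&mtx);
}
```

The flag plus condvar serializes the barrier without holding the mutex across it, so a callback that needs the mutex can still make progress.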
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected, for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.191.6.1 01-Jul-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #73):
sys/netinet6/ip6_output.c: revision 1.192
Fix usage of ip6_get_membership
It may set nothing to ifp even if returning 0. So we need to NULL-clear
ifp before calling it.
Fix PR kern/52324
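
The pattern behind this fix can be sketched as follows. `lookup_ifp` is a hypothetical function that, like the behaviour described for ip6_get_membership, may return 0 without writing to its output pointer; the caller therefore clears the pointer first so that "not set" is distinguishable from stale stack contents.

```c
#include <stddef.h>

/* Hypothetical stand-in for an interface structure. */
struct ifnet_stub {
	int index;
};

/* Hypothetical lookup that may return 0 without touching *ifpp. */
static int
lookup_ifp(int want, struct ifnet_stub *table, struct ifnet_stub **ifpp)
{
	if (want != 0)
		*ifpp = &table[0];
	return 0;	/* success even when *ifpp was left untouched */
}

static struct ifnet_stub *
lookup_checked(int want, struct ifnet_stub *table)
{
	struct ifnet_stub *ifp = NULL;	/* clear first so "not set" is detectable */

	if (lookup_ifp(want, table, &ifp) != 0)
		return NULL;
	return ifp;
}
```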
 1.203.2.6 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.203.2.5 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.203.2.4 25-Jun-2018  pgoyette Sync with HEAD
 1.203.2.3 21-May-2018  pgoyette Sync with HEAD
 1.203.2.2 02-May-2018  pgoyette Synch with HEAD
 1.203.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.211.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.211.2.1 10-Jun-2019  christos Sync with HEAD
 1.220.2.2 04-Aug-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #1707):

sys/netinet6/in6.c: revision 1.289
sys/netinet6/ip6_output.c: revision 1.234

in6: clear ND6_IFF_IFDISABLED to allow DAD again on link-up

in6: don't send any IPv6 packets over a disabled interface
 1.220.2.1 23-Mar-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #1615):

sys/netinet6/raw_ip6.c: revision 1.183 (via patch)
sys/netinet6/ip6_output.c: revision 1.233

in6: reject setting negative values but -1 via setsockopt(IPV6_CHECKSUM)
Same as OpenBSD.

in6: make sure a user-specified checksum field is within a packet
From OpenBSD
 1.226.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.231.2.4 11-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #826):

sys/netinet6/ip6_output.c: revision 1.235

ip6_output: Initialize plen for ip6_hopopts_input.

This funny little block in ip6_process_hopopts assumes it is
initialized and behaves differently depending on whether it's zero
or not:
https://nxr.netbsd.org/xref/src/sys/netinet6/ip6_input.c?r=1.227#976

In the other call site, it is initialized to ip6->ip6_plen:
https://nxr.netbsd.org/xref/src/sys/netinet6/ip6_input.c?r=1.227#561
 1.231.2.3 20-Jul-2024  martin Pull up following revision(s) (requested by rin in ticket #740):

sys/netipsec/ipsec_input.c: revision 1.79
sys/netipsec/ipsec_output.c: revision 1.86
sys/netipsec/ipsec.c: revision 1.178
sys/netinet6/ip6_output.c: revision 1.232

ipsec: remove unnecessary splsoftnet

Because the code of IPsec itself is already MP-safe.
 1.231.2.2 04-Aug-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #310):

sys/netinet6/in6.c: revision 1.289
sys/netinet6/ip6_output.c: revision 1.234

in6: clear ND6_IFF_IFDISABLED to allow DAD again on link-up

in6: don't send any IPv6 packets over a disabled interface
 1.231.2.1 23-Mar-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #125):

sys/netinet6/raw_ip6.c: revision 1.183
sys/netinet6/ip6_output.c: revision 1.233

in6: reject setting negative values but -1 via setsockopt(IPV6_CHECKSUM)
Same as OpenBSD.

in6: make sure a user-specified checksum field is within a packet
From OpenBSD
 1.6 19-Feb-2021  christos - Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
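
Why the type's alignment is a better basis than its size can be sketched as follows. This is an illustration only: ALIGNED_POINTER_SKETCH and struct three_words are hypothetical names, and the real macro's exact definition is not shown in this log.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical non-primitive type: 12 bytes large but 4-byte aligned on
 * common ABIs, so sizeof(t) - 1 would not even be a valid bit mask. */
struct three_words {
	uint32_t a, b, c;
};

/* Sketch of an alignment test driven by the type's ABI alignment. */
#define ALIGNED_POINTER_SKETCH(p, t) \
	((((uintptr_t)(p)) & (_Alignof(t) - 1)) == 0)

static bool
three_words_aligned(const void *p)
{
	return ALIGNED_POINTER_SKETCH(p, struct three_words);
}
```

A mask built from sizeof would be 11 for this struct, which tests an arbitrary bit pattern rather than alignment; the mask built from _Alignof is a proper low-bits mask.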
 1.5 17-Feb-2021  christos - pass the alignment instead of the mask (as Roy asked and to match the
other macro)
- use alignof to determine that alignment and CTASSERT what we expect
- remove unused macros
 1.4 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.
 1.3 28-Apr-2008  martin branches: 1.3.4; 1.3.102;
Remove clause 3 and 4 from TNF licenses
 1.2 23-Apr-2008  thorpej branches: 1.2.2;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.1 15-Apr-2008  thorpej branches: 1.1.2;
Make ip6 and icmp6 stats per-cpu.
 1.1.2.1 18-May-2008  yamt sync with head.
 1.2.2.1 16-May-2008  yamt sync with head.
 1.3.102.1 03-Apr-2021  thorpej Sync with HEAD.
 1.3.4.2 02-Jun-2008  mjf Sync with HEAD.
 1.3.4.1 28-Apr-2008  mjf file ip6_private.h was added on branch mjf-devfs2 on 2008-06-02 13:24:27 +0000
 1.94 09-Feb-2024  andvar fix spelling mistakes, mainly in comments and log messages.
 1.93 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.92 24-Oct-2022  knakahara Fix PR kern/57037

Make it possible to control whether routing messages are sent on parameter
changes.
When net.inet6.ip6.param_rt_msg=0, don't send parameter-changing
routing messages.
When net.inet6.ip6.param_rt_msg=1 (default), send parameter-changing
routing messages via RTM_NEWADDR.
 1.91 17-Aug-2021  andvar fix multiple repetitive typos in comments, messages and documentation. mainly because of copy-pasted code, a large number of files are affected.
 1.90 11-Mar-2021  ryo flowlabel will never return anything other than 1 or 0.
s/&&/&/
 1.89 08-Mar-2021  christos no need for ip6_id.c...
 1.88 07-Mar-2021  christos netinet/netinet6: Add necessary includes to make these standalone.
(from riastradh)
 1.87 28-Aug-2020  ozaki-r branches: 1.87.2;
inet6: reduce silent packet discards
 1.86 28-Aug-2020  ozaki-r inet6: pass rcvif to ip6_forward to avoid extra psref_acquire
 1.85 28-Aug-2020  ozaki-r inet, inet6: count packets dropped by IPsec

The counters count packets dropped due to security policy checks.
 1.84 19-Jun-2020  maxv localify
 1.83 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.82 13-May-2019  ozaki-r branches: 1.82.2;
Count packets dropped by pfil
 1.81 29-Nov-2018  ozaki-r Introduce and use ip_dad_enabled() and ip6_dad_enabled() functions
 1.80 14-Feb-2018  maxv branches: 1.80.2; 1.80.4;
Re-make ip6_nexthdr global, it will be used in soon-to-be-added code...
 1.79 30-Jan-2018  maxv Style, localify, remove dead code, and fix typos. No functional change.
 1.78 30-Jan-2018  maxv Fix a buffer overflow in ip6_get_prevhdr. Doing

mtod(m, char *) + len

is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.

The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.

But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.

However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.

As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.

Then there are the easier cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.

Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.

This place is still fragile.
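
The difference between "first buffer plus length" and "offset in the chain" can be sketched in userland C. This is an illustration under stated assumptions: struct seg is a hypothetical stand-in for an mbuf, and chain_byte_at is not a kernel function.

```c
#include <stddef.h>

/* Hypothetical stand-in for an mbuf: one segment of a packet chain. */
struct seg {
	struct seg *next;
	unsigned char *data;
	size_t len;
};

/* Return a pointer to the byte at 'off' within the whole chain, or NULL
 * if the chain is shorter than that. Unlike 'first->data + off', this
 * never steps past the end of the first segment. */
static unsigned char *
chain_byte_at(struct seg *m, size_t off)
{
	for (; m != NULL; m = m->next) {
		if (off < m->len)
			return m->data + off;
		off -= m->len;
	}
	return NULL;
}
```

With two 4-byte segments, offset 5 resolves to the second byte of the second segment, whereas adding 5 to the first segment's data pointer would land outside that buffer.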
 1.77 29-Jan-2018  maxv Start cleaning up ip6_input.c. Several pieces of code have evolved but
their neighboring comments were not updated. So update them, and remove
code that has been disabled for years (it has no use anyway).
 1.76 25-Jan-2018  maxv Several changes:

* Move the structure definitions into frag6.c, they should not be used
elsewhere.

* Rename ip6af_mff -> ip6af_more, and switch it to bool, easier to
understand.

* Remove IP6_REASS_MBUF, no point in keeping this.

* Remove ip6q_arrive and ip6q_nxtp, unused.

* Style.
 1.75 10-Jan-2018  knakahara add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.74 03-Mar-2017  ozaki-r branches: 1.74.6;
Pass inpcb/in6pcb instead of socket to ip_output/ip6_output

- Passing a socket to Layer 3 is layer violation and even unnecessary
- The change makes the code of callers and IPsec a bit simpler
 1.73 02-Mar-2017  ozaki-r Make usages of ifp MP-safe in some functions of IP multicast
 1.72 14-Feb-2017  ozaki-r Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced
 1.71 08-Dec-2016  ozaki-r branches: 1.71.2;
Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.70 10-Nov-2016  ozaki-r Tidy up in6_select*

This change tidies up in6_select* functions, especially
selectroute.

selectroute is annoying because:
- It returns both/either of a rtentry and/or an ifp
- Yes, it may return only an ifp!
- It is valid but selectroute shouldn't handle the case
- Such conditional behavior makes it difficult
to apply locking/psref thingy
- It may return a rtentry even if error
- It may use opt->ip6po_nextroute rtcache implicitly
- The caller can know if it is used
by rtcache_validate(&opt->ip6po_nextroute)
but it's racy in MP-safe world
- Even if it uses opt->ip6po_nextroute, it may
return a rtentry that isn't derived from the rtcache

The change includes:
- Rename selectroute to in6_selectroute
- Let a remaining caller of selectroute, in6_selectif,
use in6_selectroute instead
- Let in6_selectroute return only an rtentry
- If error, it doesn't return an rtentry
- A caller gets an ifp from a returned rtentry
- Allow in6_selectroute to modify a passed rtcache
and a caller can know if opt->ip6po_nextroute is
used via the rtcache
- Let callers (ip6_output and in6_selectif) handle
the case that only an ifp is required

Inspired by OpenBSD
Proposed on tech-kern and tech-net
LGTM by roy@
 1.69 31-Oct-2016  ozaki-r Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.
 1.68 23-Aug-2016  knakahara improve fast-forward performance when the number of flows exceeds ip6_maxflows.

This is porting of ip_flow.c:r1.76

In ip6flow case, the before degradation is about 45%, the after degradation is
about 55%.
 1.67 21-Jun-2016  ozaki-r branches: 1.67.2;
Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.
 1.66 21-Jun-2016  ozaki-r Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime differs from that of the ifnet, so an ifnet object can
disappear at any time while we reach it via them. Thus we need to look up
an ifnet object by if_index every time for safety.
 1.65 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.
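
The idea of storing an index and looking the interface up on every use can be sketched as below. This is a simplified userland illustration: every name here is hypothetical, and it deliberately leaves out the psref(9)/pserialize(9) protection that the real lookup needs.

```c
#include <stddef.h>

#define MAX_IF 8	/* arbitrary table size for this sketch */

/* Hypothetical stand-in for an interface object. */
struct ifnet_sketch {
	const char *name;
};

/* Hypothetical stand-in for a packet: it records an index, not a pointer. */
struct pkt_sketch {
	unsigned int rcvif_index;
};

/* Index-to-interface table; a slot is NULL once the interface is gone. */
static struct ifnet_sketch *if_table[MAX_IF];

/* Resolve the index at the time of use; NULL means "no such interface". */
static struct ifnet_sketch *
if_lookup(unsigned int idx)
{
	return idx < MAX_IF ? if_table[idx] : NULL;
}
```

A packet that outlives its receiving interface then sees NULL from the lookup instead of following a dangling pointer.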

Proposed on tech-kern and tech-net.
 1.64 20-Jan-2015  roy Add net.inet6.ip6.prefer_tempaddr sysctl knob so that we can prefer
IPv6 temporary addresses as the source address.

Fixes PR kern/47100 based on a patch by Dieter Roelants.
 1.63 12-Oct-2014  christos branches: 1.63.2;
Refactor the multicast membership code so that we can handle v4 mapped
addresses using the v6 membership ioctls.
 1.62 05-Jun-2014  rmind branches: 1.62.2;
- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.61 19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.60 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.59 23-Jun-2012  christos branches: 1.59.2; 1.59.4; 1.59.12;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.58 19-Jan-2012  liamjfoy branches: 1.58.2; 1.58.6; 1.58.8;
Remove ip6f_start from ip6f struct
 1.57 10-Jan-2012  drochner add patch from Arnaud Degroote to handle IPv6 extended options with
(FAST_)IPSEC, tested lightly with a DSTOPTS header consisting
of PAD1
 1.56 04-Nov-2011  zoltan branches: 1.56.4;
Change the IPv6 reassembly mechanism to use mutex(9).
Also add ip6_reass_packet() to be used by NPF.
 1.55 24-May-2011  spz branches: 1.55.4;
RA flood mitigation via a limit on accepted routes:
- introduce a limit for the routes accepted via IPv6 Router Advertisement:
a common 2 interface client will have 6, the default limit is 100 and
can be adjusted via sysctl
- report the current number of routes installed via RA via sysctl
- count discarded route additions. Note that one RA message is two routes.
This is at present only across all interfaces even though per-interface
would be more useful, since the per-interface structure complies to RFC2466
- bump kernel version due to the previous change
- adjust netstat to use the new value (with netstat -p icmp6)
 1.54 03-May-2011  dyoung *_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
 1.53 06-May-2009  elad branches: 1.53.4; 1.53.6;
Remove some usage of "priv" and "privileged" variables and instead pass
around credentials. Also push down kauth(9) calls closer to where the
operation is done.

Mailing list reference:

http://mail-index.netbsd.org/tech-net/2009/04/30/msg001270.html
 1.52 23-Mar-2009  liamjfoy Init ip6flow pool dynamically instead of using a linkset.
 1.51 06-Aug-2008  plunky branches: 1.51.2; 1.51.8;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.50 24-Apr-2008  ad branches: 1.50.2; 1.50.4; 1.50.8;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.49 15-Apr-2008  thorpej branches: 1.49.2;
Make ip6 and icmp6 stats per-cpu.
 1.48 08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
 1.47 19-Mar-2008  dyoung No code ever sets struct ip6_pktopts member ip6po_m, so get rid of
it.
 1.46 29-Oct-2007  dyoung branches: 1.46.12; 1.46.16;
The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.
 1.45 19-Jul-2007  dyoung branches: 1.45.4; 1.45.6; 1.45.10; 1.45.12;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.44 17-May-2007  yamt branches: 1.44.2;
remove net.inet6.ip6.rht0 sysctl.
it's too dangerous compared to its benefit.

strongly requested by itojun@. ok'ed by core@.
 1.43 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.42 22-Apr-2007  christos fix typo.
 1.41 22-Apr-2007  christos Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).

Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
 1.40 23-Mar-2007  liamjfoy Add a new sysctl net.inet6.ip6.hashsize to control the hash table size.

The sysctl handler will ensure this value is a power of 2

ok dyoung@
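
A power-of-2 check of the kind mentioned above is commonly written with a single bit trick. The log does not show how the sysctl handler does it, so the following is only a sketch with a hypothetical function name.

```c
#include <stdbool.h>

/* A power of two has exactly one bit set, so clearing its lowest set
 * bit with v & (v - 1) must leave zero. Zero itself is excluded. */
static bool
is_power_of_two(unsigned int v)
{
	return v != 0 && (v & (v - 1)) == 0;
}
```

A power-of-2 table size lets a hash value be reduced to a bucket index with a mask instead of a division.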
 1.39 07-Mar-2007  liamjfoy branches: 1.39.2; 1.39.4; 1.39.6;
Add IPv6 Fast Forward - the IPv4 counterpart:

If ip6_forward successfully forwards a packet, a cache, in this case a
ip6flow struct entry, will be created. ether_input and friends will
then be able to call ip6flow_fastforward with the packet which will then
be passed to if_output (unless an issue is found - in that case the packet
is passed back to ip6_input).

ok matt@ christos@ dyoung@ and joerg@
 1.38 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.37 05-May-2006  rpaulo branches: 1.37.12; 1.37.14;
Add support for RFC 3542 Adv. Socket API for IPv6 (which obsoletes 2292).
* RFC 3542 isn't binary compatible with RFC 2292.
* RFC 2292 support is on by default but can be disabled.
* update ping6, telnet and traceroute6 to the new API.

From the KAME project (www.kame.net).
Reviewed by core.
 1.36 05-Mar-2006  rpaulo branches: 1.36.2; 1.36.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.35 21-Jan-2006  rpaulo branches: 1.35.2; 1.35.4; 1.35.6;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
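
The src=::1 case above can be expressed as a small userland check. This sketch covers only the loopback example from the message, not the general scope zone comparison the kernel performs; the function name is hypothetical, while IN6_IS_ADDR_LOOPBACK and inet_pton are the standard interfaces.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* True when a packet would carry the loopback address as its source
 * toward a destination that is not the loopback address, i.e. the
 * combination the message says must not leave the node. */
static bool
src_scope_violation(const struct in6_addr *src, const struct in6_addr *dst)
{
	return IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst);
}
```

For example, src ::1 with dst 2001:db8::1 is flagged, while ::1 to ::1 is not.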
 1.34 11-Dec-2005  christos branches: 1.34.2;
merge ktrace-lwp.
 1.33 18-Oct-2004  itojun branches: 1.33.10; 1.33.12; 1.33.20; 1.33.22;
ip6_flow_seq is no longer available.
 1.32 06-Sep-2003  itojun branches: 1.32.2; 1.32.4; 1.32.6;
randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.31 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.30 22-Aug-2003  jonathan (Accidentally-omitted change): update for ip6_output() to match commit below.

replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.
 1.29 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.28 07-Aug-2003  itojun make net.inet6.ip6.redirect actually work. from Tomoyuki Sahara via kame
 1.27 08-Jul-2003  itojun prototype must not have variable name
 1.26 29-Jun-2003  fvdl branches: 1.26.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.25 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.24 28-Jan-2003  wiz success, not sucess. Noted by mjl.
 1.23 11-Sep-2002  itojun correct signedness mixup in pointer passing. sync w/kame
 1.22 30-Jun-2002  thorpej Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
 1.21 08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.20 07-Jun-2002  itojun sync IPV6_CHECKSUM handling with kame.
 1.19 28-May-2002  itojun limit number of IPv6 fragments (not the fragment queue size) to
fight against lots-of-frags DoS attacks. sync w/kame
 1.18 21-Dec-2001  itojun branches: 1.18.8; 1.18.10;
move in6_gif_hlim decl to in6_gif.c. sync with kame
 1.17 20-Dec-2001  itojun centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame
 1.16 15-Oct-2001  itojun implement IPV6_V6ONLY socket option from draft-ietf-ipngwg-rfc2553bis-03.txt.
IPV6_BINDV6ONLY (netbsd only) is deprecated, but still work just like before.
 1.15 26-Aug-2000  itojun branches: 1.15.2; 1.15.4;
implement net.inet6.ip6.{anon,low}port{min,max} sysctl variable.
 1.14 13-Jul-2000  itojun remove m_pulldown statistics code. it is highly experimental and belong
to kame tree only (not for *bsd).
 1.13 06-Jul-2000  itojun - do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).
 1.12 21-Mar-2000  itojun branches: 1.12.4;
cleanup AH/policy processing.
- parse IPv6 header by using common function, ip6_{last,next}hdr.
- fix behavior in multiple AH cases.
make strict boundary checks on mbuf chasing.
(sync with latest kame)
 1.11 26-Feb-2000  itojun implement rip6_ctlinput, to cope with routing changes correctly.
(IMHO we need rip_ctlinput as well)
 1.10 26-Feb-2000  itojun bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
 1.9 03-Feb-2000  itojun - Don't reuse ip6 header portion as reassembly pointer, to be friendly
with LP64 arch. (not tested on LP64, sorry)
- add comment on reass rule
- some other cleanups

NetBSD PR: 9340
From: iwamoto@sat.t.u-tokyo.ac.jp
(in sync with kame)
 1.8 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.7 06-Jan-2000  itojun make IPV6_BINDV6ONLY setsockopt available. it controls behavior of
AF_INET6 wildcard listening socket. heavily documented in ip6(4).
net.inet6.ip6.bindv6only defines default value. default is 1.

"options INET6_BINDV6ONLY" removes any code fragment that supports
IPV6_BINDV6ONLY == 0 case (not defopt'ed as use of this is rare).
 1.6 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.5 19-Nov-1999  bouyer Update protocol and interface stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
 1.4 22-Jul-1999  itojun branches: 1.4.2; 1.4.8;
change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ip6_var.h was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ip6_var.h was added on branch chs-ubc2 on 1999-07-01 23:48:28 +0000
 1.4.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.4.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.4.2 27-Aug-2000  itojun pullup (approved by releng-1-5)

> implement net.inet6.ip6.{anon,low}port{min,max} sysctl variable.

> cvs rdiff -r1.67 -r1.68 basesrc/lib/libc/gen/sysctl.3
> cvs rdiff -r1.53 -r1.54 basesrc/sbin/sysctl/sysctl.8
> cvs rdiff -r1.18 -r1.19 syssrc/sys/netinet6/in6.h
> cvs rdiff -r1.29 -r1.30 syssrc/sys/netinet6/in6_pcb.c
> cvs rdiff -r1.3 -r1.4 syssrc/sys/netinet6/in6_src.c
> cvs rdiff -r1.25 -r1.26 syssrc/sys/netinet6/ip6_input.c
> cvs rdiff -r1.14 -r1.15 syssrc/sys/netinet6/ip6_var.h
 1.12.4.1 14-Jul-2000  itojun pullup (approved by releng-1-5)

remove m_pulldown statistics code. it is highly experimental and belongs
to kame tree only (not for *bsd).

1.4 -> 1.5 syssrc/sys/kern/uipc_mbuf2.c
1.8 -> 1.9 syssrc/sys/netinet/ip6.h
1.13 -> 1.14 syssrc/sys/netinet6/ip6_var.h
 1.15.4.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.15.4.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.15.4.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.15.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.15.2.5 17-Sep-2002  nathanw Catch up to -current.
 1.15.2.4 01-Aug-2002  nathanw Catch up to -current.
 1.15.2.3 20-Jun-2002  nathanw Catch up to -current.
 1.15.2.2 08-Jan-2002  nathanw Catch up to -current.
 1.15.2.1 22-Oct-2001  nathanw Catch up to -current.
 1.18.10.1 02-Oct-2003  tron Pull up revision 1.21 via patch (requested by itojun in ticket #1491):
sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.18.8.3 15-Jul-2002  gehenna catch up with -current.
 1.18.8.2 20-Jun-2002  gehenna catch up with -current.
 1.18.8.1 30-May-2002  gehenna Catch up with -current.
 1.26.2.5 19-Oct-2004  skrll Sync with HEAD
 1.26.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.26.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.26.2.2 03-Aug-2004  skrll Sync with HEAD
 1.26.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.32.6.1 04-Jun-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11330):
sys/netinet6/ip6_input.c: revision 1.102 via patch
sys/netinet6/route6.c: revision 1.18 via patch
sys/netinet6/ip6_var.h: revisions 1.41-1.42 via patch
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
 1.32.4.1 04-Jun-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11330):
sys/netinet6/ip6_input.c: revision 1.102 via patch
sys/netinet6/route6.c: revision 1.18 via patch
sys/netinet6/ip6_var.h: revisions 1.41-1.42 via patch
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
 1.32.2.1 04-Jun-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11330):
sys/netinet6/ip6_input.c: revision 1.102 via patch
sys/netinet6/route6.c: revision 1.18 via patch
sys/netinet6/ip6_var.h: revisions 1.41-1.42 via patch
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
 1.33.22.1 26-Apr-2007  ghen Pull up following revision(s) (requested by christos in ticket #1766):
sys/netinet6/ip6_input.c: revision 1.102 via patch
sys/netinet6/route6.c: revision 1.18 via patch
sys/netinet6/ip6_var.h: revision 1.41 via patch
sys/netinet6/ip6_var.h: revision 1.42 via patch
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
fix typo.
 1.33.20.1 26-Apr-2007  ghen Pull up following revision(s) (requested by christos in ticket #1766):
sys/netinet6/ip6_input.c: revision 1.102 via patch
sys/netinet6/route6.c: revision 1.18 via patch
sys/netinet6/ip6_var.h: revision 1.41 via patch
sys/netinet6/ip6_var.h: revision 1.42 via patch
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
fix typo.
 1.33.12.5 24-Mar-2008  yamt sync with head.
 1.33.12.4 15-Nov-2007  yamt sync with head.
 1.33.12.3 03-Sep-2007  yamt sync with head.
 1.33.12.2 26-Feb-2007  yamt sync with head.
 1.33.12.1 21-Jun-2006  yamt sync with head.
 1.33.10.1 26-Apr-2007  ghen Pull up following revision(s) (requested by christos in ticket #1766):
sys/netinet6/ip6_input.c: revision 1.102 via patch
sys/netinet6/route6.c: revision 1.18 via patch
sys/netinet6/ip6_var.h: revision 1.41 via patch
sys/netinet6/ip6_var.h: revision 1.42 via patch
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
fix typo.
 1.34.2.1 01-Feb-2006  yamt sync with head.
 1.35.6.2 24-May-2006  yamt sync with head.
 1.35.6.1 13-Mar-2006  yamt sync with head.
 1.35.4.2 01-Jun-2006  kardel Sync with head.
 1.35.4.1 22-Apr-2006  simonb Sync with head.
 1.35.2.2 09-Sep-2006  rpaulo sync with head
 1.35.2.1 07-Feb-2006  rpaulo in6pcb -> inpcb.
 1.36.4.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.36.2.1 11-May-2006  elad sync with head
 1.37.14.5 17-May-2007  yamt sync with head.
 1.37.14.4 07-May-2007  yamt sync with head.
 1.37.14.3 24-Mar-2007  yamt sync with head.
 1.37.14.2 12-Mar-2007  rmind Sync with HEAD.
 1.37.14.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.37.12.1 28-Apr-2007  bouyer Pull up following revision(s) (requested by christos in ticket #587):
sys/netinet6/ip6_input.c: revision 1.102
sys/netinet6/route6.c: revision 1.18
sys/netinet6/ip6_var.h: revision 1.41
sys/netinet6/ip6_var.h: revision 1.42
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
fix typo.
 1.39.6.1 29-Mar-2007  reinoud Pullup to -current
 1.39.4.1 11-Jul-2007  mjf Sync with head.
 1.39.2.3 20-Aug-2007  ad Sync with HEAD.
 1.39.2.2 08-Jun-2007  ad Sync with head.
 1.39.2.1 10-Apr-2007  ad Sync with head.
 1.44.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.45.12.2 19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
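
The walker idea in the 1.45.12.2 entry above (apply a function to each rtentry without exposing how the table is stored) can be illustrated with a small user-space sketch. Everything here is hypothetical: the toy_ names, the array-backed table, and the callback signature are assumptions for illustration and are not the kernel's rt_walktree() interface.

```c
#include <assert.h>
#include <stddef.h>

/* Toy route entry and table; the real forwarding table is a radix tree. */
struct toy_rtentry {
    int dst;   /* stand-in for the destination key */
    int refs;  /* stand-in for per-route state */
};

struct toy_rtable {
    struct toy_rtentry *entries;
    size_t              count;
};

/* Callback type: return 0 to continue, nonzero to stop the walk. */
typedef int (*toy_walk_fn)(struct toy_rtentry *, void *);

/*
 * Apply fn to every entry. Callers never see how the table is stored,
 * so the storage could change without touching them.
 */
static int
toy_rt_walk(struct toy_rtable *t, toy_walk_fn fn, void *arg)
{
    assert(t != NULL && fn != NULL);
    for (size_t i = 0; i < t->count; i++) {
        int error = fn(&t->entries[i], arg);
        if (error != 0)
            return error;  /* propagate the first failure */
    }
    return 0;
}

/* Example callback: count the entries visited. */
static int
toy_count_cb(struct toy_rtentry *rt, void *arg)
{
    (void)rt;
    ++*(size_t *)arg;
    return 0;
}
```

A caller would pass toy_count_cb and a pointer to a size_t counter; a callback that returns nonzero ends the walk early and that value is returned to the caller.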
 1.45.12.1 19-Jul-2007  dyoung file ip6_var.h was added on branch matt-mips64 on 2007-07-19 20:48:58 +0000
 1.45.10.1 13-Nov-2007  bouyer Sync with HEAD
 1.45.6.2 23-Mar-2008  matt sync with HEAD
 1.45.6.1 06-Nov-2007  matt sync with HEAD
 1.45.4.1 31-Oct-2007  joerg Sync with HEAD.
 1.46.16.3 28-Sep-2008  mjf Sync with HEAD.
 1.46.16.2 02-Jun-2008  mjf Sync with HEAD.
 1.46.16.1 03-Apr-2008  mjf Sync with HEAD.
 1.46.12.2 24-Mar-2008  keiichi sync with head.
 1.46.12.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.49.2.1 18-May-2008  yamt sync with head.
 1.50.8.1 19-Oct-2008  haad Sync with HEAD.
 1.50.4.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.50.2.2 16-May-2009  yamt sync with head
 1.50.2.1 04-May-2009  yamt sync with head.
 1.51.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.51.2.1 28-Apr-2009  skrll Sync with HEAD.
 1.53.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.53.4.1 31-May-2011  rmind sync with head
 1.55.4.3 30-Oct-2012  yamt sync with head
 1.55.4.2 17-Apr-2012  yamt sync with head
 1.55.4.1 10-Nov-2011  yamt sync with head
 1.56.4.1 18-Feb-2012  mrg merge to -current.
 1.58.8.2 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1523):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
sys/netinet6/ah_input.c: adjust other callers (patch)
sys/netinet6/esp_input.c: adjust other callers (patch)
sys/netinet6/ipcomp_input.c: adjust other callers (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
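
The problem described in the entry above (computing mtod(m, char *) + len when the byte may live in a later mbuf of the chain) can be sketched in user space. This is a minimal illustration under stated assumptions: struct toy_mbuf and both functions are hypothetical stand-ins, not the kernel's mbuf API and not the patched ip6_get_prevhdr.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an mbuf: a data pointer, its length, and a link. */
struct toy_mbuf {
    struct toy_mbuf *next;
    unsigned char   *data;
    size_t           len;
};

/*
 * Return a pointer to the byte at chain offset `off`, walking the chain
 * instead of assuming the byte is in the first buffer. Returns NULL if
 * the chain is shorter than `off` + 1 bytes.
 */
static unsigned char *
toy_chain_byte(struct toy_mbuf *m, size_t off)
{
    for (; m != NULL; m = m->next) {
        if (off < m->len)
            return m->data + off;
        off -= m->len;  /* skip this buffer and keep looking */
    }
    return NULL;
}

/*
 * The precondition that "first->data + off" silently relies on: the
 * offset must fall inside the first buffer. When this returns 0, the
 * naive pointer arithmetic would point past the end of that buffer.
 */
static int
toy_first_mbuf_holds(const struct toy_mbuf *m, size_t off)
{
    assert(m != NULL);
    return off < m->len;
}
```

With a chain of a 4-byte buffer followed by a 3-byte buffer, offset 5 lies in the second buffer: toy_first_mbuf_holds() reports 0 for it, while toy_chain_byte() still returns the right byte.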
 1.58.8.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.58.6.2 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1523):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
sys/netinet6/ah_input.c: adjust other callers (patch)
sys/netinet6/esp_input.c: adjust other callers (patch)
sys/netinet6/ipcomp_input.c: adjust other callers (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.58.6.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.58.2.2 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1523):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
sys/netinet6/ah_input.c: adjust other callers (patch)
sys/netinet6/esp_input.c: adjust other callers (patch)
sys/netinet6/ipcomp_input.c: adjust other callers (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.58.2.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.59.12.1 10-Aug-2014  tls Rebase.
 1.59.4.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.59.2.2 03-Dec-2017  jdolecek update from HEAD
 1.59.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.62.2.2 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1560):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.62.2.1 23-Jan-2015  martin branches: 1.62.2.1.2; 1.62.2.1.6;
Pull up following revision(s) (requested by pettai in ticket #441):
sys/netinet6/ip6_var.h: revision 1.64
sys/netinet6/in6.h: revision 1.82
sys/netinet6/in6_src.c: revision 1.56
sys/netinet6/mld6.c: revision 1.62
sys/netinet6/ip6_input.c: revision 1.150
sys/netinet6/ip6_output.c: revision 1.161
Add net.inet6.ip6.prefer_tempaddr sysctl knob so that we can prefer
IPv6 temporary addresses as the source address.
Fixes PR kern/47100 based on a patch by Dieter Roelants.
 1.62.2.1.6.1 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1560):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.62.2.1.2.1 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1560):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.63.2.6 28-Aug-2017  skrll Sync with HEAD
 1.63.2.5 05-Feb-2017  skrll Sync with HEAD
 1.63.2.4 05-Dec-2016  skrll Sync with HEAD
 1.63.2.3 05-Oct-2016  skrll Sync with HEAD
 1.63.2.2 09-Jul-2016  skrll Sync with HEAD
 1.63.2.1 06-Apr-2015  skrll Sync with HEAD
 1.67.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.67.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.67.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.71.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.74.6.4 07-Mar-2021  martin Pull up following revision(s) (requested by christos in ticket #1661):

sys/netinet6/ip6_id.c: revision 1.19-1.21
sys/netinet6/ip6_var.h: revision 1.88
sys/netinet/ip_input.c: revision 1.400
sys/netinet/tcp_subr.c: revision 1.285
sys/netinet/ip6.h: revision 1.30

netinet: Enable random IP fragment ids by default (from riastradh)

netinet: Enable RFC 1948 pseudorandom TCP ISS selection by default.
(from riastradh)

netinet6: Mark randomid unused.

Will make merging and bisection easier if anything goes wrong with
flow label or fragment id randomization changes.
(from riastradh)

netinet/netinet6: Add necessary includes to make these standalone.
(from riastradh)

Replace randomid() by cprng_fast32()
 1.74.6.3 27-Sep-2018  martin Additional change needed for ticket #1041:

sys/netinet6/ip6_var.h (apply patch)

When reassembling IPv4/IPv6 packets, ensure each fragment has been subject
to the same IPsec processing. That is to say, that all fragments are ESP,
or AH, or AH+ESP, or none.

Add ipsec flags to struct ip6q.
 1.74.6.2 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.74.6.1 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #527):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
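
The failure mode described above can be illustrated with a minimal sketch.
The `struct buf` type and the `chain_byte_at()` helper are hypothetical
stand-ins, not the kernel's `struct mbuf` or `IP6_EXTHDR_GET`. They only show
why an offset into a chain has to be resolved by walking the chain instead of
being added to the first buffer's data pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an mbuf: one segment of a packet. */
struct buf {
	struct buf *next;	/* next segment in the chain, or NULL */
	unsigned char *data;	/* start of this segment's bytes */
	size_t len;		/* number of valid bytes in this segment */
};

/*
 * Return a pointer to the byte at offset 'off' within the whole chain,
 * or NULL if the chain is shorter than that.  Unlike 'b->data + off',
 * this never steps past the end of a segment.
 */
static unsigned char *
chain_byte_at(struct buf *b, size_t off)
{
	for (; b != NULL; b = b->next) {
		/* A segment that claims bytes must have storage behind it. */
		assert(b->data != NULL || b->len == 0);
		if (off < b->len)
			return b->data + off;
		off -= b->len;
	}
	return NULL;
}
```

With a two-segment chain of 2 and 3 bytes, offset 3 resolves to the second
byte of the second segment, whereas `first->data + 3` would point past the
end of the first segment.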
 1.80.4.1 10-Jun-2019  christos Sync with HEAD
 1.80.2.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.82.2.1 07-Mar-2021  martin Pull up following revision(s) (requested by christos in ticket #1226):

sys/netinet6/ip6_id.c: revision 1.19-1.21
sys/netinet6/ip6_var.h: revision 1.88
sys/netinet/ip_input.c: revision 1.400
sys/netinet/tcp_subr.c: revision 1.285
sys/netinet/ip6.h: revision 1.30

netinet: Enable random IP fragment ids by default (from riastradh)

netinet: Enable RFC 1948 pseudorandom TCP ISS selection by default.
(from riastradh)

netinet6: Mark randomid unused.

Will make merging and bisection easier if anything goes wrong with
flow label or fragment id randomization changes.
(from riastradh)

netinet/netinet6: Add necessary includes to make these standalone.
(from riastradh)

Replace randomid() by cprng_fast32()
 1.87.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.27 19-Mar-2019  msaitoh Fix typos in comment:
- s/paylaod/payload/
- s/dstination/destination/
 1.26 27-Sep-2017  ozaki-r branches: 1.26.4;
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
 1.25 21-Jan-2016  riastradh branches: 1.25.10;
Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.
 1.24 21-Jan-2016  riastradh Give proper prototype to ip_output.
 1.23 20-Jan-2016  riastradh Eliminate struct protosw::pr_output.

You can't use this unless you know what it is a priori: the formal
prototype is variadic, and the different instances (e.g., ip_output,
route_output) have different real prototypes.

Convert the only user of it, raw_send in net/raw_cb.c, to take an
explicit callback argument. Convert the only instances of it,
route_output and key_output, to such explicit callbacks for raw_send.
Use assertions to make sure the conversion to explicit callbacks is
warranted.

Discussed on tech-net with no objections:
https://mail-index.netbsd.org/tech-net/2016/01/16/msg005484.html
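
A rough sketch of the explicit-callback pattern this change moves to. The
names `struct msg`, `output_fn`, `send_with` and `record_len` are invented
for illustration and are not the real `raw_send`, `route_output` or
`key_output` signatures. The point is only that a fully typed function
pointer lets the compiler check each call, which a variadic prototype cannot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message type standing in for an mbuf chain. */
struct msg {
	int len;
};

/* A fully typed output callback; the compiler checks every call site. */
typedef int (*output_fn)(struct msg *, void *);

/* Hypothetical sender that is handed its output routine explicitly. */
static int
send_with(struct msg *m, void *ctx, output_fn output)
{
	assert(output != NULL);
	return (*output)(m, ctx);
}

/* Example callback: records the message length in the int that ctx points to. */
static int
record_len(struct msg *m, void *ctx)
{
	*(int *)ctx = m->len;
	return 0;
}
```

A caller would then write `send_with(&m, &seen, record_len)`, and passing a
function with a different prototype draws a compiler diagnostic instead of
being silently accepted.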
 1.22 18-May-2014  rmind branches: 1.22.4;
Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.21 06-Aug-2008  plunky branches: 1.21.38; 1.21.44; 1.21.54;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.20 24-Apr-2008  ad branches: 1.20.2; 1.20.4; 1.20.8;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.19 19-Jul-2007  dyoung branches: 1.19.26; 1.19.28; 1.19.30;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.18 17-Feb-2007  dyoung branches: 1.18.4; 1.18.12;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.17 27-Aug-2006  christos branches: 1.17.8;
gc unused member.
 1.16 11-Dec-2005  christos branches: 1.16.4; 1.16.8;
merge ktrace-lwp.
 1.15 22-Apr-2004  matt branches: 1.15.12;
Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).
 1.14 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.13 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.12 29-Jun-2003  fvdl branches: 1.12.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.11 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.10 11-Feb-2001  itojun pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).
 1.9 10-Feb-2001  itojun to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.8 11-Nov-2000  thorpej Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.
 1.7 18-Oct-2000  itojun verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync
 1.6 17-Feb-2000  darrenr Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.
 1.5 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.4 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.3 03-Jul-1999  thorpej branches: 1.3.2; 1.3.8;
RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ip6protosw.h was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ip6protosw.h was added on branch chs-ubc2 on 1999-07-01 23:48:29 +0000
 1.3.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.3.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.3.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.3.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.12.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.12.2.2 03-Aug-2004  skrll Sync with HEAD
 1.12.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.15.12.4 03-Sep-2007  yamt sync with head.
 1.15.12.3 26-Feb-2007  yamt sync with head.
 1.15.12.2 30-Dec-2006  yamt sync with head.
 1.15.12.1 21-Jun-2006  yamt sync with head.
 1.16.8.1 03-Sep-2006  yamt sync with head.
 1.16.4.1 09-Sep-2006  rpaulo sync with head
 1.17.8.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.18.12.1 15-Aug-2007  skrll Sync with HEAD.
 1.18.4.1 20-Aug-2007  ad Sync with HEAD.
 1.19.30.2 19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.19.30.1 19-Jul-2007  dyoung file ip6protosw.h was added on branch matt-mips64 on 2007-07-19 20:48:58 +0000
 1.19.28.1 18-May-2008  yamt sync with head.
 1.19.26.2 28-Sep-2008  mjf Sync with HEAD.
 1.19.26.1 02-Jun-2008  mjf Sync with HEAD.
 1.20.8.1 19-Oct-2008  haad Sync with HEAD.
 1.20.4.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.20.2.1 04-May-2009  yamt sync with head.
 1.21.54.1 10-Aug-2014  tls Rebase.
 1.21.44.2 18-May-2014  rmind sync with head
 1.21.44.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.21.38.2 03-Dec-2017  jdolecek update from HEAD
 1.21.38.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.22.4.1 19-Mar-2016  skrll Sync with HEAD
 1.25.10.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panics of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that the process issuing
SADB_UPDATE be the same process that issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require the newly added update command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver has implemented such offload features for many
years, and none seems likely to implement them soon. (Note that neither
FreeBSD nor Linux has such drivers.) Let's remove the related
(unused) code and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list should basically never have sav->refcnt == 0, and
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be obtained from
another argument (isr)
---
Fix splx not being called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav list is sorted
as we need.

The newly added key_validate_savlist validates that each sav list is
indeed sorted (run only if DEBUG because it's not cheap).
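
The check can be sketched as follows. This is an illustration, not the code
in key.c: `struct sa`, `list_is_sorted` and `validate_sorted` are made-up
names, and the newest-first ordering is an assumption, since the message only
says the lists are sorted by add time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SA entry; only the add time matters for this check. */
struct sa {
	struct sa *next;	/* next entry in the list, or NULL */
	uint64_t addtime;	/* time at which the entry was added */
};

/* True if add times never increase from head to tail (assumed order). */
static bool
list_is_sorted(const struct sa *head)
{
	for (; head != NULL && head->next != NULL; head = head->next) {
		if (head->addtime < head->next->addtime)
			return false;
	}
	return true;
}

/* Debug-style validation: one linear walk, so not free on a hot path. */
static void
validate_sorted(const struct sa *head)
{
	assert(list_is_sorted(head));
}
```

Because the walk is linear in the list length, running it on every packet
would be costly, which fits the decision to enable it only under DEBUG.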
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporary storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication, and M_AUTHIPDGM is never set on an mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
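
A minimal sketch of the idea, assuming a plain single-threaded reference
count. `struct policy`, `policy_ref` and `policy_unref` are hypothetical
names, not the kernel's SP API, and a real kernel would need atomic or
otherwise MP-safe counting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy object with a plain (non-atomic) reference count. */
struct policy {
	unsigned refcnt;	/* number of outstanding references */
	bool destroyed;		/* set once the last reference is dropped */
};

/* Take an extra reference, e.g. before dispatching an async request. */
static void
policy_ref(struct policy *sp)
{
	sp->refcnt++;
}

/* Drop one reference; teardown happens only when the last one goes away. */
static void
policy_unref(struct policy *sp)
{
	assert(sp->refcnt > 0);
	if (--sp->refcnt == 0)
		sp->destroyed = true;
}
```

If the owner drops its reference while a request is still in flight, the
object survives until the callback drops the reference taken at dispatch.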
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segment size
calculation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
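
For illustration, the arithmetic might look like the sketch below.
`soft_from_hard` is a hypothetical helper, not the function in key.c. It
derives the soft limit from the hard limit, which is what the fix describes,
and splits the multiplication so it cannot overflow for large inputs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return 80% of 'hard', rounded down.
 * Equivalent to hard * 80 / 100 but without the intermediate overflow.
 */
static uint64_t
soft_from_hard(uint64_t hard)
{
	uint64_t soft = hard / 100 * 80 + hard % 100 * 80 / 100;

	assert(soft <= hard);
	return soft;
}
```

Applying the same scaling to a zero-cleared soft field, as the original code
did, always yields 0, which is why that statement had no effect.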
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms while holding softnet_lock
causes a deadlock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

With this change KEY_NEWSP is no longer called from softint,
so we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows omitting ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot to update libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may call localcount_drain on the SP while
holding key_so_mtx, say in key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf,
which is stuck on key_so_mtx, never releases its reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

Hundreds of mbufs can easily be queued on the list under heavy
network load.
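
The combination of deferring the send to a timer and capping the queue
length can be sketched roughly as below. This is a minimal userland
illustration, not the key.c implementation: struct pkt, struct defer_queue,
DEFER_MAX and the function names are all hypothetical, and locking is
omitted.

```c
#include <assert.h>
#include <stddef.h>

#define DEFER_MAX 4		/* hypothetical cap on queued packets */

struct pkt {
	struct pkt *next;
};

struct defer_queue {
	struct pkt *head;
	struct pkt **tailp;	/* points at the last next pointer */
	size_t len;
};

void
defer_init(struct defer_queue *q)
{
	q->head = NULL;
	q->tailp = &q->head;
	q->len = 0;
}

/* Packet path: queue instead of sending; -1 means full, caller drops. */
int
defer_enqueue(struct defer_queue *q, struct pkt *p)
{
	if (q->len >= DEFER_MAX)
		return -1;
	p->next = NULL;
	*q->tailp = p;
	q->tailp = &p->next;
	q->len++;
	return 0;
}

/* Timer path: detach the whole list, then send each packet. */
size_t
defer_drain(struct defer_queue *q, void (*send)(struct pkt *))
{
	struct pkt *p = q->head, *next;
	size_t n = 0;

	defer_init(q);
	for (; p != NULL; p = next) {
		next = p->next;
		if (send != NULL)
			send(p);
		n++;
	}
	return n;
}
```

The packet path never blocks on the socket lock in this arrangement; only
the timer path does the actual send, and the cap bounds memory use when
the timer cannot keep up.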
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It is never changed, so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's own
and softnet_lock for rtsock), which is of course useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical usage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock, for example in the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform while holding
the mutex; pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never completes until xc_broadcast of thread A finishes. On the other hand,
thread C, which is a callee of xc_broadcast of thread A, is stuck on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
in another way: release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock happens only if NET_MPSAFE is enabled.
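
The shape of the fix (drop the mutex around the barrier and serialize
callers with a condvar instead) can be sketched in userland with POSIX
threads. This is an analogy under stated assumptions, not the kernel code:
expensive_barrier() is a hypothetical stand-in for pserialize_perform, the
barrier_busy flag and barrier_calls counter are invented for illustration,
and the list removal and localcount_drain steps appear only as comments.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
bool barrier_busy;	/* true while some thread runs the barrier */
int barrier_calls;	/* counts barrier runs, for illustration only */

/* Hypothetical stand-in for pserialize_perform. */
void
expensive_barrier(void)
{
	barrier_calls++;
}

void
remove_and_sync(void)
{
	pthread_mutex_lock(&mtx);
	/* The item would be removed from the list here. */

	/* Wait until no other thread is inside the barrier. */
	while (barrier_busy)
		pthread_cond_wait(&cv, &mtx);
	barrier_busy = true;

	/* Release the mutex so the barrier never runs with it held. */
	pthread_mutex_unlock(&mtx);
	expensive_barrier();
	pthread_mutex_lock(&mtx);

	barrier_busy = false;
	pthread_cond_broadcast(&cv);
	/* The localcount_drain analogue would go here, with mtx held. */
	pthread_mutex_unlock(&mtx);
}
```

Because the flag is only read and written under the mutex, at most one
thread is inside the barrier at a time, yet other threads that need the
mutex are never blocked by a barrier in progress.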
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected, for example key_acquire.
---
Fix SP being broken in transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.26.4.1 10-Jun-2019  christos Sync with HEAD
 1.14 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.13 23-Apr-2008  thorpej branches: 1.13.36; 1.13.40;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.12 01-Nov-2007  dyoung branches: 1.12.16; 1.12.18;
De-__P().
 1.11 10-Dec-2005  elad branches: 1.11.44; 1.11.46; 1.11.50;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.10 15-Oct-2001  itojun branches: 1.10.18; 1.10.34;
reduce diff with kame. whitespace changes only.
 1.9 30-May-2001  mrg branches: 1.9.2;
use _KERNEL_OPT
 1.8 26-Sep-2000  itojun branches: 1.8.2;
update ip compression algorithm lookup.
attach sadb_comb for IP compression (not in RFC2367;
discussed on pf_key@inner.net). sync with kame
 1.7 06-Jan-2000  itojun branches: 1.7.4;
remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.6 02-Dec-1999  itojun fix comment (sync with KAME)
 1.5 31-Jul-1999  itojun branches: 1.5.2; 1.5.8;
sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.4 09-Jul-1999  thorpej defopt INET6, and put it in opt_inet.h (most places already include this
file, which is why the file list is so short).
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ipcomp.h was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ipcomp.h was added on branch chs-ubc2 on 1999-07-01 23:48:29 +0000
 1.5.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.5.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.7.4.1 29-Sep-2000  itojun pullup (approved by releng-1-5)

correct lifetime handling of IPsec keys, so that it won't wrongly
survive across suspend/resume session.
sys/netinet6/ipsec.h 1.15 -> 1.16
sys/netkey/keydb.h 1.7 -> 1.9
sys/netkey/key.c 1.35 -> 1.36

stabilize ipcomp packet handling (if we don't update this SEGV can happen).
sys/netinet6/ipcomp_output.c 1.10 -> 1.13
sys/netinet6/ipcomp_input.c 1.10 -> 1.13
sys/netinet6/ipcomp_core.c 1.9 -> 1.16
sys/netinet6/ipcomp.h 1.7 -> 1.8
sys/netkey/key.c 1.28 -> 1.29, 1.31 -> 1.35, 1.36 -> 1.37

avoid hardcoding IV length. new ESP engine (uses block cipher only,
easier to put per-arch *.S)
sys/netinet6/esp_output.c 1.5 -> 1.8
sys/netinet6/esp_input.c 1.5 -> 1.8
sys/netinet6/esp_core.c 1.7 -> 1.9
sys/netinet6/esp.h 1.11 -> 1.13
sys/netkey/key.c 1.30 -> 1.31
 1.8.2.2 22-Oct-2001  nathanw Catch up to -current.
 1.8.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.9.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.10.34.2 15-Nov-2007  yamt sync with head.
 1.10.34.1 21-Jun-2006  yamt sync with head.
 1.10.18.1 11-Dec-2005  christos Sync with head.
 1.11.50.1 13-Nov-2007  bouyer Sync with HEAD
 1.11.46.1 06-Nov-2007  matt sync with HEAD
 1.11.44.1 04-Nov-2007  jmcneill Sync with HEAD.
 1.12.18.1 18-May-2008  yamt sync with head.
 1.12.16.1 02-Jun-2008  mjf Sync with HEAD.
 1.13.40.1 05-Apr-2012  mrg sync to latest -current.
 1.13.36.1 17-Apr-2012  yamt sync with head
 1.31 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.30 17-Jul-2011  joerg branches: 1.30.2; 1.30.6;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.29 18-Mar-2009  cegger bzero -> memset
 1.28 05-May-2008  ad branches: 1.28.8; 1.28.14;
Back out previous. It broke the build.
 1.27 04-May-2008  ad Move zlib out of net/ and into kern/. It would probably be better to use
the reachover Makefiles and libz, but this is already here and it works.
 1.26 01-Nov-2007  dyoung branches: 1.26.20;
De-__P().
 1.25 19-Oct-2007  ad machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.24 23-May-2007  christos branches: 1.24.6; 1.24.8; 1.24.12;
Ansify + add a few comments, from Karl Sjödahl
 1.23 16-Nov-2006  christos branches: 1.23.8; 1.23.10;
__unused removal on arguments; approved by core.
 1.22 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.21 30-Aug-2006  christos branches: 1.21.2; 1.21.4;
remove empty code.
 1.20 02-Nov-2002  perry branches: 1.20.22; 1.20.36; 1.20.40;
/*CONTCOND*/ while (0)'ed macros
 1.19 14-Mar-2002  itojun zlib 1.1.4 dislikes Z_FLUSH at the end of inflate().
 1.18 13-Nov-2001  lukem add RCSIDs
 1.17 15-Oct-2001  itojun reduce diff with kame. whitespace changes only.
 1.16 26-Sep-2000  itojun branches: 1.16.2; 1.16.4;
update ip compression algorithm lookup.
attach sadb_comb for IP compression (not in RFC2367;
discussed on pf_key@inner.net). sync with kame
 1.15 21-Sep-2000  itojun - repair too strong assumption on mbuf chain.
- correct byte lifetime computation to conform to RFC2401 p23 (use
packet BEFORE compression)
- stabilize deflate calls
- present error messages better
 1.14 21-Sep-2000  itojun repair infinite loop in ipcomp packet generation. oops.
 1.13 20-Sep-2000  itojun do not inject empty mbuf to zlib.
 1.12 20-Sep-2000  itojun call {de,in}flateEnd on failure, otherwise obsolete state will be kept.
 1.11 20-Sep-2000  itojun plug mbuf leak (error case). need more investigation.
 1.10 25-Aug-2000  thorpej Don't use MALLOC() for variable-sized allocations.
 1.9 31-Jan-2000  itojun branches: 1.9.4;
bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.8 26-Jan-2000  itojun don't attach Adler32 checksum to ipcomp payload.
 1.7 16-Jan-2000  itojun fix interop issue in ip compression. for inbound, we need to use
default window size, in case the peer uses large window size
 1.6 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.5 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.4 05-Nov-1999  itojun branches: 1.4.2;
decrease amount of history buffer to use for IPcomp.
the default setting of zlib allocates too much memory and of no use for
network packets (which are like < 2k).

From: Laine Stump <lainestump@rcn.com>
 1.3 03-Jul-1999  thorpej branches: 1.3.2; 1.3.4; 1.3.6;
RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ipcomp_core.c was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ipcomp_core.c was added on branch chs-ubc2 on 1999-07-01 23:48:29 +0000
 1.3.6.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.3.4.1 15-Nov-1999  fvdl Sync with -current
 1.3.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.4.2.2 05-Nov-1999  itojun decrease amount of history buffer to use for IPcomp.
the default setting of zlib allocates too much memory and of no use for
network packets (which are like < 2k).

From: Laine Stump <lainestump@rcn.com>
 1.4.2.1 05-Nov-1999  itojun file ipcomp_core.c was added on branch comdex-fall-1999 on 1999-11-05 14:56:27 +0000
 1.9.4.3 23-Jan-2003  msaitoh Apply patch (requested by itojun):

allocate route_in6 in struct secashead, to avoid mistakenly overrun
the end of secashead. Fixes PR18751.
 1.9.4.2 20-Mar-2002  he Pull up revisions 1.17-1.19 (requested by fvdl):
Upgrade libz to 1.1.4 due to a possible security bug.
 1.9.4.1 29-Sep-2000  itojun pullup (approved by releng-1-5)

correct lifetime handling of IPsec keys, so that it won't wrongly
survive across suspend/resume session.
sys/netinet6/ipsec.h 1.15 -> 1.16
sys/netkey/keydb.h 1.7 -> 1.9
sys/netkey/key.c 1.35 -> 1.36

stabilize ipcomp packet handling (if we don't update this SEGV can happen).
sys/netinet6/ipcomp_output.c 1.10 -> 1.13
sys/netinet6/ipcomp_input.c 1.10 -> 1.13
sys/netinet6/ipcomp_core.c 1.9 -> 1.16
sys/netinet6/ipcomp.h 1.7 -> 1.8
sys/netkey/key.c 1.28 -> 1.29, 1.31 -> 1.35, 1.36 -> 1.37

avoid hardcoding IV length. new ESP engine (uses block cipher only,
easier to put per-arch *.S)
sys/netinet6/esp_output.c 1.5 -> 1.8
sys/netinet6/esp_input.c 1.5 -> 1.8
sys/netinet6/esp_core.c 1.7 -> 1.9
sys/netinet6/esp.h 1.11 -> 1.13
sys/netkey/key.c 1.30 -> 1.31
 1.16.4.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.16.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.16.2.4 11-Nov-2002  nathanw Catch up to -current
 1.16.2.3 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.16.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.16.2.1 22-Oct-2001  nathanw Catch up to -current.
 1.20.40.1 03-Sep-2006  yamt sync with head.
 1.20.36.1 09-Sep-2006  rpaulo sync with head
 1.20.22.4 15-Nov-2007  yamt sync with head.
 1.20.22.3 27-Oct-2007  yamt sync with head.
 1.20.22.2 03-Sep-2007  yamt sync with head.
 1.20.22.1 30-Dec-2006  yamt sync with head.
 1.21.4.2 10-Dec-2006  yamt sync with head.
 1.21.4.1 22-Oct-2006  yamt sync with head
 1.21.2.1 18-Nov-2006  ad Sync with head.
 1.23.10.1 11-Jul-2007  mjf Sync with head.
 1.23.8.2 23-Oct-2007  ad Sync with head.
 1.23.8.1 08-Jun-2007  ad Sync with head.
 1.24.12.2 13-Nov-2007  bouyer Sync with HEAD
 1.24.12.1 25-Oct-2007  bouyer Sync with HEAD.
 1.24.8.1 06-Nov-2007  matt sync with HEAD
 1.24.6.2 04-Nov-2007  jmcneill Sync with HEAD.
 1.24.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.26.20.1 04-May-2009  yamt sync with head.
 1.28.14.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.28.8.1 28-Apr-2009  skrll Sync with HEAD.
 1.30.6.1 05-Apr-2012  mrg sync to latest -current.
 1.30.2.1 17-Apr-2012  yamt sync with head
 1.39 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.38 17-Jul-2011  joerg branches: 1.38.2; 1.38.6; 1.38.8; 1.38.12; 1.38.14;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.37 01-Apr-2011  spz mitigation for CVE-2011-1547
this should really be solved by counting nested headers (like in the
inet6 case) instead
 1.36 05-May-2008  ad branches: 1.36.10; 1.36.16; 1.36.22; 1.36.24; 1.36.28;
Back out previous. It broke the build.
 1.35 04-May-2008  ad Move zlib out of net/ and into kern/. It would probably be better to use
the reachover Makefiles and libz, but this is already here and it works.
 1.34 23-Apr-2008  thorpej Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.33 09-Dec-2007  degroote branches: 1.33.10; 1.33.12;
Kill _IP_VHL ifdef (from netinet/ip.h history, it has never been used in NetBSD so ...)
 1.32 19-Oct-2007  ad branches: 1.32.4; 1.32.6;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.31 04-Mar-2007  christos branches: 1.31.2; 1.31.14; 1.31.16; 1.31.20;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.30 16-Nov-2006  christos branches: 1.30.2; 1.30.4; 1.30.12;
__unused removal on arguments; approved by core.
 1.29 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.28 14-Feb-2006  rpaulo branches: 1.28.14; 1.28.16;
From FreeBSD:
In ipcomp6_input(), check 'md' not 'm' after a call to m_pulldown(): 'm'
may be a stale pointer at this point, and we're interested in whether or
not m_pulldown() failed.

Noticed by: Coverity Prevent analysis tool
 1.27 11-Dec-2005  christos branches: 1.27.2; 1.27.4; 1.27.6;
merge ktrace-lwp.
 1.26 07-Jul-2005  tron Defopt IPSEC_NAT_T.
 1.25 20-May-2005  manu branches: 1.25.2;
Use NAT-T ports for AH and IPcomp too.
 1.24 29-Apr-2005  yamt move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.
 1.23 23-Apr-2005  manu Enhance IPSEC_NAT_T so that it can work with multiple machines behind the
same NAT.
 1.22 11-Feb-2004  itojun branches: 1.22.2; 1.22.6; 1.22.8; 1.22.14; 1.22.16;
KNF
 1.21 02-Jul-2003  itojun typo. found by markus@openbsd
 1.20 11-Sep-2002  itojun branches: 1.20.6;
correct signedness mixup in pointer passing. sync w/kame
 1.19 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.18 13-Nov-2001  lukem branches: 1.18.8;
add RCSIDs
 1.17 15-Oct-2001  itojun reduce diff with kame. whitespace changes only.
 1.16 01-Mar-2001  itojun branches: 1.16.2; 1.16.4;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code
 1.15 24-Jan-2001  itojun - record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.14 02-Oct-2000  itojun fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.
 1.13 26-Sep-2000  itojun update ip compression algorithm lookup.
attach sadb_comb for IP compression (not in RFC2367;
discussed on pf_key@inner.net). sync with kame
 1.12 21-Sep-2000  itojun - repair too strong assumption on mbuf chain.
- correct byte lifetime computation to conform to RFC2401 p23 (use
packet BEFORE compression)
- stabilize deflate calls
- present error messages better
 1.11 06-Jul-2000  itojun remove unnecessary #include <netkey/key_debug.h>. from kame.
 1.10 17-Feb-2000  darrenr branches: 1.10.4;
Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.
 1.9 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.8 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.7 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.6 05-Nov-1999  itojun branches: 1.6.2;
fix well-known CPI handling bug. (sync with KAME code)
 1.5 30-Jul-1999  itojun branches: 1.5.2; 1.5.4; 1.5.6;
remove reference to in6_systm.h (file itself will be removed afterwards)
 1.4 06-Jul-1999  itojun fix IPSEC (but not INET6) build.

PR: 7921, 7922, 7924
From: rafal@mediaone.net
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ipcomp_input.c was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ipcomp_input.c was added on branch chs-ubc2 on 1999-07-01 23:48:29 +0000
 1.5.6.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.5.4.1 15-Nov-1999  fvdl Sync with -current
 1.5.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.5.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.5.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.6.2.2 05-Nov-1999  itojun fix well-known CPI handling bug. (sync with KAME code)
 1.6.2.1 05-Nov-1999  itojun file ipcomp_input.c was added on branch comdex-fall-1999 on 1999-11-05 14:57:01 +0000
 1.10.4.4 06-Apr-2001  he Pull up revision 1.15 (requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.10.4.3 11-Mar-2001  he Pull up revision 1.16 (requested by itojun):
Ensure that we enforce inbound IPsec policy on all IP protocols,
not just TCP, UDP and ICMP.
 1.10.4.2 02-Oct-2000  itojun pullup (approved by releng-1-5)
correct ipsecstat/ipsec6stat mixup.

netinet6/ah_input.c 1.18 -> 1.19
netinet6/ah_output.c 1.11 -> 1.12 (part of)
netinet6/esp_input.c 1.8 -> 1.9 (part of)
netinet6/esp_output.c 1.8 -> 1.9
netinet6/icmp6.c 1.43 -> 1.44
netinet6/ipcomp_input.c 1.13 -> 1.14
netinet6/ipcomp_output.c 1.13 -> 1.14
 1.10.4.1 29-Sep-2000  itojun pullup (approved by releng-1-5)

correct lifetime handling of IPsec keys, so that it won't wrongly
survive across suspend/resume session.
sys/netinet6/ipsec.h 1.15 -> 1.16
sys/netkey/keydb.h 1.7 -> 1.9
sys/netkey/key.c 1.35 -> 1.36

stabilize ipcomp packet handling (if we don't update this SEGV can happen).
sys/netinet6/ipcomp_output.c 1.10 -> 1.13
sys/netinet6/ipcomp_input.c 1.10 -> 1.13
sys/netinet6/ipcomp_core.c 1.9 -> 1.16
sys/netinet6/ipcomp.h 1.7 -> 1.8
sys/netkey/key.c 1.28 -> 1.29, 1.31 -> 1.35, 1.36 -> 1.37

avoid hardcoding IV length. new ESP engine (uses block cipher only,
easier to put per-arch *.S)
sys/netinet6/esp_output.c 1.5 -> 1.8
sys/netinet6/esp_input.c 1.5 -> 1.8
sys/netinet6/esp_core.c 1.7 -> 1.9
sys/netinet6/esp.h 1.11 -> 1.13
sys/netkey/key.c 1.30 -> 1.31
 1.16.4.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.16.4.2 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.16.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.16.2.4 17-Sep-2002  nathanw Catch up to -current.
 1.16.2.3 27-Aug-2002  nathanw Catch up to -current.
 1.16.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.16.2.1 22-Oct-2001  nathanw Catch up to -current.
 1.18.8.1 29-Aug-2002  gehenna catch up with -current.
 1.20.6.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.20.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.20.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.20.6.1 03-Aug-2004  skrll Sync with HEAD
 1.22.16.1 01-Dec-2007  bouyer Pull up following revision(s) (requested by jdc in ticket #11393):
sys/netinet6/ipcomp_input.c: revision 1.28
From FreeBSD:
In ipcomp6_input(), check 'md' not 'm' after a call to m_pulldown(): 'm'
may be a stale pointer at this point, and we're interested in whether or
not m_pulldown() failed.
Noticed by: Coverity Prevent analysis tool
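
Illustration (not part of the commit): the fix above can be sketched in
userland C, assuming a toy mbuf type. The names toy_mbuf, toy_get,
toy_freem, toy_pulldown and toy_input are hypothetical stand-ins, and
toy_pulldown only mimics the one property that matters here: on failure
it frees the chain and returns NULL, so only its return value is safe
to test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for the kernel mbuf; NOT the real struct mbuf. */
struct toy_mbuf {
	struct toy_mbuf *m_next;
	size_t m_len;
};

/* Allocate one toy mbuf of length len and link it in front of next. */
static struct toy_mbuf *
toy_get(size_t len, struct toy_mbuf *next)
{
	struct toy_mbuf *m = malloc(sizeof(*m));

	assert(m != NULL);
	m->m_next = next;
	m->m_len = len;
	return m;
}

/* Free every mbuf in the chain. */
static void
toy_freem(struct toy_mbuf *m)
{
	while (m != NULL) {
		struct toy_mbuf *n = m->m_next;

		free(m);
		m = n;
	}
}

/*
 * Hypothetical analogue of m_pulldown(): find the mbuf that holds
 * offset off with len contiguous bytes.  On failure the whole chain
 * is freed and NULL is returned, so the caller's m becomes stale.
 */
static struct toy_mbuf *
toy_pulldown(struct toy_mbuf *m, size_t off, size_t len, size_t *offp)
{
	struct toy_mbuf *n = m;

	while (n != NULL && off >= n->m_len) {
		off -= n->m_len;
		n = n->m_next;
	}
	if (n == NULL || n->m_len - off < len) {
		toy_freem(m);
		return NULL;
	}
	*offp = off;
	return n;
}

/* Returns 0 on success, -1 if the pulldown failed. */
static int
toy_input(struct toy_mbuf *m, size_t off, size_t len)
{
	size_t mdoff;
	struct toy_mbuf *md = toy_pulldown(m, off, len, &mdoff);

	if (md == NULL)		/* check md, not m: m may be stale here */
		return -1;
	(void)mdoff;		/* a real handler would parse at md + mdoff */
	toy_freem(m);
	return 0;
}
```

Testing m after the call would read a pointer to memory that may already
have been freed, which is the bug the pull-up describes.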
 1.22.14.3 22-Nov-2007  bouyer Pull up following revision(s) (requested by jdc in ticket #1879):
sys/netinet6/ipcomp_input.c: revision 1.28
From FreeBSD:
In ipcomp6_input(), check 'md' not 'm' after a call to m_pulldown(): 'm'
may be a stale pointer at this point, and we're interested in whether or
not m_pulldown() failed.
Noticed by: Coverity Prevent analysis tool
 1.22.14.2 18-Jul-2005  riz branches: 1.22.14.2.2; 1.22.14.2.4;
Pull up revision 1.26 (requested by tron in ticket #565):
Defopt IPSEC_NAT_T.
 1.22.14.1 28-Apr-2005  tron Pull up revision 1.23 (requested by man in ticket #201):
Enhance IPSEC_NAT_T so that it can work with multiple machines behind
the same NAT.
 1.22.14.2.4.1 22-Nov-2007  bouyer Pull up following revision(s) (requested by jdc in ticket #1879):
sys/netinet6/ipcomp_input.c: revision 1.28
From FreeBSD:
In ipcomp6_input(), check 'md' not 'm' after a call to m_pulldown(): 'm'
may be a stale pointer at this point, and we're interested in whether or
not m_pulldown() failed.
Noticed by: Coverity Prevent analysis tool
 1.22.14.2.2.1 22-Nov-2007  bouyer Pull up following revision(s) (requested by jdc in ticket #1879):
sys/netinet6/ipcomp_input.c: revision 1.28
From FreeBSD:
In ipcomp6_input(), check 'md' not 'm' after a call to m_pulldown(): 'm'
may be a stale pointer at this point, and we're interested in whether or
not m_pulldown() failed.
Noticed by: Coverity Prevent analysis tool
 1.22.8.1 29-Apr-2005  kent sync with -current
 1.22.6.1 01-Dec-2007  bouyer Pull up following revision(s) (requested by jdc in ticket #11393):
sys/netinet6/ipcomp_input.c: revision 1.28
From FreeBSD:
In ipcomp6_input(), check 'md' not 'm' after a call to m_pulldown(): 'm'
may be a stale pointer at this point, and we're interested in whether or
not m_pulldown() failed.
Noticed by: Coverity Prevent analysis tool
 1.22.2.1 01-Dec-2007  bouyer Pull up following revision(s) (requested by jdc in ticket #11393):
sys/netinet6/ipcomp_input.c: revision 1.28
From FreeBSD:
In ipcomp6_input(), check 'md' not 'm' after a call to m_pulldown(): 'm'
may be a stale pointer at this point, and we're interested in whether or
not m_pulldown() failed.
Noticed by: Coverity Prevent analysis tool
 1.25.2.5 21-Jan-2008  yamt sync with head
 1.25.2.4 27-Oct-2007  yamt sync with head.
 1.25.2.3 03-Sep-2007  yamt sync with head.
 1.25.2.2 30-Dec-2006  yamt sync with head.
 1.25.2.1 21-Jun-2006  yamt sync with head.
 1.27.6.1 22-Apr-2006  simonb Sync with head.
 1.27.4.1 09-Sep-2006  rpaulo sync with head
 1.27.2.1 18-Feb-2006  yamt sync with head.
 1.28.16.2 10-Dec-2006  yamt sync with head.
 1.28.16.1 22-Oct-2006  yamt sync with head
 1.28.14.1 18-Nov-2006  ad Sync with head.
 1.30.12.1 03-Apr-2011  riz Pull up following revision(s) (requested by spz in ticket #1425):
sys/netipsec/xform_ipcomp.c: revision 1.26
sys/netinet6/ipcomp_input.c: revision 1.37
mitigation for CVE-2011-1547
this should really be solved by counting nested headers (like in the
inet6 case) instead
mitigation for CVE-2011-1547
 1.30.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.30.2.1 03-Apr-2011  riz Pull up following revision(s) (requested by spz in ticket #1425):
sys/netipsec/xform_ipcomp.c: revision 1.26
sys/netinet6/ipcomp_input.c: revision 1.37
mitigation for CVE-2011-1547
this should really be solved by counting nested headers (like in the
inet6 case) instead
mitigation for CVE-2011-1547
 1.31.20.1 25-Oct-2007  bouyer Sync with HEAD.
 1.31.16.2 09-Jan-2008  matt sync with HEAD
 1.31.16.1 06-Nov-2007  matt sync with HEAD
 1.31.14.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.31.2.1 23-Oct-2007  ad Sync with head.
 1.32.6.1 11-Dec-2007  yamt sync with head.
 1.32.4.1 26-Dec-2007  ad Sync with head.
 1.33.12.1 18-May-2008  yamt sync with head.
 1.33.10.1 02-Jun-2008  mjf Sync with HEAD.
 1.36.28.1 06-Jun-2011  jruoho Sync with HEAD.
 1.36.24.1 03-Apr-2011  jdc Pull up:
src/sys/netinet6/ipcomp_input.c revision 1.37
src/sys/netipsec/xform_ipcomp.c revision 1.26

(requested by spz in ticket #1590).

mitigation for CVE-2011-1547
 1.36.22.1 21-Apr-2011  rmind sync with head
 1.36.16.1 03-Apr-2011  jdc Pull up:
src/sys/netinet6/ipcomp_input.c revision 1.37
src/sys/netipsec/xform_ipcomp.c revision 1.26

(requested by spz in ticket #1590).

mitigation for CVE-2011-1547
 1.36.10.1 03-Apr-2011  jdc Pull up:
src/sys/netinet6/ipcomp_input.c revision 1.37
src/sys/netipsec/xform_ipcomp.c revision 1.26

(requested by spz in ticket #1590).

mitigation for CVE-2011-1547
 1.38.14.1 30-Jan-2018  martin Ooops, remainder of Ticket #1523, accidentally not committed previously
 1.38.12.1 30-Jan-2018  martin Ooops, remainder of Ticket #1523, accidentally not committed previously
 1.38.8.1 30-Jan-2018  martin Ooops, remainder of Ticket #1523, accidentally not committed previously
 1.38.6.1 05-Apr-2012  mrg sync to latest -current.
 1.38.2.1 17-Apr-2012  yamt sync with head
 1.31 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.30 17-Jul-2011  joerg branches: 1.30.2; 1.30.6;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.29 18-Mar-2009  cegger bzero -> memset
 1.28 05-May-2008  ad branches: 1.28.8; 1.28.14;
Back out previous. It broke the build.
 1.27 04-May-2008  ad Move zlib out of net/ and into kern/. It would probably be better to use
the reachover Makefiles and libz, but this is already here and it works.
 1.26 23-Apr-2008  thorpej branches: 1.26.2;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.25 09-Dec-2007  degroote branches: 1.25.10; 1.25.12;
Kill _IP_VHL ifdef (from netinet/ip.h history, it has never been used in NetBSD so ...)
 1.24 01-Nov-2007  dyoung branches: 1.24.4; 1.24.6;
De-__P().
 1.23 19-Oct-2007  ad machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.22 22-Sep-2007  degroote branches: 1.22.4;
{ah,esp,ipcomp}_output must return 0 on success. On failure, it returns the
error and m is freed. Previously, it was not the case in ipcomp and esp case
(aka in some case, it returns 0 with m freed, or an error and m was not freed).

In ipcomp_output, fix some leak of mcopy too.

Use the same error path in {ah,esp,ipcomp}_output.

Problem was reported by Wolfgang Stukenbrock in pr/36768.
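
Illustration (not part of the commit): a minimal sketch of the ownership
rule described above, assuming a toy packet type. toy_pkt, toy_alloc,
toy_free, toy_frees and toy_output are hypothetical names; the point is
the single error path that frees the packet exactly once and returns a
non-zero error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy packet; stands in for an mbuf chain. */
struct toy_pkt {
	size_t len;
};

static int toy_frees;	/* counts frees so the contract can be checked */

static struct toy_pkt *
toy_alloc(size_t len)
{
	struct toy_pkt *p = malloc(sizeof(*p));

	assert(p != NULL);
	p->len = len;
	return p;
}

static void
toy_free(struct toy_pkt *p)
{
	free(p);
	toy_frees++;
}

/*
 * Hypothetical output transform.
 * Success: returns 0 and the packet is still valid.
 * Failure: returns an errno value and the packet has been freed.
 */
static int
toy_output(struct toy_pkt *p, size_t maxlen)
{
	int error;

	if (p->len == 0) {
		error = EINVAL;
		goto fail;
	}
	if (p->len > maxlen) {
		error = EMSGSIZE;
		goto fail;
	}
	return 0;

fail:
	/* Shared error path: free once, then report the error. */
	toy_free(p);
	return error;
}
```

With one exit label, no branch can return 0 after freeing the packet or
return an error while leaving it allocated.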
 1.21 23-May-2007  christos branches: 1.21.6; 1.21.8;
Ansify + add a few comments, from Karl Sjödahl
 1.20 24-Nov-2006  christos branches: 1.20.2; 1.20.8; 1.20.10; 1.20.16;
fix spelling of accommodate; from Zapher.
 1.19 11-Dec-2005  christos branches: 1.19.20; 1.19.22;
merge ktrace-lwp.
 1.18 13-Feb-2004  wiz branches: 1.18.14; 1.18.16;
Uppercase CPU, plural is CPUs.
 1.17 09-Jun-2002  itojun branches: 1.17.6;
whitespace cleanup
 1.16 13-Nov-2001  lukem branches: 1.16.8;
add RCSIDs
 1.15 15-Oct-2001  itojun reduce diff with kame. whitespace changes only.
 1.14 02-Oct-2000  itojun branches: 1.14.2; 1.14.4;
fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.
 1.13 26-Sep-2000  itojun update ip compression algorithm lookup.
attach sadb_comb for IP compression (not in RFC2367;
discussed on pf_key@inner.net). sync with kame
 1.12 21-Sep-2000  itojun - repair too strong assumption on mbuf chain.
- correct byte lifetime computation to conform to RFC2401 p23 (use
packet BEFORE compression)
- stabilize deflate calls
- present error messages better
 1.11 06-Jul-2000  itojun remove unnecessary #include <netkey/key_debug.h>. from kame.
 1.10 19-May-2000  thorpej branches: 1.10.4;
NULL != 0
 1.9 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.8 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.7 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.6 10-Sep-1999  itojun branches: 1.6.2;
fix ipcomp behavior against -R to meet documentation.
From: Laine Stump <lainestump@rcn.com>
 1.5 30-Jul-1999  itojun remove reference to in6_systm.h (file itself will be removed afterwards)
 1.4 06-Jul-1999  itojun fix IPSEC (but not INET6) build.

PR: 7921, 7922, 7924
From: rafal@mediaone.net
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ipcomp_output.c was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ipcomp_output.c was added on branch chs-ubc2 on 1999-07-01 23:48:29 +0000
 1.6.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.10.4.2 02-Oct-2000  itojun pullup (approved by releng-1-5)
correct ipsecstat/ipsec6stat mixup.

netinet6/ah_input.c 1.18 -> 1.19
netinet6/ah_output.c 1.11 -> 1.12 (part of)
netinet6/esp_input.c 1.8 -> 1.9 (part of)
netinet6/esp_output.c 1.8 -> 1.9
netinet6/icmp6.c 1.43 -> 1.44
netinet6/ipcomp_input.c 1.13 -> 1.14
netinet6/ipcomp_output.c 1.13 -> 1.14
 1.10.4.1 29-Sep-2000  itojun pullup (approved by releng-1-5)

correct lifetime handling of IPsec keys, so that it won't wrongly
survive across suspend/resume session.
sys/netinet6/ipsec.h 1.15 -> 1.16
sys/netkey/keydb.h 1.7 -> 1.9
sys/netkey/key.c 1.35 -> 1.36

stabilize ipcomp packet handling (if we don't update this SEGV can happen).
sys/netinet6/ipcomp_output.c 1.10 -> 1.13
sys/netinet6/ipcomp_input.c 1.10 -> 1.13
sys/netinet6/ipcomp_core.c 1.9 -> 1.16
sys/netinet6/ipcomp.h 1.7 -> 1.8
sys/netkey/key.c 1.28 -> 1.29, 1.31 -> 1.35, 1.36 -> 1.37

avoid hardcoding IV length. new ESP engine (uses block cipher only,
easier to put per-arch *.S)
sys/netinet6/esp_output.c 1.5 -> 1.8
sys/netinet6/esp_input.c 1.5 -> 1.8
sys/netinet6/esp_core.c 1.7 -> 1.9
sys/netinet6/esp.h 1.11 -> 1.13
sys/netkey/key.c 1.30 -> 1.31
 1.14.4.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.14.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.14.2.3 20-Jun-2002  nathanw Catch up to -current.
 1.14.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.14.2.1 22-Oct-2001  nathanw Catch up to -current.
 1.16.8.1 20-Jun-2002  gehenna catch up with -current.
 1.17.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.17.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.17.6.1 03-Aug-2004  skrll Sync with HEAD
 1.18.16.5 21-Jan-2008  yamt sync with head
 1.18.16.4 15-Nov-2007  yamt sync with head.
 1.18.16.3 27-Oct-2007  yamt sync with head.
 1.18.16.2 03-Sep-2007  yamt sync with head.
 1.18.16.1 30-Dec-2006  yamt sync with head.
 1.18.14.1 23-Sep-2007  bouyer Pull up following revision(s) (requested by degroote in ticket #1846):
sys/netinet6/ipcomp_output.c: revision 1.22
sys/netinet6/ah_output.c: revision 1.30
sys/netinet6/esp_output.c: revision 1.30
Fix some possible mbuf leak in kame ipsec code.
Problem was reported by Wolfgang Stukenbrock in pr/36768.
 1.19.22.1 10-Dec-2006  yamt sync with head.
 1.19.20.1 12-Jan-2007  ad Sync with head.
 1.20.16.1 30-Sep-2007  wrstuden Catch up on netbsd-4 as of a few days ago.
 1.20.10.1 11-Jul-2007  mjf Sync with head.
 1.20.8.3 23-Oct-2007  ad Sync with head.
 1.20.8.2 09-Oct-2007  ad Sync with head.
 1.20.8.1 08-Jun-2007  ad Sync with head.
 1.20.2.1 25-Sep-2007  xtraeme Pull up following revision(s) (requested by degroote in ticket #896):
sys/netinet6/ipcomp_output.c: revision 1.22
sys/netinet6/ah_output.c: revision 1.30
sys/netinet6/esp_output.c: revision 1.30

{ah,esp,ipcomp}_output must return 0 on success. On failure, it returns the
error and m is freed. Previously, it was not the case in ipcomp and esp case
(aka in some case, it returns 0 with m freed, or an error and m was not freed).

In ipcomp_output, fix some leak of mcopy too.

Use the same error path in {ah,esp,ipcomp}_output.

Problem was reported by Wolfgang Stukenbrock in pr/36768.
 1.21.8.2 09-Jan-2008  matt sync with HEAD
 1.21.8.1 06-Nov-2007  matt sync with HEAD
 1.21.6.3 04-Nov-2007  jmcneill Sync with HEAD.
 1.21.6.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.21.6.1 02-Oct-2007  joerg Sync with HEAD.
 1.22.4.2 13-Nov-2007  bouyer Sync with HEAD
 1.22.4.1 25-Oct-2007  bouyer Sync with HEAD.
 1.24.6.1 11-Dec-2007  yamt sync with head.
 1.24.4.1 26-Dec-2007  ad Sync with head.
 1.25.12.1 18-May-2008  yamt sync with head.
 1.25.10.1 02-Jun-2008  mjf Sync with HEAD.
 1.26.2.1 04-May-2009  yamt sync with head.
 1.28.14.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.28.8.1 28-Apr-2009  skrll Sync with HEAD.
 1.30.6.1 05-Apr-2012  mrg sync to latest -current.
 1.30.2.1 17-Apr-2012  yamt sync with head
 1.146 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.145 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.144 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.143 30-Dec-2009  elad branches: 1.143.12; 1.143.16;
Collapse identical switch cases.
 1.142 07-May-2009  elad Remove some more "priv" variable usage in favor of kauth(9) calls.
 1.141 06-May-2009  elad Remove some usage of "priv" and "privileged" variables and instead pass
around credentials. Also push down kauth(9) calls closer to where the
operation is done.

Mailing list reference:

http://mail-index.netbsd.org/tech-net/2009/04/30/msg001270.html
 1.140 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.139 19-Mar-2009  he Correct two more bungled bcopy() -> memcpy() conversions.
 1.138 18-Mar-2009  cegger bcopy -> memcpy
 1.137 18-Mar-2009  cegger bzero -> memset
 1.136 18-Mar-2009  cegger bcmp -> memcmp
 1.135 18-Mar-2009  cegger Ansify function definitions w/o arguments. Generated with sed.
 1.134 14-Mar-2009  dsl Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.133 11-Oct-2008  pooka branches: 1.133.2; 1.133.8;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.
 1.132 27-Jun-2008  cegger branches: 1.132.2;
remove undeclared caddr_t. makes i386 ALL kernel build again.
 1.131 27-Jun-2008  mlelstv Verify icmp type and code in IPSEC rules.
Fixes PR kern/39018
 1.130 04-May-2008  thorpej branches: 1.130.2; 1.130.4;
Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577
 1.129 23-Apr-2008  thorpej branches: 1.129.2;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.128 15-Apr-2008  thorpej branches: 1.128.2;
Make ip6 and icmp6 stats per-cpu.
 1.127 12-Apr-2008  thorpej Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
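
Illustration (not part of the commit): the per-CPU scheme can be sketched
as one counter array per CPU that is summed only when a reader asks for
the totals. This is a userland model under stated assumptions: TOY_NCPU,
TOY_NSTATS, the TOY_STAT_* indices, toy_stat_inc and toy_stat_collate are
hypothetical and are not the kernel's <net/net_stats.h> interface.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_NCPU	4	/* assumed CPU count for the sketch */
#define TOY_NSTATS	3

enum { TOY_STAT_TOTAL, TOY_STAT_DELIVERED, TOY_STAT_DROPPED };

/* One private counter array per CPU, so updates need no shared lock. */
static uint64_t toy_percpu[TOY_NCPU][TOY_NSTATS];

/* Bump a counter on the given CPU's private array. */
static void
toy_stat_inc(unsigned cpu, unsigned stat)
{
	assert(cpu < TOY_NCPU && stat < TOY_NSTATS);
	toy_percpu[cpu][stat]++;
}

/* Sum every CPU's counters into out; done only on request. */
static void
toy_stat_collate(uint64_t out[TOY_NSTATS])
{
	unsigned cpu, stat;

	memset(out, 0, sizeof(uint64_t) * TOY_NSTATS);
	for (cpu = 0; cpu < TOY_NCPU; cpu++)
		for (stat = 0; stat < TOY_NSTATS; stat++)
			out[stat] += toy_percpu[cpu][stat];
}
```

The cost of summing moves to the rare read path, while the frequent
update path touches only CPU-local memory.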
 1.126 08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
 1.125 07-Apr-2008  thorpej Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.
 1.124 06-Feb-2008  bjs branches: 1.124.2; 1.124.6;
ip_newid() -> ip_newid(NULL) due to Matt Thomas' commit some hours ago;
The function now requires a pointer (to struct in_ifaddr) as an argument,
i.e. it is no longer ip_newid(void).

Fixes ipsec+inet6 kernel builds.
 1.123 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.122 16-Nov-2007  dyoung branches: 1.122.2; 1.122.6;
Note danger of dangling pointers.
 1.121 10-Jul-2007  christos branches: 1.121.6; 1.121.8; 1.121.12; 1.121.14;
fix printf format.
 1.120 09-Jul-2007  gdt ipsec4_splithdr: If m_len is too short, printf and drop it instead of
panicking. Perhaps should be a pullup instead. This happens very
occasionally on an ultrasparc with tunnel-mode ESP.
 1.119 23-May-2007  christos fix typos in previous
 1.118 23-May-2007  christos Ansify + add a few comments, from Karl Sjödahl
 1.117 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
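
Illustration (not part of the commit): the alloc/copy/dup/free semantics
from item 1 can be modelled in userland. This sketch assumes malloc(3)
in place of the per-family pools, omits the 'flags' argument, and uses a
toy sockaddr with BSD-style sa_len and sa_family fields. All toy_* names
and the TOY_AF_* constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_AF_INET	2	/* illustrative family numbers */
#define TOY_AF_INET6	24

/* Toy sockaddr with BSD-style length and family fields. */
struct toy_sockaddr {
	uint8_t sa_len;
	uint8_t sa_family;
	unsigned char sa_data[];
};

/* Size of a sockaddr for the family, or 0 if the family is unknown. */
static size_t
toy_family_len(uint8_t af)
{
	switch (af) {
	case TOY_AF_INET:
		return 16;
	case TOY_AF_INET6:
		return 28;
	default:
		return 0;
	}
}

/* Like sockaddr_alloc(): right-sized, with sa_len and sa_family set. */
static struct toy_sockaddr *
toy_sockaddr_alloc(uint8_t af)
{
	size_t len = toy_family_len(af);
	struct toy_sockaddr *sa;

	if (len == 0 || (sa = calloc(1, len)) == NULL)
		return NULL;
	sa->sa_len = (uint8_t)len;
	sa->sa_family = af;
	return sa;
}

/* Like sockaddr_copy(): strcpy()-style, families must match. */
static struct toy_sockaddr *
toy_sockaddr_copy(struct toy_sockaddr *dst, const struct toy_sockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like sockaddr_dup(): strdup()-style, NULL when allocation fails. */
static struct toy_sockaddr *
toy_sockaddr_dup(const struct toy_sockaddr *src)
{
	struct toy_sockaddr *dst = toy_sockaddr_alloc(src->sa_family);

	return dst == NULL ? NULL : toy_sockaddr_copy(dst, src);
}

static void
toy_sockaddr_free(struct toy_sockaddr *sa)
{
	free(sa);
}
```

As in the commit text, callers have to handle a NULL return from the
alloc and dup routines.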
 1.116 25-Mar-2007  degroote Make an exact match when we are looking for a cached sp for an unconnected
socket. If we don't make an exact match, we may use a cached rule which
has lower priority than a rule that would otherwise have matched the
packet.

Code submitted by Karl Knutsson in PR/36051
 1.115 04-Mar-2007  christos branches: 1.115.2; 1.115.4; 1.115.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.114 20-Dec-2006  mlelstv branches: 1.114.2;
do not compare ipv6 ipsec tunnel addresses against uninitialized data.
Fixes PR kern/34734
 1.113 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.112 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.111 02-Dec-2006  dyoung Use the queue(3) macros instead of open-coding them. Shorten
staircases. Remove unnecessary casts. Where appropriate, s/8/NBBY/.
De-__P(). KNF.

No functional changes intended.
 1.110 16-Nov-2006  christos branches: 1.110.2;
__unused removal on arguments; approved by core.
 1.109 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.108 07-Jun-2006  kardel branches: 1.108.6; 1.108.8;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.107 25-Feb-2006  wiz branches: 1.107.2; 1.107.8;
Fix some typos.
 1.106 21-Jan-2006  rpaulo branches: 1.106.2; 1.106.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
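
Illustration (not part of the commit): the src=::1 case above reduces to
a scope comparison. The sketch below covers only the loopback case and
is not the kernel's scope code; toy_scope_violation and toy_check are
hypothetical helpers, while IN6_IS_ADDR_LOOPBACK() and inet_pton() are
the standard interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Non-zero when a packet from src to dst would carry a loopback
 * source address off the node.  Only the ::1 case is modelled.
 */
static int
toy_scope_violation(const struct in6_addr *src, const struct in6_addr *dst)
{
	return IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst);
}

/*
 * Parse two textual IPv6 addresses and apply the check.
 * Returns 1 for a violation, 0 if acceptable, -1 on a parse error.
 */
static int
toy_check(const char *src, const char *dst)
{
	struct in6_addr s, d;

	assert(src != NULL && dst != NULL);
	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return -1;
	return toy_scope_violation(&s, &d) ? 1 : 0;
}
```

A stricter boundary check rejects the first combination in the
bind/sendto example (source ::1, global destination) while still
allowing loopback-to-loopback traffic.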
 1.105 11-Dec-2005  christos branches: 1.105.2;
merge ktrace-lwp.
 1.104 09-Sep-2005  christos PR/25658: Steve Woodford: Default value of net.inet.ipsec.dfbit breaks PMTU
over IPsec tunnels.
I have changed the default to 2 [copy]. I've verified that this works with
all my IPSEC setups, and this change has also been discussed in tech-net.
 1.103 18-Aug-2005  yamt - introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.
 1.102 07-May-2005  christos branches: 1.102.2;
PR/30154: YAMAMOTO Takashi: tcp_close locking botch
One more so_uid -> so_uidinfo change.
 1.101 09-Mar-2005  itojun correct mistake reported by VANHULLEBUS Yvan
 1.100 26-Feb-2005  perry nuke trailing whitespace
 1.99 27-Oct-2004  itojun branches: 1.99.4; 1.99.6;
remove extra code mistakenly committed
 1.98 27-Oct-2004  itojun missing break; Emmanuel Dreyfus
 1.97 25-May-2004  atatat Sysctl descriptions under net subtree (net.key not done)
 1.96 20-Apr-2004  itojun remove duplicated #include. PR 25234
 1.95 24-Mar-2004  atatat branches: 1.95.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.94 02-Mar-2004  thorpej Augment the PCB cache with a "hint" that can be used to short-circuit
IPsec processing in other places. The hint has 3 values: MAYBE, YES,
and NO. Hints are initialized to MAYBE, and MAYBE is always used for
unconnected sockets (since the spidx may change for every packet
that is output). For connected sockets, NONE and BYPASS policies cause
the hint to be set to NO, and all other policies to YES.

Also shuffle the PCB cache data structure, turning 3 arrays into a
single array of a struct.
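
The hint rules in this message can be written down as a small decision
function. The sketch below is an illustration only: the enum and function
names are hypothetical and are not the identifiers used in the kernel.

```c
#include <stdbool.h>

/* Hypothetical names for the three hint values described above. */
enum toy_pcb_hint { TOY_HINT_MAYBE, TOY_HINT_YES, TOY_HINT_NO };

/* Hypothetical policy kinds; only NONE and BYPASS are special here. */
enum toy_policy { TOY_POLICY_NONE, TOY_POLICY_BYPASS, TOY_POLICY_OTHER };

/*
 * Unconnected sockets always get MAYBE, since the spidx may change
 * for every packet. Connected sockets get NO for NONE and BYPASS
 * policies, and YES for every other policy.
 */
static enum toy_pcb_hint
toy_pcb_hint_for(bool connected, enum toy_policy policy)
{
	if (!connected)
		return TOY_HINT_MAYBE;
	if (policy == TOY_POLICY_NONE || policy == TOY_POLICY_BYPASS)
		return TOY_HINT_NO;
	return TOY_HINT_YES;
}
```

A caller that sees TOY_HINT_NO could skip IPsec processing entirely, which
is the short-circuit the message refers to.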
 1.93 24-Feb-2004  wiz occured -> occurred. From Peter Postma.
 1.92 11-Feb-2004  itojun missing bzero
 1.91 13-Jan-2004  itojun typo.
http://sources.zabbadoz.net/freebsd/patchset/108-ipsec-spelling.diff
 1.90 13-Jan-2004  itojun plug memory leak on failure.
http://sources.zabbadoz.net/freebsd/patchset/109-ipsec-memleak.diff
 1.89 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.88 17-Nov-2003  jonathan Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.
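
A minimal sketch of the arrangement described above: one allocation
function, with the randomized generator selected only at configuration
time. The toy_ names and the counter variable are assumptions made for
illustration, and the sequential branch is a guess at what "pre-randomid"
means, not a copy of the kernel code.

```c
#include <stdint.h>

/* Hypothetical counter state; the kernel keeps its own. */
static uint16_t toy_ip_id;

#ifdef RANDOM_IP_ID
/* Assumed to be provided elsewhere when the option is configured. */
uint16_t toy_ip_randomid(void);
#endif

/* Single entry point for allocating a new IPv4 ID. */
static uint16_t
toy_ip_newid(void)
{
#ifdef RANDOM_IP_ID
	return toy_ip_randomid();
#else
	return toy_ip_id++;	/* sequential IDs, wrapping at 2^16 */
#endif
}
```

Built without RANDOM_IP_ID, consecutive calls return consecutive values.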
 1.87 03-Oct-2003  itojun no need to clear mbuf flags here; sync w/kame
 1.86 03-Oct-2003  itojun when dropping M_PKTHDR, need to free m_tag associated with it.
 1.85 03-Oct-2003  itojun use in6_{embed,recover}scope for scoped address manipulation
 1.84 02-Oct-2003  itojun permit tunnel mode over link-local address. (outer header is link-local)
iij seil team
 1.83 02-Oct-2003  itojun handle link-local address in ipsec6_tunnel_validate(). from iij seil team
 1.82 22-Sep-2003  itojun mark security policy that should persist in the system "persistent".
this should prevent recently-reported kernel panic when "spdflush" is issued.
 1.81 12-Sep-2003  itojun remove extra blank line
 1.80 12-Sep-2003  itojun make synchronization w/ PF tag support code easier
 1.79 12-Sep-2003  itojun make it possible to SADB_DUMP via sysctl. request by mrg
 1.78 10-Sep-2003  itojun record socket * associated with secpolicy
 1.77 07-Sep-2003  itojun - prepare for RFC2401bis 64bit sequence number (no behavior change yet)
- use hash for SPI-based SAD entry lookup (should be faster, i hope)
- cleanup keydb.c and key.c. key.c is responsible for refcounting secasvar,
keydb.c is responsible for alloc/free.
 1.76 06-Sep-2003  itojun committed by mistake, sorry
 1.75 06-Sep-2003  itojun correct comment
 1.74 06-Sep-2003  itojun randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.73 22-Aug-2003  itojun remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.
 1.72 22-Aug-2003  itojun allow userland to specify SPD ID. more readable debugging messages.
 1.71 22-Jul-2003  itojun unifdef -U_IP_VHL
 1.70 10-May-2003  darrenr branches: 1.70.2;
bring a small amount of code out of an if() statement that was doing
the same thing for both cases.
 1.69 17-Jan-2003  itojun switch from kame-based m_aux mbuf auxiliary data, to openbsd m_tag
implementation. it will simplify porting across *bsd (such as kame/altq),
and make us more synchronized. from Joel Wilsson
 1.68 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.67 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.66 11-Sep-2002  itojun correct signedness mixup in pointer passing. sync w/kame
 1.65 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.64 01-Aug-2002  itojun typo. From: Arto Selonen <arto@selonen.org>, sync w/kame
 1.63 18-Jul-2002  wiz Spell 'should' correctly.
 1.62 27-Jun-2002  itojun reduce kernel stack usage by separating struct secasindex. sync w/kame
From: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
 1.61 22-Jun-2002  itojun move sanity check upwards. sync w/kame
From: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
 1.60 22-Jun-2002  itojun prevent listening socket from mistakenly using incorrect cached policy.
From: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp> sync w/kame
 1.59 21-Jun-2002  itojun sizeof mistake in DIAGNOSTIC path. sync w/kame
From: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
 1.58 16-Jun-2002  itojun previous commit cached pcb policy too much (when pcb points to
SPD entry that is not ipsec - like "none"). back it out. sync w/kame
 1.57 14-Jun-2002  itojun cache pcb policy as much as possible. in fact, if policy is not
IPSEC_POLICY_IPSEC we don't need to compare spidx. sync w/kame
 1.56 14-Jun-2002  itojun remove redundant line
 1.55 13-Jun-2002  itojun free secpolicy on deepcopy failure
 1.54 12-Jun-2002  itojun deep-copy pcb policy if it is an ipsec policy. assign ID field to all
SPD entries. make it possible for racoon to grab SPD entry on pcb
(racoon side needs some changes). sync w/kame
 1.53 12-Jun-2002  itojun do not copy policy-on-socket at all. avoid copying packet header value to
struct spindex. should reduce memory usage per socket/pcb, and should speedup
ipsec processing. sync w/kame
 1.52 11-Jun-2002  itojun share policy-on-pcb for listening socket. sync w/kame
todo: share even more, avoid frequent updates of spidx
 1.51 11-Jun-2002  itojun avoid variable name confusion. sync w/kame
 1.50 09-Jun-2002  itojun whitespace cleanup
 1.49 08-Jun-2002  itojun whitespace cleanup
 1.48 25-May-2002  itojun re-enable ipsec policy caching onto pcb. refcnt fix and workarounds based on ymmt-san.
 1.47 19-May-2002  itojun branches: 1.47.2;
in sp caching code, check if sp is still alive. sync w/kame
 1.46 10-May-2002  itojun branches: 1.46.2;
disable ipsec policy caching on pcb, as it seems that there's some reference-
counting mistake that causes panic - see PR 15953 and 13813.

i am unable to find the real cause of problem, so it is a shortterm workaround,
hopefully.
 1.45 10-May-2002  itojun remove unneeded #ifdef __FreeBSD__ portion.
 1.44 28-Apr-2002  thorpej Use M_READONLY() rather than testing to see if ext_free is set
or MCLISREFERENCED().
 1.43 21-Nov-2001  itojun update outgoing ifp, only if tunnel mode ipsec is used. this is to
honor IP_MULTICAST_IF setsockopt on ipsec-over-multicast. sync with kame
 1.42 13-Nov-2001  lukem add RCSIDs
 1.41 29-Oct-2001  simonb Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.
 1.40 16-Oct-2001  itojun branches: 1.40.2;
more whitespace/comment sync with kame
 1.39 16-Sep-2001  wiz Spell 'occurred' with two 'r's.
 1.38 13-Sep-2001  itojun fix SA lookup when IPsec transport mode and tunnel mode over IPv6 is used
at the same time. sync with kame
(like "IP AH ESP IP", policy = "esp/tunnel/a-b/use ah/transport//use")
 1.37 06-Aug-2001  itojun branches: 1.37.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.
 1.36 05-Aug-2001  itojun cosmetic (spacing near /* */). sync with kame
 1.35 07-Jul-2001  itojun branches: 1.35.2;
have ovbcopy() macro, for cross-BSD compatibility only.
 1.34 15-Apr-2001  itojun do not copy TTL field on ipsec tunnel mode encapsulation. sync with kame
 1.33 08-Feb-2001  itojun branches: 1.33.2;
send up dst_unreach_admin error to local node, if transport-mode
ipsec key is not found. rather experimental. kame 1.83 -> 1.84

nuke IPSEC_SRCSEL which does not do the right thing.
adjust state->ro if the tunnel endpoint is offlink. KAME PR 233.
kame 1.84 -> 1.85
 1.32 24-Jan-2001  itojun - record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.31 10-Nov-2000  itojun fix KAME PR 296 again, for transport-mode SA only
(shortterm workaround - need revisit for ANY SA)
 1.30 09-Nov-2000  itojun backout KAME PR 296. "any" mode SA should be able to be used for tunnel mode.
 1.29 06-Nov-2000  itojun check IPsec SA type (tunnel/transport/any) when we try to decapsulate IPsec
tunnel mode packet. decapsulate only if we got a tunnel mode SA.
KAME PR 296.
 1.28 02-Oct-2000  itojun fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.
 1.27 25-Sep-2000  itojun make ip6_ext available for non-IPv6 compilation
(needed for header chain parsing). (redo of 1.25 -> 1.26)
 1.26 25-Sep-2000  martin Make kernels with IPSec but without IPv6 compile again.
This may break IPPROTO_AH - someone with a clue should double-check
this, please.
 1.25 22-Sep-2000  itojun cleanup ipsec policy lookup. specifically, repair the following cases:
- use of IPv4 mapped address on outbound socket
- explicit port numbers via sendto().
old code grabbed port number from inpcb/in6pcb.
in the above case, old code failed to lookup ipsec policy (oops).
sync with kame.
 1.24 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.23 15-Jun-2000  itojun branches: 1.23.2;
remove obsolete sysctl MIB net.inet.ipsec.inbound_call_ike.
(sync with kame)
 1.22 12-Jun-2000  itojun sync with almost-latest KAME IPsec. full changelog would be too big
to mention here. notable changes are like below.

kernel:
- make PF_KEY kernel interface more robust against broken input stream.
it includes complete internal structure change in sys/netkey/key.c.
- remove non-RFC compliant change in PF_KEY API, in particular,
in struct sadb_msg. we cannot just change these standard structs.
sadb_x_sa2 is introduced instead.
- remove prototypes for pfkey_xx functions from /usr/include/net/pfkeyv2.h.
these functions are not supplied in /usr/lib.

setkey(8):
- get/delete does not require "-m mode" (ignored with warning, if you
specify it)
- spddelete takes direction specification
 1.21 03-Jun-2000  itojun sync with recent kame.
avoid use of macros to manipulate sockaddrs (hides error case too much).
correct IPv4 packet handling when ip option is present.
preparations for ipsec policy engine upgrades.
 1.20 08-May-2000  thorpej branches: 1.20.2;
Remove junk at the end of #undef.
 1.19 21-Mar-2000  itojun cleanup AH/policy processing.
- parse IPv6 header by using common function, ip6_{last,next}hdr.
- fix behavior in multiple AH cases.
make strict boundary checks on mbuf chasing.
(sync with latest kame)
 1.18 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.17 28-Feb-2000  itojun remove some of cross-BSD portability #ifdef.
remove xxCTL_VARS, which is BSDI specific.
 1.16 25-Feb-2000  itojun remove unnecessary if - else clause.
(sync with kame)
 1.15 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.14 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.13 16-Jan-2000  itojun add missing ipcomp cases.
 1.12 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.11 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.10 25-Aug-1999  itojun branches: 1.10.2; 1.10.8;
sync with recent kame: fix source address selection on IPv6 tunnel ipsec.
 1.9 31-Jul-1999  itojun sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.8 30-Jul-1999  itojun remove reference to in6_systm.h (file itself will be removed afterwards)
 1.7 11-Jul-1999  itojun fix compilation/runtime problem on alpha.

PR: 7952, 7953
From: Dave Huang <khym@bga.com>
 1.6 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.5 06-Jul-1999  itojun fix IPSEC (but not INET6) build.

PR: 7921, 7922, 7924
From: rafal@mediaone.net
 1.4 04-Jul-1999  itojun s/splnet/splsoftnet/ in IPv6/IPsec part.
hope I made no mistake (the kernel works fine but I need a regress test)

Suggested by: thorpej
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ipsec.c was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ipsec.c was added on branch chs-ubc2 on 1999-07-01 23:48:29 +0000
 1.10.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.10.2.4 21-Apr-2001  bouyer Sync with HEAD
 1.10.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.10.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.10.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.20.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.23.2.10 10-Dec-2004  jmc Pullup rev 1.98-1.99 (requested by itojun in ticket #178)

Add a missing break
 1.23.2.9 07-Apr-2004  jmc Pullup rev 1.87 (requested by itojun in ticket #101)

No need to clear mbuf flags here
 1.23.2.8 07-Apr-2004  jmc Pullup rev 1.84 (requested by itojun in ticket #92)

Permit tunnel mode over link-local address. (outer header is link-local)
 1.23.2.7 07-Apr-2004  jmc Pullup rev 1.83 (requested by itojun in ticket #91)

Handle link-local address in ipsec6_tunnel_validate().
 1.23.2.6 24-Apr-2001  he Pull up revision 1.34 (requested by itojun):
Do not copy TTL field on ipsec tunnel mode encapsulation.
 1.23.2.5 06-Apr-2001  he Pull up revision 1.33 (partial, via patch, requested by itojun):
Remove IPSEC_SRCSEL which does not do the right thing.
Adjust state->ro if the tunnel endpoint is offlink.
 1.23.2.4 06-Apr-2001  he Pull up revision 1.32 (requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.23.2.3 10-Nov-2000  tv Pullup 1.30 and 1.31 [itojun]:
Fix previous pullup from != IPSEC_MODE_TUNNEL to == IPSEC_MODE_TRANSPORT.
 1.23.2.2 10-Nov-2000  tv Pullup 1.29 [itojun]:
check IPsec SA type (tunnel/transport/any) when we try to decapsulate IPsec
tunnel mode packet. decapsulate only if we got a tunnel mode SA.
KAME PR 296.
 1.23.2.1 29-Sep-2000  itojun pullup (approved by releng-1-5)

cleanup ipsec policy lookup, to fix IPv4 mapped address (outbound) and
explicit port number (sendto).
sys/netinet6/ipsec.c 1.24 -> 1.27
 1.33.2.13 17-Jan-2003  thorpej Sync with HEAD.
 1.33.2.12 18-Oct-2002  nathanw Catch up to -current.
 1.33.2.11 17-Sep-2002  nathanw Catch up to -current.
 1.33.2.10 27-Aug-2002  nathanw Catch up to -current.
 1.33.2.9 13-Aug-2002  nathanw Catch up to -current.
 1.33.2.8 01-Aug-2002  nathanw Catch up to -current.
 1.33.2.7 20-Jun-2002  nathanw Catch up to -current.
 1.33.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.33.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.33.2.4 22-Oct-2001  nathanw Catch up to -current.
 1.33.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.33.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.33.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.35.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.35.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.35.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.35.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.35.2.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.37.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.40.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.46.2.5 29-Aug-2002  gehenna catch up with -current.
 1.46.2.4 20-Jul-2002  gehenna catch up with -current.
 1.46.2.3 15-Jul-2002  gehenna catch up with -current.
 1.46.2.2 20-Jun-2002  gehenna catch up with -current.
 1.46.2.1 30-May-2002  gehenna Catch up with -current.
 1.47.2.8 19-Mar-2005  tron Pull up revision 1.101 (requested by itojun in ticket #5711):
correct mistake reported by VANHULLEBUS Yvan
 1.47.2.7 19-Mar-2005  tron Pull up revision 1.99 (requested by itojun in ticket #1774):
remove extra code mistakenly committed
 1.47.2.6 19-Mar-2005  tron Pull up revision 1.98 (requested by itojun in ticket #1774):
missing break; Emmanuel Dreyfus
 1.47.2.5 05-Oct-2003  tron Pull up revision 1.87 (requested by itojun in ticket #1508):
no need to clear mbuf flags here; sync w/kame
 1.47.2.4 02-Oct-2003  tron Pull up revision 1.84 (requested by itojun in ticket #1498):
permit tunnel mode over link-local address. (outer header is link-local)
iij seil team
 1.47.2.3 02-Oct-2003  tron Pull up revision 1.83 (requested by itojun in ticket #1497):
handle link-local address in ipsec6_tunnel_validate(). from iij seil team
 1.47.2.2 23-Jul-2002  lukem Pull up revisions 1.61-1.62 via patch (requested by itojun in ticket #524):
1.62:
reduce kernel stack usage by separating struct secasindex. sync w/kame
From: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
1.61:
move sanity check upwards. sync w/kame
From: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>

[To prevent possible kernel stack overflow - releng-1-6]
 1.47.2.1 29-May-2002  thorpej Ticket #22, pullup revision 1.48 (itojun). Original commit message:

> re-enable ipsec policy caching onto pcb. refcnt fix and workarounds
> based on ymmt-san.
 1.70.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.70.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.70.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.70.2.4 02-Nov-2004  skrll Sync with HEAD.
 1.70.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.70.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.70.2.1 03-Aug-2004  skrll Sync with HEAD
 1.95.2.2 16-Mar-2005  tron Pull up revision 1.101 (requested by itojun in ticket #1327):
correct mistake reported by VANHULLEBUS Yvan
 1.95.2.1 28-May-2004  tron branches: 1.95.2.1.2;
Pull up revision 1.97 (requested by atatat in ticket #391):
Sysctl descriptions under net subtree (net.key not done)
 1.95.2.1.2.2 16-Mar-2005  tron Pull up revision 1.101 (requested by itojun in ticket #1327):
correct mistake reported by VANHULLEBUS Yvan
 1.95.2.1.2.1 30-Jan-2005  he Pull up revisions 1.98-1.99 (requested by itojun in ticket #954):
Add a missing break.
 1.99.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.99.4.1 29-Apr-2005  kent sync with -current
 1.102.2.6 11-Feb-2008  yamt sync with head.
 1.102.2.5 21-Jan-2008  yamt sync with head
 1.102.2.4 07-Dec-2007  yamt sync with head
 1.102.2.3 03-Sep-2007  yamt sync with head.
 1.102.2.2 30-Dec-2006  yamt sync with head.
 1.102.2.1 21-Jun-2006  yamt sync with head.
 1.105.2.2 01-Mar-2006  yamt sync with head.
 1.105.2.1 01-Feb-2006  yamt sync with head.
 1.106.4.2 22-Apr-2006  simonb Sync with head.
 1.106.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.106.2.2 09-Sep-2006  rpaulo sync with head
 1.106.2.1 07-Feb-2006  rpaulo remove in6_pcb.h and include in_pcb.h.
 1.107.8.1 19-Jun-2006  chap Sync with head.
 1.107.2.1 26-Jun-2006  yamt sync with head.
 1.108.8.4 21-Dec-2006  yamt sync with head.
 1.108.8.3 18-Dec-2006  yamt sync with head.
 1.108.8.2 10-Dec-2006  yamt sync with head.
 1.108.8.1 22-Oct-2006  yamt sync with head
 1.108.6.2 12-Jan-2007  ad Sync with head.
 1.108.6.1 18-Nov-2006  ad Sync with head.
 1.110.2.3 26-Jul-2007  liamjfoy Pull up following revision(s) (requested by gdt in ticket #790):
sys/netinet6/ipsec.c: revision 1.121
ipsec4_splithdr: If m_len is too short, printf and drop it instead of
panicking. Perhaps should be a pullup instead. This happens very
occasionally on an ultrasparc with tunnel-mode ESP.
fix printf format.
 1.110.2.2 12-May-2007  pavel branches: 1.110.2.2.2;
Pull up following revision(s) (requested by degroote in ticket #630):
sys/netipsec/key.c: revision 1.43-1.46
sys/netinet6/ipsec.c: revision 1.116
sys/netipsec/ipsec.c: revision 1.29 via patch
sys/netkey/key.c: revision 1.154-1.155
Call key_checkspidup with spi in network byte order so that it can be
compared with the spi stored in the sadb.
Reported by Karl Knutsson in kern/36038.
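
The byte-order point can be illustrated in isolation. The sketch below
assumes a database entry that stores its SPI in network order; the struct
and function names are hypothetical and do not come from key.c.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SA entry holding the SPI as it appears on the wire. */
struct toy_sa {
	uint32_t spi_net;	/* network byte order */
};

/*
 * Compare against a value that is already in network byte order.
 * Passing a host-order value here is the kind of mismatch the
 * fix above avoids.
 */
static bool
toy_spi_matches(const struct toy_sa *sa, uint32_t spi_net)
{
	return sa->spi_net == spi_net;
}
```

A caller holding a host-order SPI would convert it with htonl() before the
lookup; without the conversion the comparison only succeeds on big-endian
hosts, where both representations coincide.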

Make an exact match when we are looking for a cached sp for an unconnected
socket. If we don't make an exact match, we may use a cached rule which
has lower priority than a rule that would otherwise have matched the
packet.
Code submitted by Karl Knutsson in PR/36051

Fix a memleak in key_spdget.
Problem was reported by Karl Knutsson by pr/36119.

In spddelete2, if we can't find the sp by this id, return after sending an
error message, don't process the following code with the NULL sp.
Spotted by Matthew Grooms on freebsd-net ML

When we construct an answer for SADB_X_SPDGET, don't use a hardcoded 0 for seq but
the seq used by the request. It will improve consistency with the answer of SADB_GET
request and helps some applications which rely both on seq and pid.
Reported by Karl Knutsson by pr/36119.
 1.110.2.1 20-Dec-2006  bouyer Pull up following revision(s) (requested by mlelstv in ticket #293):
sys/netinet6/ipsec.c: revision 1.114
do not compare ipv6 ipsec tunnel addresses against uninitialized data.
Fixes PR kern/34734
 1.110.2.2.2.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.114.2.3 07-May-2007  yamt sync with head.
 1.114.2.2 15-Apr-2007  yamt sync with head.
 1.114.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.115.6.1 29-Mar-2007  reinoud Pullup to -current
 1.115.4.1 11-Jul-2007  mjf Sync with head.
 1.115.2.3 15-Jul-2007  ad Sync with head.
 1.115.2.2 08-Jun-2007  ad Sync with head.
 1.115.2.1 10-Apr-2007  ad Sync with head.
 1.121.14.3 18-Feb-2008  mjf Sync with HEAD.
 1.121.14.2 27-Dec-2007  mjf Sync with HEAD.
 1.121.14.1 19-Nov-2007  mjf Sync with HEAD.
 1.121.12.1 18-Nov-2007  bouyer Sync with HEAD
 1.121.8.2 23-Mar-2008  matt sync with HEAD
 1.121.8.1 09-Jan-2008  matt sync with HEAD
 1.121.6.1 21-Nov-2007  joerg Sync with HEAD.
 1.122.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.122.2.1 26-Dec-2007  ad Sync with head.
 1.124.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.124.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.124.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.124.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.128.2.1 18-May-2008  yamt sync with head.
 1.129.2.4 11-Mar-2010  yamt sync with head
 1.129.2.3 16-May-2009  yamt sync with head
 1.129.2.2 04-May-2009  yamt sync with head.
 1.129.2.1 16-May-2008  yamt sync with head.
 1.130.4.1 27-Jun-2008  simonb Sync with head.
 1.130.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.132.2.1 19-Oct-2008  haad Sync with HEAD.
 1.133.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.133.2.1 28-Apr-2009  skrll Sync with HEAD.
 1.143.16.2 05-Apr-2012  mrg sync to latest -current.
 1.143.16.1 18-Feb-2012  mrg merge to -current.
 1.143.12.1 17-Apr-2012  yamt sync with head
 1.55 06-Sep-2018  maxv Remove netinet6/ipsec.h.
 1.54 22-Mar-2012  drochner branches: 1.54.38; 1.54.40;
remove KAME IPSEC, replaced by FAST_IPSEC
 1.53 06-Jan-2012  drochner more IPSEC header cleanup: don't install unneeded headers to userland,
and remove some differences between KAME and FAST_IPSEC
 1.52 04-Jan-2012  drochner -consistently use "char *" for the compiled policy buffer in the
ipsec_*_policy() functions, as it was documented and used by clients
-remove "ipsec_policy_t" which was undocumented and only present
in the KAME version of the ipsec.h header
-misc cleanup of historical artefacts, and to remove unnecessary
differences between KAME and FAST_IPSEC
 1.51 06-May-2009  elad branches: 1.51.12; 1.51.16;
Remove some usage of "priv" and "privileged" variables and instead pass
around credentials. Also push down kauth(9) calls closer to where the
operation is done.

Mailing list reference:

http://mail-index.netbsd.org/tech-net/2009/04/30/msg001270.html
 1.50 14-Mar-2009  dsl Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.49 14-Feb-2009  christos make created and lastused time_t to avoid 2038 problems.
 1.48 23-Apr-2008  thorpej branches: 1.48.2; 1.48.10; 1.48.16;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.47 02-May-2007  dyoung branches: 1.47.28; 1.47.30;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.46 04-Mar-2007  christos branches: 1.46.2; 1.46.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.45 10-Dec-2005  elad branches: 1.45.4; 1.45.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.44 07-Aug-2005  manu introduce ipsec_policy_t to help user programs with the change of
ipsec_set_policy, ipsec_get_policylen and ipsec_dump_policy prototypes
(using void * instead of caddr_t)
 1.43 26-Jun-2005  christos branches: 1.43.2;
match the declarations in libipsec.h
 1.42 02-Mar-2004  thorpej branches: 1.42.14;
Augment the PCB cache with a "hint" that can be used to short-circuit
IPsec processing in other places. The hint has 3 values: MAYBE, YES,
and NO. Hints are initialized to MAYBE, and MAYBE is always used for
unconnected sockets (since the spidx may change for every packet
that is output). For connected sockets, NONE and BYPASS policies cause
the hint to be set to NO, and all other policies to YES.
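
The hint rule above can be written as a small decision function. This is
a sketch only: the enum and function names are hypothetical, and POL_OTHER
stands for "any policy other than NONE or BYPASS".

```c
#include <stdbool.h>

/* Hypothetical names; the log only gives the values MAYBE, YES and NO. */
enum pcb_hint { HINT_MAYBE, HINT_YES, HINT_NO };
enum sp_policy { POL_NONE, POL_BYPASS, POL_OTHER };

enum pcb_hint
pcb_hint_for(bool connected, enum sp_policy policy)
{
	/* Unconnected sockets: the spidx may change for every packet. */
	if (!connected)
		return HINT_MAYBE;

	switch (policy) {
	case POL_NONE:
	case POL_BYPASS:
		return HINT_NO;		/* IPsec processing can be skipped */
	default:
		return HINT_YES;
	}
}
```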

Also shuffle the PCB cache data structure, turning 3 arrays into a
single array of a struct.
 1.41 22-Sep-2003  itojun mark security policy that should persist in the system "persistent".
this should prevent recently-reported kernel panic when "spdflush" is issued.
 1.40 12-Sep-2003  itojun make it possible to SADB_DUMP via sysctl. request by mrg
 1.39 10-Sep-2003  itojun record socket * associated with secpolicy
 1.38 07-Sep-2003  itojun prototype should have no variable name
 1.37 06-Sep-2003  itojun committed by mistake, sorry
 1.36 06-Sep-2003  itojun correct comment
 1.35 22-Aug-2003  itojun remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.
 1.34 22-Aug-2003  itojun allow userland to specify SPD ID. more readable debugging messages.
 1.33 23-Jul-2003  itojun comment typo, from markus@openbsd
 1.32 08-Jul-2003  itojun prototype must not have variable name
 1.31 02-Nov-2002  perry branches: 1.31.6;
/*CONTCOND*/ while (0)'ed macros
 1.30 11-Sep-2002  itojun correct signedness mixup in pointer passing. sync w/kame
 1.29 12-Jun-2002  itojun deep-copy pcb policy if it is an ipsec policy. assign ID field to all
SPD entries. make it possible for racoon to grab SPD entry on pcb
(racoon side needs some changes). sync w/kame
 1.28 12-Jun-2002  itojun do not copy policy-on-socket at all. avoid copying packet header value to
struct spindex. should reduce memory usage per socket/pcb, and should speedup
ipsec processing. sync w/kame
 1.27 11-Jun-2002  itojun share policy-on-pcb for listening socket. sync w/kame
todo: share even more, avoid frequent updates of spidx
 1.26 09-Jun-2002  itojun whitespace cleanup
 1.25 08-Jun-2002  itojun whitespace cleanup
 1.24 21-Nov-2001  itojun branches: 1.24.8;
update outgoing ifp, only if tunnel mode ipsec is used. this is to
honor IP_MULTICAST_IF setsockopt on ipsec-over-multicast. sync with kame
 1.23 16-Oct-2001  itojun more whitespace/comment sync with kame
 1.22 06-Aug-2001  itojun cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.
 1.21 05-Aug-2001  itojun cosmetic (spacing near /* */). sync with kame
 1.20 30-May-2001  mrg branches: 1.20.2;
use _KERNEL_OPT
 1.19 24-Jan-2001  itojun branches: 1.19.2;
- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.18 04-Jan-2001  itojun typo fix. PR 11889
 1.17 19-Oct-2000  itojun remove #ifdef TCP6. it is not likely for us to bring in sys/netinet6/tcp6*.c
(separate TCP/IPv6 stack) into netbsd-current.
 1.16 22-Sep-2000  itojun use real wallclock (got by microtime) to compute IPsec database lifetimes.
previous code used interval timers, and had problem with suspend/resume.
sync with KAME.
 1.15 30-Jul-2000  itojun make ipsec_strerror(3) to return const char *, not char *. sync with kame.
 1.14 15-Jun-2000  itojun branches: 1.14.2;
remove obsolete sysctl MIB net.inet.ipsec.inbound_call_ike.
(sync with kame)
 1.13 03-Jun-2000  itojun sync with recent kame.
avoid use of macros to manipulate sockaddrs (hides error case too much).
correct IPv4 packet handling when ip option is present.
preparations for ipsec policy engine upgrades.
 1.12 01-Mar-2000  itojun branches: 1.12.2;
introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.11 28-Feb-2000  itojun remove some of cross-BSD portability #ifdef.
remove xxCTL_VARS, which is BSDI specific.
 1.10 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.9 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.8 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.7 02-Dec-1999  itojun avoid namespace pollution ("#ifdef KERNEL" was mistakenly used)
 1.6 19-Nov-1999  bouyer Update protocol and interface stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
 1.5 31-Jul-1999  itojun branches: 1.5.2; 1.5.8;
sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.4 09-Jul-1999  thorpej defopt INET6, and put it in opt_inet.h (most places already include this
file, which is why the file list is so short).
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ipsec.h was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ipsec.h was added on branch chs-ubc2 on 1999-07-01 23:48:29 +0000
 1.5.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.5.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.5.2.2 05-Jan-2001  bouyer Sync with HEAD
 1.5.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.14.2.3 06-Apr-2001  he Pull up revision 1.19 (requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.14.2.2 29-Sep-2000  itojun pullup (approved by releng-1-5)

correct lifetime handling of IPsec keys, so that it won't wrongly
survive across suspend/resume session.
sys/netinet6/ipsec.h 1.15 -> 1.16
sys/netkey/keydb.h 1.7 -> 1.9
sys/netkey/key.c 1.35 -> 1.36

stabilize ipcomp packet handling (if we don't update this SEGV can happen).
sys/netinet6/ipcomp_output.c 1.10 -> 1.13
sys/netinet6/ipcomp_input.c 1.10 -> 1.13
sys/netinet6/ipcomp_core.c 1.9 -> 1.16
sys/netinet6/ipcomp.h 1.7 -> 1.8
sys/netkey/key.c 1.28 -> 1.29, 1.31 -> 1.35, 1.36 -> 1.37

avoid hardcoding IV length. new ESP engine (uses block cipher only,
easier to put per-arch *.S)
sys/netinet6/esp_output.c 1.5 -> 1.8
sys/netinet6/esp_input.c 1.5 -> 1.8
sys/netinet6/esp_core.c 1.7 -> 1.9
sys/netinet6/esp.h 1.11 -> 1.13
sys/netkey/key.c 1.30 -> 1.31
 1.14.2.1 30-Jul-2000  itojun pullup (approved by releng-1-5)

> make ipsec_strerror(3) to return const char *, not char *. sync with kame.

1.7 -> 1.8 basesrc/lib/libipsec/ipsec_strerror.3
1.6 -> 1.7 basesrc/lib/libipsec/ipsec_strerror.c
1.6 -> 1.7 basesrc/lib/libipsec/ipsec_strerror.h
1.14 -> 1.15 syssrc/sys/netinet6/ipsec.h
 1.19.2.7 11-Nov-2002  nathanw Catch up to -current
 1.19.2.6 17-Sep-2002  nathanw Catch up to -current.
 1.19.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.19.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.19.2.3 22-Oct-2001  nathanw Catch up to -current.
 1.19.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.19.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.20.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.20.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.20.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.20.2.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.24.8.1 20-Jun-2002  gehenna catch up with -current.
 1.31.6.5 11-Dec-2005  christos Sync with head.
 1.31.6.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.31.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.31.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.31.6.1 03-Aug-2004  skrll Sync with HEAD
 1.42.14.2 03-Sep-2005  snj Pull up following revision(s) (requested by tron in ticket #741):
sys/netinet6/ipsec.h: revision 1.44
introduce ipsec_policy_t to help user programs with the change of
ipsec_set_policy, ipsec_get_policylen and ipsec_dump_policy prototypes
(using void * instead of caddr_t)
 1.42.14.1 03-Sep-2005  snj Pull up following revision(s) (requested by tron in ticket #741):
sys/netinet6/ipsec.h: revision 1.43
match the declarations in libipsec.h
 1.43.2.2 03-Sep-2007  yamt sync with head.
 1.43.2.1 21-Jun-2006  yamt sync with head.
 1.45.26.2 07-May-2007  yamt sync with head.
 1.45.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.45.4.1 07-Feb-2006  rpaulo in6pcb -> inpcb.
 1.46.4.1 11-Jul-2007  mjf Sync with head.
 1.46.2.1 08-Jun-2007  ad Sync with head.
 1.47.30.1 18-May-2008  yamt sync with head.
 1.47.28.1 02-Jun-2008  mjf Sync with HEAD.
 1.48.16.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.48.10.2 28-Apr-2009  skrll Sync with HEAD.
 1.48.10.1 03-Mar-2009  skrll Sync with HEAD.
 1.48.2.2 16-May-2009  yamt sync with head
 1.48.2.1 04-May-2009  yamt sync with head.
 1.51.16.2 05-Apr-2012  mrg sync to latest -current.
 1.51.16.1 18-Feb-2012  mrg merge to -current.
 1.51.12.1 17-Apr-2012  yamt sync with head
 1.54.40.1 10-Jun-2019  christos Sync with HEAD
 1.54.38.1 30-Sep-2018  pgoyette Sync with HEAD
 1.3 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.2 28-Apr-2008  martin branches: 1.2.4; 1.2.6; 1.2.38; 1.2.42;
Remove clause 3 and 4 from TNF licenses
 1.1 23-Apr-2008  thorpej branches: 1.1.2;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.1.2.1 16-May-2008  yamt sync with head.
 1.2.42.1 05-Apr-2012  mrg sync to latest -current.
 1.2.38.1 17-Apr-2012  yamt sync with head
 1.2.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.2.6.1 28-Apr-2008  mjf file ipsec_private.h was added on branch mjf-devfs2 on 2008-06-02 13:24:27 +0000
 1.2.4.2 18-May-2008  yamt sync with head.
 1.2.4.1 28-Apr-2008  yamt file ipsec_private.h was added on branch yamt-pf42 on 2008-05-18 12:35:35 +0000
 1.1 22-Feb-2008  keiichi branches: 1.1.2;
file mip6.c was initially added on branch keiichi-mipv6.
 1.1.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.1 22-Feb-2008  keiichi branches: 1.1.2;
file mip6.h was initially added on branch keiichi-mipv6.
 1.1.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.1 22-Feb-2008  keiichi branches: 1.1.2;
file mip6_var.h was initially added on branch keiichi-mipv6.
 1.1.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.102 05-Jun-2025  ozaki-r Apply if_first_addr() and if_first_addr_psref()
 1.101 25-Sep-2019  ozaki-r branches: 1.101.26; 1.101.32;
Make panic messages more informative
 1.100 22-Dec-2018  maxv Replace M_ALIGN and MH_ALIGN by m_align.
 1.99 29-May-2018  ozaki-r branches: 1.99.2;
Avoid double LIST_REMOVE which corrupts lists
 1.98 29-May-2018  ozaki-r Move LIST_REMOVE

mld_stoptimer releases in6_multilock temporarily, so we must LIST_REMOVE first.
 1.97 29-May-2018  ozaki-r Make a deletion of in6m in nd6_rtrequest atomic
 1.96 29-May-2018  ozaki-r Make a refcount decrement and a removal from a list of an item atomic

in6m_refcount of an in6m can be incremented if the in6m is on the list
(if_multiaddrs) in in6_addmulti or mld_input. So we must avoid such an
increment when we try to destroy an in6m. To this end we must make
an in6m_refcount decrement and a removal of an in6m from if_multiaddrs
atomic.
 1.95 29-May-2018  ozaki-r Improve atomicity of in6_leavegroup and in6_delmulti
 1.94 29-May-2018  ozaki-r Release in6_multilock on callout_halt of mld_timeo to avoid a deadlock
 1.93 29-May-2018  ozaki-r Don't hold softnet_lock in mld_timeo

Then we can get rid of remaining abuses of mutex_owned(softnet_lock).
 1.92 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.91 01-Feb-2018  maxv branches: 1.91.2;
Style, and remove the 'len' argument from mld_allocbuf(), it is misleading,
we only want a static struct. Beyond that no functional change.
 1.90 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.89 13-May-2017  kardel branches: 1.89.2;
avoid a double ifa_release() and thus a panic when e. g. running ifmcstat
 1.88 02-Mar-2017  ozaki-r branches: 1.88.4;
Plug a race condition on accessing i6mm_maddr
 1.87 02-Mar-2017  ozaki-r Fix racy in6m_sol

Relook up the entry instead of reusing it, which makes locking simple.
 1.86 02-Mar-2017  ozaki-r Protect ia6_memberships by in6_ifaddr_lock
 1.85 01-Mar-2017  ozaki-r Make IPv6 multicast MP-safe partially

To complete the task, we need to make users of IPv6 multicast MP-safe, for
example socket/PCB and CARP.
 1.84 01-Mar-2017  ozaki-r Provide in6_multi_group

Use it when checking if we belong to the group, instead of in6_lookup_multi.

No functional change.
 1.83 23-Feb-2017  ozaki-r Remove mkludge stuffs

For unknown reasons, IPv6 multicast addresses are linked to a first
IPv6 address assigned to an interface. Due to the design, when removing
a first address having multicast addresses, we need to save them to
somewhere and later restore them once a new IPv6 address is activated.
mkludge stuffs support the operations.

This change links multicast addresses to an interface directly and
throws the kludge away.

Note that as usual some obsolete member variables remain for kvm(3)
users. And also sysctl net.inet6.multicast_kludge remains to avoid
breaking old ifmcstat.

TODO: currently ifnet has a list of in6_multi but obviously the list
should be protocol independent. Provide a common structure (if_multi
or something) to handle in6_multi and in_multi together as well as
ifaddr does for in_ifaddr and in6_ifaddr.
 1.82 22-Feb-2017  ozaki-r Stop using useless IN6_*_MULTI macros
 1.81 07-Feb-2017  ozaki-r Add missing NULL checks for m_get_rcvif
 1.80 24-Jan-2017  ozaki-r Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work
 1.79 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.78 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.77 11-Jan-2017  ozaki-r branches: 1.77.2;
Get rid of unnecessary header inclusions
 1.76 10-Jan-2017  ozaki-r Enable some sysctl knobs on rump kernels for ifmcstat
 1.75 18-Nov-2016  knakahara fix: "ifconfig destroy" can stall when "ifconfig" is run in parallel.
This problem occurs only if NET_MPSAFE on.

ifconfig destroy side:
kernel entry point is ifioctl => if_clone_destroy.
pr_purgeif() acquires softnet_lock, and then ifa_remove() calls
pserialize_perform() holding softnet_lock.
ifconfig side:
kernel entry point is socreate.
pr_attach()(udp_attach_wrapper()) calls sosetlock(). In this call path,
sosetlock() tries to acquire softnet_lock.
Together these can cause a deadlock.
 1.74 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.73 20-Jul-2016  ozaki-r Apply pserialize to some iterations of IP address lists
 1.72 08-Jul-2016  ozaki-r branches: 1.72.2;
Replace macros to get an IP address with proper inline functions

The inline functions are more friendly for applying psz/psref;
they consist of only simple iterations.
 1.71 07-Jul-2016  ozaki-r Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.70 04-Jul-2016  ozaki-r Use pslist(9) for the global in6_ifaddr list

psz and psref will be applied in another commit.

No functional change intended.
 1.69 22-Jun-2016  ozaki-r Remove unnecessary NULL checks of ifa->ifa_addr

If it's NULL, that is a bug. There are many IFADDR_FOREACH loops that do no
NULL check; if it could be NULL, they would have failed already.
 1.68 21-Jun-2016  ozaki-r Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime differs from that of the ifnet, so an ifnet object reached
through them can disappear at any time. Thus we need to look up the ifnet
object by if_index every time to be safe.
 1.67 16-Jun-2016  ozaki-r Use if_get_byindex instead of if_byindex for MP-safe
 1.66 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in an mbuf instead of a
pointer.
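
The index-instead-of-pointer idea can be sketched as follows. This is a
simplified userland illustration, not the committed code: if_table stands
in for ifindex2ifnet, the struct and function names are hypothetical, and
the psref(9)/pserialize(9) protection that the real lookup needs is
deliberately left out.

```c
#include <stddef.h>

#define MAXIF 8

/* Hypothetical stand-ins for struct ifnet and the mbuf packet header. */
struct xifnet { const char *name; };
struct xpkt { unsigned rcvif_index; };	/* an index, not a pointer */

/* Stand-in for ifindex2ifnet; a slot is NULL once the interface is gone. */
struct xifnet *if_table[MAXIF];

/* Look the interface up on every use; index 0 means "no interface". */
struct xifnet *
xpkt_get_rcvif(const struct xpkt *p)
{
	if (p->rcvif_index == 0 || p->rcvif_index >= MAXIF)
		return NULL;
	return if_table[p->rcvif_index];
}
```

A packet that outlives its interface then sees NULL from the lookup
rather than a dangling pointer, which is why callers must check the
result.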

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and exists for the
transition period, i.e., it is intended for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.65 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.64 12-Nov-2015  joerg Ensure that the callout of the multicast address is valid before
hooking it up.
 1.63 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.62 20-Jan-2015  roy Add net.inet6.ip6.prefer_tempaddr sysctl knob so that we can prefer
IPv6 temporary addresses as the source address.

Fixes PR kern/47100 based on a patch by Dieter Roelants.
 1.61 12-Nov-2014  ozaki-r branches: 1.61.2;
Ensure callout isn't running and pending before callout_destroy

Call callout_halt before callout_destroy. And also let callout (mld_timeo)
not call callout_schedule when we already called callout_halt.

This fixes PR 47881.
 1.60 09-Sep-2014  rmind Eliminate IFAREF() and IFAFREE() macros in favour of functions.
 1.59 26-Jul-2014  joerg branches: 1.59.2;
PR 49036: net.inet6 has not been created when the sysctl constructor
for net.inet6.multicast is run.
 1.58 25-Jul-2014  ozaki-r Use IFADDR_FOREACH for iterating if_addrlist of ifnet
 1.57 10-Jun-2014  joerg Introduce new sysctls for obtaining interface-specific addresses:
- net.sdl for the active link-layer address (the MAC)
- net.ether.multicast for the Ethernet multicast addresses
- net.inet6.multicast for the IPv6 multicast groups
- net.inet6.multicast_kludge for temporarily removed multicast groups

Use these sysctls to replace the kmem grovelling in ifmcstat(8).
 1.56 02-Jun-2014  joerg Use explicit initializer.
 1.55 19-Nov-2011  tls branches: 1.55.4; 1.55.8; 1.55.22;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.54 19-Oct-2011  dyoung branches: 1.54.2;
Use if_addr_init() and if_mcast_op() instead of ifp->if_ioctl().
 1.53 31-Aug-2011  plunky NULL does not need a cast
 1.52 21-Apr-2011  dholland Prune dead assignment, from Henning Petersen in PR 44890.
 1.51 04-Aug-2009  dyoung branches: 1.51.4; 1.51.6;
Use malloc(...|M_ZERO) instead of malloc(...) followed by memset(,0,).
 1.50 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.49 18-Mar-2009  cegger bcopy -> memcpy
 1.48 07-Nov-2008  dyoung branches: 1.48.4;
*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
	..._init();
	...
	break;
...
default:
	..._init();
	...
	break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
	...
	break;
case IFF_RUNNING:
	...
	break;
case IFF_UP:
	...
	break;
case IFF_UP|IFF_RUNNING:
	...
	break;
}
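
A filled-in instance of this switch might look like the sketch below.
The flag values are illustrative copies (real code takes IFF_UP and
IFF_RUNNING from <net/if.h>), and the action chosen in each case is an
assumption about what a typical driver does, not something stated in
this log.

```c
/* Illustrative copies of the flag bits; not taken from <net/if.h>. */
#define X_IFF_UP	0x0001
#define X_IFF_RUNNING	0x0040

/* Hypothetical actions a driver might take for each combination. */
enum if_action { ACT_NONE, ACT_STOP, ACT_INIT, ACT_RECONFIG };

enum if_action
flags_action(unsigned flags)
{
	switch (flags & (X_IFF_UP | X_IFF_RUNNING)) {
	case 0:				/* down and stopped */
		return ACT_NONE;
	case X_IFF_RUNNING:		/* marked down while running */
		return ACT_STOP;
	case X_IFF_UP:			/* marked up but not yet running */
		return ACT_INIT;
	case X_IFF_UP | X_IFF_RUNNING:	/* up and running */
		return ACT_RECONFIG;
	}
	return ACT_NONE;	/* not reached: the four cases are exhaustive */
}
```

Masking first means unrelated flag bits never select a case, and each
of the four permutations gets exactly one arm.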

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
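
The check can be sketched as below. The function name is hypothetical
and this is not the ether_ioctl() code itself; it relies only on the
Ethernet convention that the least significant bit of the first octet
marks a group (multicast) address, of which broadcast is a special case.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: may this 6-byte address be set on an interface? */
bool
lladdr_acceptable(const uint8_t addr[6])
{
	static const uint8_t zero[6];	/* 00:00:00:00:00:00 */

	/* Group bit set: multicast, which includes ff:ff:ff:ff:ff:ff. */
	if (addr[0] & 0x01)
		return false;

	/* Reject the all-zero address. */
	return memcmp(addr, zero, sizeof(zero)) != 0;
}
```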

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.47 22-Aug-2008  adrianp branches: 1.47.2;
Fix from matt@ for malformed ICMPv6 MLD query (CVE-2008-2464).
 1.46 22-May-2008  dyoung branches: 1.46.4;
Don't cast to void * unnecessarily.
 1.45 24-Apr-2008  ad branches: 1.45.2; 1.45.4;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.44 15-Apr-2008  thorpej branches: 1.44.2;
Make ip6 and icmp6 stats per-cpu.
 1.43 08-Apr-2008  thorpej Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
 1.42 27-Feb-2008  matt Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.
 1.41 16-Oct-2007  joerg branches: 1.41.14; 1.41.18;
Inline callout_t in struct in6_multi. This fixes a number of possible
memory leaks. Explicitly destroy the callout before freeing it.
Use callout_setfunc/callout_schedule instead of repeating it for
callout_reset.

Bump NetBSD version to 4.99.34 for kvm users.
 1.40 31-Aug-2007  dyoung branches: 1.40.2;
Use sockaddr_in6_init().
 1.39 09-Jul-2007  ad branches: 1.39.2; 1.39.6; 1.39.8;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.38 23-May-2007  christos Ansify + add a few comments, from Karl Sjödahl
 1.37 04-Mar-2007  christos branches: 1.37.2; 1.37.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.36 29-Nov-2006  dyoung branches: 1.36.2; 1.36.4; 1.36.8; 1.36.12;
Fix a spelling error.

Annotate a memory leak.

When copying one multicast address list to another, IFAREF before IFAFREE
to protect against using an ifaddr after (accidentally) freeing it.

LIST_REMOVE() a multicast address from its old list before
LIST_INSERT_HEAD() on its new list.

Do not count on in6_delmulti() removing its multicast-record argument
from the multicast address list that the record belongs to, because
clearly that is not what it (always) does.
 1.35 20-Nov-2006  dyoung Cosmetic: use LIST_ macros. Shorten some staircases.

Defensive programming: set an in6_multi's ifaddr reference to NULL
after releasing it, to protect against reuse.
 1.34 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.33 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.32 06-Mar-2006  rpaulo branches: 1.32.12; 1.32.14;
Rename local variables called delay that shadow the delay() decl.
Pointed out by Robert Swindells.
 1.31 05-Mar-2006  rpaulo NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.30 03-Mar-2006  rpaulo branches: 1.30.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.
 1.29 21-Jan-2006  rpaulo branches: 1.29.2; 1.29.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.
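
The loopback case from the list above can be illustrated with a small
userland sketch. It covers only that one rule (a ::1 source must not
be paired with a non-loopback destination) and is not the kernel's
scope-checking code; src_dst_scope_ok() and scope_ok_text() are
hypothetical helpers. IN6_IS_ADDR_LOOPBACK() and inet_pton() are the
standard interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: refuse a packet whose source is the loopback
 * address unless the destination is the loopback address too, since
 * ::1 is only meaningful within a single node.
 */
static bool
src_dst_scope_ok(const struct in6_addr *src, const struct in6_addr *dst)
{
	assert(src != NULL && dst != NULL);
	if (IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst))
		return false;
	return true;
}

/* Convenience wrapper taking textual addresses; false on parse error. */
static bool
scope_ok_text(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return false;
	return src_dst_scope_ok(&s, &d);
}
```

With this check, the bind("::1") followed by sendto() of a global
address shown above would be refused instead of leaving the node.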

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.28 11-Dec-2005  christos branches: 1.28.2;
merge ktrace-lwp.
 1.27 26-Feb-2005  perry branches: 1.27.4;
nuke trailing whitespace
 1.26 28-Mar-2004  christos branches: 1.26.8; 1.26.10;
no need for splsoftnet, because the caller does it already.
 1.25 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.24 22-Aug-2003  jonathan Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.
 1.23 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.22 06-Jun-2003  itojun branches: 1.22.2;
- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).
 1.21 14-May-2003  itojun always use PULLDOWN_TEST codepath.
 1.20 09-Jun-2002  itojun whitespace cleanup
 1.19 08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.18 08-Jan-2002  itojun branches: 1.18.8; 1.18.10;
do not log() in per-packet input path. sync w/kame
 1.17 18-Dec-2001  itojun reduce white space/cosmetic diffs w/kame.
 1.16 13-Nov-2001  lukem add RCSIDs
 1.15 18-Oct-2001  itojun simplify per-if stats.
 1.14 16-Oct-2001  itojun more whitespace/comment sync with kame
 1.13 10-Feb-2001  itojun branches: 1.13.2; 1.13.4;
to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.12 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.11 26-Feb-2000  itojun bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
 1.10 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.9 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.8 15-Dec-1999  itojun do not overwrite traffic class field when we write IPv6 version field.
 1.7 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.6 31-Jul-1999  itojun branches: 1.6.2; 1.6.8;
sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.5 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.4 04-Jul-1999  itojun s/splnet/splsoftnet/ in IPv6/IPsec part.
hope I made no mistake (the kernel works fine but I need a regress test)

Suggested by: thorpej
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file mld6.c was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file mld6.c was added on branch chs-ubc2 on 1999-07-01 23:48:29 +0000
 1.6.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.6.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.6.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.13.4.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.13.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.13.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.13.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.13.2.4 11-Jan-2002  nathanw More catchup.
 1.13.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.13.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.13.2.1 22-Oct-2001  nathanw Catch up to -current.
 1.18.10.1 02-Oct-2003  tron Pull up revision 1.22 via patch (requested by itojun in ticket #1491):
- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
part of advanced API update (RFC2292 -> 3542).
 1.18.8.1 20-Jun-2002  gehenna catch up with -current.
 1.22.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.22.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.22.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.22.2.1 03-Aug-2004  skrll Sync with HEAD
 1.26.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.26.8.1 29-Apr-2005  kent sync with -current
 1.27.4.5 17-Mar-2008  yamt sync with head.
 1.27.4.4 27-Oct-2007  yamt sync with head.
 1.27.4.3 03-Sep-2007  yamt sync with head.
 1.27.4.2 30-Dec-2006  yamt sync with head.
 1.27.4.1 21-Jun-2006  yamt sync with head.
 1.28.2.1 01-Feb-2006  yamt sync with head.
 1.29.4.1 22-Apr-2006  simonb Sync with head.
 1.29.2.1 09-Sep-2006  rpaulo sync with head
 1.30.2.1 13-Mar-2006  yamt sync with head.
 1.32.14.2 10-Dec-2006  yamt sync with head.
 1.32.14.1 22-Oct-2006  yamt sync with head
 1.32.12.2 12-Jan-2007  ad Sync with head.
 1.32.12.1 18-Nov-2006  ad Sync with head.
 1.36.12.1 23-Aug-2008  bouyer Pull up following revision(s) (requested by adrianp in ticket #1187):
sys/netinet6/mld6.c: revision 1.47
Fix from matt@ for malformed ICMPv6 MLD query (CVE-2008-2464).
 1.36.8.1 04-Sep-2008  skrll Sync with netbsd-4.
 1.36.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.36.2.1 23-Aug-2008  bouyer Pull up following revision(s) (requested by adrianp in ticket #1187):
sys/netinet6/mld6.c: revision 1.47
Fix from matt@ for malformed ICMPv6 MLD query (CVE-2008-2464).
 1.37.4.1 11-Jul-2007  mjf Sync with head.
 1.37.2.4 23-Oct-2007  ad Sync with head.
 1.37.2.3 09-Oct-2007  ad Sync with head.
 1.37.2.2 01-Jul-2007  ad Adapt to callout API change.
 1.37.2.1 08-Jun-2007  ad Sync with head.
 1.39.8.2 23-Mar-2008  matt sync with HEAD
 1.39.8.1 06-Nov-2007  matt sync with HEAD
 1.39.6.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.39.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.39.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.40.2.1 18-Oct-2007  yamt sync with head.
 1.41.18.4 17-Jan-2009  mjf Sync with HEAD.
 1.41.18.3 28-Sep-2008  mjf Sync with HEAD.
 1.41.18.2 02-Jun-2008  mjf Sync with HEAD.
 1.41.18.1 03-Apr-2008  mjf Sync with HEAD.
 1.41.14.1 24-Mar-2008  keiichi sync with head.
 1.44.2.2 04-Jun-2008  yamt sync with head
 1.44.2.1 18-May-2008  yamt sync with head.
 1.45.4.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.45.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.45.2.2 19-Aug-2009  yamt sync with head.
 1.45.2.1 04-May-2009  yamt sync with head.
 1.46.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.46.4.1 19-Oct-2008  haad Sync with HEAD.
 1.47.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.47.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.48.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.51.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.51.4.1 31-May-2011  rmind sync with head
 1.54.2.1 17-Apr-2012  yamt sync with head
 1.55.22.1 10-Aug-2014  tls Rebase.
 1.55.8.2 03-Dec-2017  jdolecek update from HEAD
 1.55.8.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.55.4.1 29-Dec-2014  martin Pull up following revision(s) (requested by ozaki-r in ticket #1224):
sys/netinet6/mld6.c: revision 1.61
Ensure callout isn't running and pending before callout_destroy
Call callout_halt before callout_destroy. And also let callout (mld_timeo)
not call callout_schedule when we already called callout_halt.
This fixes PR 47881.
 1.59.2.3 18-Nov-2015  msaitoh Pull up following revision(s) (requested by joerg in ticket #1035):
sys/netinet6/mld6.c: revision 1.64
Ensure that the callout of the multicast address is valid before
hooking it up.
 1.59.2.2 23-Jan-2015  martin branches: 1.59.2.2.2;
Pull up following revision(s) (requested by pettai in ticket #441):
sys/netinet6/ip6_var.h: revision 1.64
sys/netinet6/in6.h: revision 1.82
sys/netinet6/in6_src.c: revision 1.56
sys/netinet6/mld6.c: revision 1.62
sys/netinet6/ip6_input.c: revision 1.150
sys/netinet6/ip6_output.c: revision 1.161
Add net.inet6.ip6.prefer_tempaddr sysctl knob so that we can prefer
IPv6 temporary addresses as the source address.
Fixes PR kern/47100 based on a patch by Dieter Roelants.
 1.59.2.1 29-Dec-2014  martin Pull up following revision(s) (requested by ozaki-r in ticket #360):
sys/netinet6/mld6.c: revision 1.61
Ensure callout isn't running and pending before callout_destroy
Call callout_halt before callout_destroy. And also let callout (mld_timeo)
not call callout_schedule when we already called callout_halt.
This fixes PR 47881.
 1.59.2.2.2.1 18-Nov-2015  msaitoh Pull up following revision(s) (requested by joerg in ticket #1035):
sys/netinet6/mld6.c: revision 1.64
Ensure that the callout of the multicast address is valid before
hooking it up.
 1.61.2.8 28-Aug-2017  skrll Sync with HEAD
 1.61.2.7 05-Feb-2017  skrll Sync with HEAD
 1.61.2.6 05-Dec-2016  skrll Sync with HEAD
 1.61.2.5 05-Oct-2016  skrll Sync with HEAD
 1.61.2.4 09-Jul-2016  skrll Sync with HEAD
 1.61.2.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.61.2.2 22-Sep-2015  skrll Sync with HEAD
 1.61.2.1 06-Apr-2015  skrll Sync with HEAD
 1.72.2.4 20-Mar-2017  pgoyette Sync with HEAD
 1.72.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.72.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.72.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.77.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.88.4.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keyword expansion)
 1.89.2.2 07-Jun-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #842):

sys/netinet6/mld6.c: revision 1.93-1.99
sys/netinet6/in6_var.h: revision 1.99,1.100
sys/netinet6/in6.c: revision 1.267,1.268
sys/netinet6/nd6.c: revision 1.249

Don't hold softnet_lock in mld_timeo
Then we can get rid of remaining abuses of mutex_owned(softnet_lock).

Release in6_multilock on callout_halt of mld_timeo to avoid a deadlock
Improve atomicity of in6_leavegroup and in6_delmulti

Avoid NULL pointer dereference on imm->i6mm_maddr

Make a refcount decrement and a removal from a list of an item atomic
in6m_refcount of an in6m can be incremented if the in6m is on the list
(if_multiaddrs) in in6_addmulti or mld_input. So we must avoid such an
increment when we try to destroy an in6m. To this end we must make
an in6m_refcount decrement and a removal of an in6m from if_multiaddrs
atomic.

Make a deletion of in6m in nd6_rtrequest atomic

Move LIST_REMOVE
mld_stoptimer releases in6_multilock temporarily, so we must LIST_REMOVE first.

Avoid double LIST_REMOVE which corrupts lists
Mark in6m as used for non-DIAGNOSTIC builds.
 1.89.2.1 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general though,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.91.2.3 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.91.2.2 25-Jun-2018  pgoyette Sync with HEAD
 1.91.2.1 02-May-2018  pgoyette Synch with HEAD
 1.99.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.99.2.1 10-Jun-2019  christos Sync with HEAD
 1.101.32.1 02-Aug-2025  perseant Sync with HEAD
 1.101.26.1 01-Oct-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1164):

sys/net/link_proto.c: revision 1.41
sys/netinet6/in6.c: revision 1.293
sys/net/if.h: revision 1.307
sys/netinet/ip_icmp.c: revision 1.180
sys/dev/vmt/vmt_subr.c: revision 1.11
sys/netinet6/in6_var.h: revision 1.105
sys/netinet6/in6_var.h: revision 1.106
sys/net/if.c: revision 1.532
sys/net/if.c: revision 1.533
sys/netinet6/mld6.c: revision 1.102
sys/netinet/in_var.h: revision 1.104
sys/net/if_spppsubr.c: revision 1.270
sys/net/if_spppsubr.c: revision 1.271
sys/netinet6/nd6.c: revision 1.284

if: introduce if_first_addr() (and psref variant)

It returns a first address (ifa) of a given family on a given interface.

It will replace a bunch of open codes and make their intent clear.
Apply if_first_addr() and if_first_addr_psref()

in6: introduce in6ifa_first_lladdr() (and psref variant)

It returns a first link-local address (ifa) on a given interface.
Apply in6ifa_first_lladdr() and in6ifa_first_lladdr_psref()
 1.10 19-Nov-2011  tls First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.
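
To give a feel for what such a statistical test does, here is a
minimal sketch of a monobit (ones-count) check. It is not the code in
libkern/rngtest.c. The 20000-bit sample size and the 9725/10275
bounds are assumptions taken from my recollection of the FIPS 140-2
monobit limits and should be checked against the standard; the
function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MONOBIT_BYTES	2500	/* 20000 bits; assumed sample size */
#define MONOBIT_LOW	9725	/* assumed exclusive lower bound */
#define MONOBIT_HIGH	10275	/* assumed exclusive upper bound */

/*
 * Count the one bits in a 20000-bit sample and pass only if the count
 * lies strictly between the two bounds. A stuck-at-0 or stuck-at-1
 * source fails immediately.
 */
static bool
monobit_pass(const uint8_t buf[MONOBIT_BYTES])
{
	unsigned int ones = 0;
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < MONOBIT_BYTES; i++) {
		uint8_t b = buf[i];

		while (b != 0) {
			ones += b & 1u;
			b >>= 1;
		}
	}
	return ones > MONOBIT_LOW && ones < MONOBIT_HIGH;
}
```

A test like this catches only gross bias; a perfectly regular pattern
such as 0x55 repeated has a balanced ones-count and needs other checks
(runs tests, for example) to be caught.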

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.9 01-Nov-2007  dyoung branches: 1.9.54;
De-__P().
 1.8 05-Mar-2006  rpaulo branches: 1.8.36; 1.8.38; 1.8.42;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.7 10-Dec-2005  elad branches: 1.7.4; 1.7.6; 1.7.8;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.6 06-Jun-2003  itojun branches: 1.6.2; 1.6.18;
- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).
 1.5 28-May-2002  itojun use arc4random
 1.4 10-Feb-2001  itojun branches: 1.4.2; 1.4.4; 1.4.16;
to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.3 03-Jul-1999  thorpej branches: 1.3.2;
RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file mld6_var.h was initially added on branch kame.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file mld6_var.h was added on branch chs-ubc2 on 1999-07-01 23:48:29 +0000
 1.3.2.1 11-Feb-2001  bouyer Sync with HEAD.
 1.4.16.1 30-May-2002  gehenna Catch up with -current.
 1.4.4.1 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.4.2.1 20-Jun-2002  nathanw Catch up to -current.
 1.6.18.2 15-Nov-2007  yamt sync with head.
 1.6.18.1 21-Jun-2006  yamt sync with head.
 1.6.2.1 11-Dec-2005  christos Sync with head.
 1.7.8.1 13-Mar-2006  yamt sync with head.
 1.7.6.1 22-Apr-2006  simonb Sync with head.
 1.7.4.1 09-Sep-2006  rpaulo sync with head
 1.8.42.1 13-Nov-2007  bouyer Sync with HEAD
 1.8.38.1 06-Nov-2007  matt sync with HEAD
 1.8.36.1 04-Nov-2007  jmcneill Sync with HEAD.
 1.9.54.1 17-Apr-2012  yamt sync with head
 1.284 05-Jun-2025  ozaki-r Apply in6ifa_first_lladdr() and in6ifa_first_lladdr_psref()
 1.283 31-Mar-2025  ozaki-r nd6: send packets through the fast path even if DELAY and PROBE

If there is a valid ND cache, we can send packets for the destination
of the cache. If the state of the cache is STALE, we need to go
through the slow path to change its state. In the other cases
including the DELAY and PROBE states, we can send packets through
the fast path.
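
The rule described above can be sketched in C as follows. This is only an
illustration: the enum, struct and function names are hypothetical stand-ins,
not the identifiers used in nd6.c, and "valid cache" is assumed here to mean
that a link-layer address is stored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ND cache states named in the log. */
enum nd_state {
	ND_INCOMPLETE,
	ND_REACHABLE,
	ND_STALE,
	ND_DELAY,
	ND_PROBE
};

struct nd_cache {
	enum nd_state state;
	bool has_lladdr;	/* assumed meaning of "valid ND cache" */
};

/*
 * Return true if a packet for this cache entry may take the fast path.
 * A STALE entry must take the slow path so its state can be changed;
 * every other entry with a link-layer address (including DELAY and
 * PROBE) may be used directly.
 */
bool
nd_can_use_fast_path(const struct nd_cache *c)
{
	assert(c != NULL);
	return c->has_lladdr && c->state != ND_STALE;
}
```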
 1.282 11-Apr-2024  knakahara branches: 1.282.2;
Fix invalid IPv6 route when an ipsecif(4) tunnel is deleted. Pointed out by ohishi@IIJ.

The pointed bug is fixed by modification in nd6_need_cache().
Others are similar bugs.

XXX pullup-9, 10
 1.281 09-Dec-2023  pgoyette Modularize the COMPAT_90 code that resulted from the removal of
netinet6/nd6 from the kernel. Now, the minimal compat code can
be successfully loaded and unloaded along with the rest of the
COMPAT_90 code.

XXX pullup-10 - hopefully before RC2
 1.280 11-Oct-2023  msaitoh s/Neighour/Neighbor/ in comment. No functional change.
 1.279 01-Sep-2022  riastradh branches: 1.279.4;
nd6: Take ifnet psref around cprng_fast in nd6_slowtimo.

This may sleep on an adaptive mutex, the global entropy lock, so
pserialize is forbidden.
 1.278 31-Dec-2021  andvar s/quetion/question/
 1.277 17-Aug-2021  ozaki-r nd6: prevent ln from being freed while releasing held packets
 1.276 28-Dec-2020  nia Add more guards against NULL deref, since KUBSAN still complains.
 1.275 26-Dec-2020  nia Avoid NULL pointer dereference, noticed by KUBSAN.

"Looks fine" roy@
 1.274 15-Sep-2020  roy branches: 1.274.2;
Implement RFC 7048, making Neighbor Unreachability Detection less impatient

RFC 7048 Section 3 says in the UNREACHABLE state packets continue to be
sent to the link-layer address and then back off exponentially.
We adjust this slightly and move to the INCOMPLETE state after
`nd_mmaxtries` probes and then start backing off.

This results in simpler code whilst providing a more robust model which
doubles the time to failure over what we did before.
We don't want to be back to the old ARP model where no unreachability
errors are returned because very few applications would look at
unreachability hints provided such as ND_LLINFO_UNREACHABLE or RTM_MISS.
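
A minimal sketch of the "probe a fixed number of times, then back off
exponentially" idea, in C. The function name, its parameters and the values
used in the usage note are assumptions for illustration; they are not taken
from the committed code.

```c
#include <assert.h>

/*
 * Return the delay before the next probe, in milliseconds.
 *
 * attempt   - zero-based index of the probe about to be scheduled
 * max_tries - number of probes sent at the base interval
 * multiple  - backoff multiplier applied once max_tries is reached
 * cap_ms    - upper bound on the returned interval
 *
 * All names and parameters here are hypothetical.
 */
unsigned long
nd_probe_interval_ms(unsigned long base_ms, unsigned int attempt,
    unsigned int max_tries, unsigned int multiple, unsigned long cap_ms)
{
	unsigned long t = base_ms;
	unsigned int i;

	assert(multiple >= 1);
	if (t >= cap_ms)
		return cap_ms;
	/* Multiply once for every attempt at or beyond max_tries. */
	for (i = max_tries; i <= attempt; i++) {
		/* Stop before the product could exceed the cap. */
		if (t > cap_ms / multiple)
			return cap_ms;
		t *= multiple;
	}
	return t;
}
```

With a 1000 ms base, three initial tries, a multiplier of 3 and a 60000 ms
cap, the first three probes are spaced 1000 ms apart and later ones grow to
3000, 9000 and 27000 ms before being clamped to the cap.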
 1.273 14-Sep-2020  roy nd: Name l3addr union of llentry and use in-place of nd_addr.

Probably makes more sense and makes nd.h less messy.
 1.272 11-Sep-2020  roy inet6: Use generic Neighbor Detection rather than IPv6 specific

No functional change intended.
 1.271 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.270 28-Apr-2020  roy inet6: Ensure that route MTU is guarded by ARC_PHDS_MAXMTU

This mirrors the ARP behavior for ARCnet interfaces based on current
kernel RA handling.
 1.269 12-Apr-2020  roy nd6: RTM_MISS reports RTA_AUTHOR once more

Just moves the logic to send RTM_MISS after the ICMP6 report as we
rely on that function to extract the requesting address.

Fixes PR kern/55164.
 1.268 03-Apr-2020  christos branches: 1.268.2;
PR/55030: Avoid locking against myself panic by moving the icmp error outside
the lock. Thanks ozaki-r!
 1.267 09-Mar-2020  roy route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes who probe for machines
that don't exist on a subnet / prefix.
 1.266 20-Jan-2020  thorpej Remove FDDI support.
 1.265 25-Sep-2019  ozaki-r branches: 1.265.2;
Make panic messages more informative
 1.264 25-Sep-2019  ozaki-r Initialize DAD components properly

The original code initialized each component in non-init functions such as
arp_dad_start and nd6_dad_find, conditionally based on a global flag for each.
However, it was racy because the flag and the code around it were not
protected by a lock and could cause a kernel panic at worst.

Fix the issue by initializing the components in bootup as usual.
 1.263 01-Sep-2019  roy inet6: Re-introduce ND6_LLINFO_WAITDELETE so we can return EHOSTDOWN

Once we've sent nd6_mmaxtries NS messages, send RTM_MISS and move to the
ND6_LLINFO_WAITDELETE state rather than freeing the llentry right away.
Wait for a probe cycle and then free the llentry.

If a connection attempts to re-use the llentry during ND6_LLINFO_WAITDELETE,
return EHOSTDOWN (or EHOSTUNREACH if a gateway) to match inet behaviour.
Continue to ND6_LLINFO_INCOMPLETE and send another NS probe in hope of a
reply. Rinse and repeat.

This reverts part of nd6.c r1.14 - an 18 year old commit!
 1.262 01-Sep-2019  roy inet6: Send RTM_MISS when we fail to resolve an address.

Takes the same approach as when adding a new address - we no longer
announce the new lladdr right away but we announce the result.
This will either be RTM_ADD or RTM_MISS.
RTM_DELETE is only sent if we have a lladdr assigned OR gc'ed.

This results in fewer messages via route(4) and tells us when a new
lladdr has been added (RTM_ADD), changed (RTM_CHANGE), deleted (RTM_DELETE)
or has failed to be resolved (RTM_MISS). The latter case can be
interpreted as unreachable.
 1.261 31-Aug-2019  roy inet6: don't set an invalid lladdr in nd6_free()

We don't want to announce that we've deleted a hwaddr of all zeros.
 1.260 27-Aug-2019  roy inet6: nd6_free assumes all routers are processed by kernel RA

This hasn't been the case for a long time if you're a dhcpcd
user with a default config. As such, it's possible for the default
IPv6 router as set by dhcpcd to be erroneously gc'ed by nd6_free.

This reduces the scope of the ND6_WLOCK taken as well as fixing an
issue where we write to ln->ln_state without a lock being held.
 1.259 22-Aug-2019  roy nd6: notify userland of neighbour lla updates once more

XXX pullup -8 -9
 1.258 22-Aug-2019  roy rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9
 1.257 14-Aug-2019  ozaki-r Add missing IFNET_LOCK for regen_tmpaddr

Reported by ryo@
 1.256 26-Jul-2019  christos branches: 1.256.2;
Decrease the reference count before freeing, so that the entries actually
get free'd. (Ryota Ozaki)
 1.255 28-Jun-2019  ozaki-r nd6: restore a missing reachability confirmation

On sending a packet over a STALE cache, a reachability confirmation should be
attempted for the cache, as described in RFC 2461/4861 7.3.3. On the fast path in
nd6_resolve, however, the treatment for STALE caches has been skipped
accidentally, so STALE caches never return to the REACHABLE state.

To fix the issue, branch to the fast path only when the cache entry is in the
REACHABLE state and leave other caches to the slow path that includes the
treatment. To this end we need to allow to return a link-layer address if a
valid address is available on the slow path too, which is the same behavior as
FreeBSD and OpenBSD.
 1.254 13-May-2019  christos print the name of the interface that was disabled.
 1.253 29-Apr-2019  roy rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.
 1.252 16-Dec-2018  roy netinet6: only flush prefixes and routers for the given interface.

Unless it's lo0, where we then flush the lot.
This maintains the status quo with ndp(8) and allows dhcpcd(8) to at least
try and work with kernel RA on one interface and dhcpcd on another.
 1.251 30-Oct-2018  ozaki-r Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.
 1.250 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
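
The difference between the two forms can be sketched in C as below. The
names uimin_sketch and MIN_SKETCH are made up for this note, and the
behaviour described assumes a platform where unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Function form: both operands are converted to unsigned int, so a
 * 64-bit argument loses its upper bits before the comparison.
 */
unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Macro form: operands keep their own types, so nothing is truncated,
 * but each argument may be evaluated twice.
 */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))
```

For a 64-bit value of 2^32 + 5 compared with 10, the function form sees 5
after conversion and returns 5, while the macro form compares the full value
and returns 10.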
 1.249 29-May-2018  ozaki-r branches: 1.249.2;
Make a deletion of in6m in nd6_rtrequest atomic
 1.248 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.247 06-Mar-2018  roy nd6: add a nonce to DaD probes in-case they are looped back to us

This implements RFC 7527, based on a similar change in FreeBSD.
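
As a rough illustration of the nonce idea (not the committed code): a node
remembers the nonce it put into its own DAD probe and treats a received
probe carrying the same nonce as a looped-back copy rather than a duplicate
address. The struct, the function name and the 6-byte nonce length are
assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DAD_NONCE_LEN 6		/* assumed nonce length for this sketch */

struct dad_state {
	uint8_t sent_nonce[DAD_NONCE_LEN];	/* nonce in our last probe */
	bool nonce_valid;			/* true once a probe was sent */
};

/*
 * Return true if a received probe carries the nonce we sent ourselves,
 * which suggests our own probe was looped back to us.
 */
bool
dad_probe_is_looped_back(const struct dad_state *d,
    const uint8_t *rx_nonce, size_t rx_len)
{
	assert(d != NULL);
	if (!d->nonce_valid || rx_nonce == NULL || rx_len != DAD_NONCE_LEN)
		return false;
	return memcmp(d->sent_nonce, rx_nonce, DAD_NONCE_LEN) == 0;
}
```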
 1.246 06-Mar-2018  ozaki-r Fix reference leaks of llentry

callout_reset and callout_halt can cancel a pending callout without telling us.
Detect a cancel and remove a reference by using callout_pending and
callout_stop (it's a bit tricky though, we can detect it).

While here, we can remove remaining abuses of mutex_owned for softnet_lock.
 1.245 29-Jan-2018  christos branches: 1.245.2;
more cleanup (don't allow oldlenp == NULL)
 1.244 29-Jan-2018  pgoyette One more from christos@

No need to initialize fill_func
 1.243 29-Jan-2018  pgoyette More simplification, this time from ozaki-r@

No need to break after return.
 1.242 29-Jan-2018  pgoyette Simplify, from christos@
 1.241 29-Jan-2018  pgoyette Use existing fill_[pd]rlist() functions to calculate size of buffer to
allocate, rather than relying on an arbitrary length passed in from
userland.

Allow copyout() of partial results if the user buffer is too small, to
be consistent with the way sysctl(3) is documented.

Garbage-collect now-unused third parameter in the fill_[pd]rlist()
functions.

As discussed on IRC.
OK kamil@ and christos@

XXX Needs pull-up to netbsd-8 branch.
 1.240 15-Dec-2017  ozaki-r Ensure to call if_mcast_op with holding IFNET_LOCK

Note that CARP doesn't deal with IFNET_LOCK yet.
 1.239 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.238 10-Nov-2017  ozaki-r Use psref instead of pserialize because that code is sleepable
 1.237 10-Nov-2017  ozaki-r Fix a deadlock between a route update and lltable

It happens because rtalloc1 is called from lltable with holding
IF_AFDATA_WLOCK.

If a route update is in action, rtalloc1 would wait for its completion with
holding IF_AFDATA_WLOCK. At the same moment, a softint (e.g., arpintr) may try
to take IF_AFDATA_WLOCK and get stuck on it. Unfortunately the stuck softint
prevents the route update from progressing because the route update calls
psref_target_destroy that needs the softint to complete.

A resource allocation graph of the scenario looks like this:
route update =(psref_target_destroy)=> softint => IF_AFDATA_WLOCK
=(rt_update_wait)=> route update

Fix the deadlock by pulling rtalloc1 out of the lltable codes inside
IF_AFDATA_WLOCK.

Note that the deadlock happens only if NET_MPSAFE is enabled.
 1.236 05-Oct-2017  ozaki-r Add missing NULL check

PR kern/52554
 1.235 22-Jun-2017  ozaki-r Remove unused function (nd6_rem_ifa_lle)
 1.234 21-Jun-2017  ozaki-r Don't create a permanent L2 cache entry on adding an address to an interface

It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
 1.233 16-Jun-2017  ozaki-r Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@
 1.232 01-Jun-2017  chs branches: 1.232.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.231 01-Mar-2017  ozaki-r Restore/add some softnet_lock for nd6_rt_flush and defrouter_addreq

May help PR kern/52015
 1.230 22-Feb-2017  ozaki-r Stop using useless IN6_*_MULTI macros
 1.229 22-Feb-2017  ozaki-r Use kmem instead of malloc
 1.228 22-Feb-2017  ozaki-r Fix prefix invalidation via nd6_timer

We cannot remove a prefix there. Instead just invalidate it; the prefix
will be removed when purging an associated address. This is the same as
the original behavior.
 1.227 14-Feb-2017  ozaki-r Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced
 1.226 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.225 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.224 11-Jan-2017  ozaki-r branches: 1.224.2;
Get rid of unnecessary header inclusions
 1.223 22-Dec-2016  ozaki-r Remove assertion that the lock isn't held

It's useless in this case, because without it we can know that
the lock is held or not on a next lock acquisition and even more
if LOCKDEBUG is enabled a failure on the acquisition will provide
useful information for debugging while an assertion failure will
provide just the fact that the assertion failed.
 1.222 21-Dec-2016  ozaki-r Fix deadlock between llentry timers and destruction of llentry

llentry timer (of nd6) holds both llentry's lock and softnet_lock.
A caller also holds them and calls callout_halt to wait for the
timer to quit. However we can pass only one lock to callout_halt,
so passing either of them can cause a deadlock. Fix it by avoid
calling callout_halt without holding llentry's lock.

BTW in the first place we cannot pass llentry's lock to callout_halt
because it's a rwlock...
 1.221 21-Dec-2016  ozaki-r Hold the big locks only where they are needed
 1.220 19-Dec-2016  ozaki-r Protect IPv6 default router and prefix lists with coarse-grained rwlock

in6_purgeaddr (in6_unlink_ifa) itself unreferences a prefix entry and calls
nd6_prelist_remove if the counter becomes 0, so callers don't need to
handle the reference counting.

Performance-sensitive paths (sending/forwarding packets) call just one
reader lock. This is a trade-off between performance impact vs. the amount
of efforts; if we want to remove the reader lock, we need huge amount of
works including destroying objects with psz/psref in softint, for example.
 1.219 19-Dec-2016  ozaki-r Kill pr->ndpr_refcnt = 0

The reference counter represents the number of references from IPv6
addresses to a prefix entry. If all IPv6 addresses assigned to an
interface are purged, all references to a prefix for the interface are
also released. For now nd6_purge is always called after purging all IPv6
addresses, so we can get rid of clearing pr->ndpr_refcnt from nd6_purge
and instead we can assert it's 0 there.

Note that nd6_ifdetach is only called via dom_ifdetach when processing
if_detach where dom_ifdetach is called after pr_purgeif that eventually
calls in6_ifdetach. So in the call path nd6_purge in nd6_ifdetach does
nothing. That said, we should explicitly make it sure to purge all
IPv6 addresses before nd6_purge for future changes (or the case I missed
something). So if_purgeaddrs is added to nd6_ifdetach.
 1.218 19-Dec-2016  ozaki-r Get rid of extra nd6_purge from in6_ifdetach

There were two nd6_purge in in6_ifdetach for some reason, but at least now
we don't need the extra nd6_purge. Remove it and instead add assertions that
check if surely purged.
 1.217 14-Dec-2016  ozaki-r Make functions static
 1.216 12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.215 12-Dec-2016  ozaki-r Introduce macros for the prefix list

No functional change.
 1.214 12-Dec-2016  ozaki-r Introduce macros for the default router list

No functional change.
 1.213 11-Dec-2016  ozaki-r Add nd6_ prefix to exported functions
 1.212 11-Dec-2016  ozaki-r Move default interface things from nd6_rtr.c to nd6.c
 1.211 14-Nov-2016  ozaki-r Add missing rtfree
 1.210 02-Nov-2016  ozaki-r Add missing pserialize_read_exit
 1.209 18-Oct-2016  ozaki-r Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@
 1.208 18-Oct-2016  ozaki-r Fix indentation
 1.207 02-Sep-2016  ozaki-r Don't GC an NDP cache that is added just before GC

This fixes unstable test results of ndp_neighborgcthresh.
 1.206 06-Aug-2016  roy Set RTF_CONNECTED instead of setting only RTF_CONNECTED.
 1.205 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.204 15-Jul-2016  ozaki-r Use sin6tosa and sin6tocsa macros

No functional change.
 1.203 11-Jul-2016  ozaki-r branches: 1.203.2;
Run timers in workqueue

Timers (such as nd6_timer) typically free/destroy some data in callout
(softint). If we apply psz/psref for such data, we cannot do free/destroy
process in there because synchronization of psz/psref cannot be used in
softint. So run timer callbacks in workqueue works (normal LWP context).

Enqueuing the same work twice with workqueue_enqueue (i.e., calling
workqueue_enqueue before a previous task is scheduled) isn't allowed. For nd6_timer and
rt_timer_timer, this doesn't happen because callout_reset is called only
from workqueue's work. OTOH, ip{,6}flow_slowtimo's callout can be called
before its work starts and completes because the callout is periodically
called regardless of completion of the work. To avoid such a situation,
add a flag for each protocol; the flag is set true when a work is
enqueued and set false after the work finished. workqueue_enqueue is
called only if the flag is false.

Proposed on tech-net and tech-kern.
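
The per-protocol flag described above can be sketched as follows. This is a
single-threaded illustration with hypothetical names; a counter stands in
for the real workqueue_enqueue call, and a kernel implementation would also
need to make the flag updates safe against concurrent access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct slowtimo_work {
	bool enqueued;		/* true from enqueue until the work finishes */
	int enqueue_count;	/* stand-in for calls to workqueue_enqueue */
};

/*
 * Periodic callout handler: enqueue the work only if no earlier
 * instance is still pending or running. Returns true if it enqueued.
 */
bool
slowtimo_tick(struct slowtimo_work *w)
{
	assert(w != NULL);
	if (w->enqueued)
		return false;
	w->enqueued = true;
	w->enqueue_count++;
	return true;
}

/* Called at the end of the work function to allow the next enqueue. */
void
slowtimo_work_done(struct slowtimo_work *w)
{
	assert(w != NULL && w->enqueued);
	w->enqueued = false;
}
```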
 1.202 07-Jul-2016  ozaki-r Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.201 05-Jul-2016  ozaki-r Constify an argument of regen_tmpaddr
 1.200 05-Jul-2016  ozaki-r KNF
 1.199 04-Jul-2016  ozaki-r Use pslist(9) for the global in6_ifaddr list

psz and psref will be applied in another commit.

No functional change intended.
 1.198 30-Jun-2016  ozaki-r Make sure that ifaddr is published after its initialization finished

Basically we should insert an item to a collection (say a list) after
item's initialization has been completed to avoid accessing an item
that is initialized halfway. ifaddr (in{,6}_ifaddr) isn't processed
like so and needs to be fixed.

In order to do so, we need to tweak {arp,nd6}_rtrequest that depend
on that an ifaddr is inserted during its initialization; they explore
interface's address list to determine that rt_getkey(rt) of a given
rtentry is in the list to know whether the route's interface should
be a loopback, which doesn't work after the change. To make it work,
first check RTF_LOCAL flag that is set in rt_ifa_addlocal that calls
{arp,nd6}_rtrequest eventually. Note that we still need the original
code for the case to remove and re-add a local interface route.
 1.197 21-Jun-2016  ozaki-r Fix nd6_output (if_output_lock conversion mistake)
 1.196 20-Jun-2016  knakahara apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).
 1.195 18-May-2016  ozaki-r Get rid of unnecessary assignment
 1.194 12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.193 26-Apr-2016  ozaki-r Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that the rt_gwroute checks hid the mistakes, so it appeared to
work (unexpectedly), and removing the rt_gwroute checks unveils the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.
 1.192 25-Apr-2016  ozaki-r Check error of rt_setgate and rt_settag
 1.191 21-Apr-2016  ozaki-r Fix RTF_{REJECT,BLACKHOLE} behavior for IPv6 routes

We still need a nexthop route to reflect RTF_{REJECT,BLACKHOLE}.
In the future, we would do it w/o looking up a route.
 1.190 10-Apr-2016  ozaki-r Don't call pfxlist_onlink_check with holding llentry lock

Sync nd6_free with FreeBSD (as of 2016-04-10).

Should fix PR kern/51056.
 1.189 04-Apr-2016  roy all1_sa is no longer used.
 1.188 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.187 01-Apr-2016  ozaki-r Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.
 1.186 01-Apr-2016  ozaki-r Tidy up nd6_timer initialization
 1.185 04-Feb-2016  riastradh Declare in6_tmpaddrtimer_ch in in6_var.h.

Do not declare extern variables in .c files!
 1.184 08-Jan-2016  ozaki-r Add missing RTF_LOCAL; sync with arp_setgate
 1.183 18-Dec-2015  ozaki-r Add missing LLE_WUNLOCK to nd6_free
 1.182 07-Dec-2015  ozaki-r CID 1341546: Fix integer handling issue (CONSTANT_EXPRESSION_RESULT)

n > INT_MAX where n is a long integer variable is never true on 32-bit
architectures. Use time_t(int64_t) instead of long for the variable.
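
To illustrate why the variable's width matters, here is a hypothetical clamp
helper in C (it is not the code touched by this commit). With a 32-bit long
the first comparison could never be true; with a 64-bit type it is
meaningful on every architecture.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Clamp a 64-bit value into the range of int. Because n is 64 bits
 * wide, "n > INT_MAX" can actually be true, unlike the same test on a
 * variable whose type is no wider than int.
 */
int
clamp_to_int(int64_t n)
{
	if (n > INT_MAX)
		return INT_MAX;
	if (n < INT_MIN)
		return INT_MIN;
	return (int)n;
}
```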
 1.181 25-Nov-2015  ozaki-r Use lltable/llentry for NDP

lltable and llentry were introduced to replace ARP cache data structure
for further restructuring of the routing table: L2 nexthop cache
separation. This change replaces the NDP cache data structure
(llinfo_nd6) with them as well as ARP.

One noticeable change is for neighbor cache GC mechanism that was
introduced to prevent IPv6 DoS attacks. net.inet6.ip6.neighborgcthresh
was the max number of caches that we store in the system. After
introducing lltable/llentry, the value is changed to be per-interface
basis because lltable/llentry stores neighbor caches in each interface
separately. And the change brings one degradation; the old GC mechanism
dropped exceeded packets based on LRU while the new implementation drops
packets in order from the beginning of lltable (a hash table + linked
lists). It would be improved in the future.

Added functions in in6.c come from FreeBSD (as of r286629) and are
tweaked for NetBSD.

Proposed on tech-kern and tech-net.
 1.180 19-Nov-2015  ozaki-r Call icmp6_error2 after releasing ln

This is a restructuring for coming changes.

From FreeBSD
 1.179 18-Nov-2015  ozaki-r Stop passing llinfo_nd6 to nd6_ns_output

This is a restructuring for coming changes to nd6 (replacing
llinfo_nd6 with llentry). Once we have a lock of llinfo_nd6,
we need to pass it to nd6_ns_output with holding the lock.
However, in a function subsequent to nd6_ns_output, the llinfo_nd6
may be looked up, i.e., its lock would be acquired again.
To avoid such a situation, pass only required data (in6_addr) to
nd6_ns_output instead of passing whole llinfo_nd6.

Inspired by FreeBSD
 1.178 18-Nov-2015  ozaki-r Unify nd6_ns_output calls in nd6_llinfo_timer

Inspired by FreeBSD
 1.177 11-Sep-2015  roy If, for whatever reason, a local interface route is removed and then
re-added, mark it as a local route.

While here, if changing the route to go via the loopback interface
remove any inherited MTU value.
 1.176 04-Sep-2015  ozaki-r Pull nexthop determination routine from nd6_output

It simplifies nd6_output and the nexthop determination routine slightly.
 1.175 03-Sep-2015  ozaki-r Fix rtfree in nd6_output

We have to check and avoid to rtfree the original rtentry passed to
nd6_output even when manipulating gateway routes.

This fixes panic on assertion "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0"
failure and probably PR kern/50161.
 1.174 02-Sep-2015  ozaki-r Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.
 1.173 02-Sep-2015  ozaki-r Use KASSERT to check programming errors
 1.172 01-Sep-2015  ozaki-r Move a rtentry definition to reduce its scope

No functional change.
 1.171 01-Sep-2015  ozaki-r Cleanup nd6_nud_hint

The deleted rtfree was never called.
 1.170 31-Aug-2015  ozaki-r Remove leading whitespaces
 1.169 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.168 11-Aug-2015  ozaki-r Fix double rtfree
 1.167 11-Aug-2015  ozaki-r Free rtentry when we successfully obtain it but return NULL
 1.166 07-Aug-2015  ozaki-r Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, according to time_second(9), time_second
doesn't increase monotonically and can leap, e.g., via settimeofday(2).
We should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.
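
The conversion described above can be sketched roughly as follows. This is
a minimal illustration, not the committed code: the helper names are
hypothetical, and both clocks are passed in as parameters (sampled at the
same instant, in seconds) rather than read from kernel globals.

```c
#include <stdint.h>

/*
 * Hypothetical helpers; names are illustrative and not from the commit.
 * now_wall and now_up are samples of the wall clock and of the
 * monotonic uptime clock taken at the same instant, in seconds.
 */

/* Convert a timestamp kept on the uptime clock to wall-clock time. */
int64_t
ex_uptime_to_wallclock(int64_t t_up, int64_t now_wall, int64_t now_up)
{
	return t_up + (now_wall - now_up);
}

/* Convert a wall-clock timestamp (e.g. from userland) to the uptime clock. */
int64_t
ex_wallclock_to_uptime(int64_t t_wall, int64_t now_wall, int64_t now_up)
{
	return t_wall - (now_wall - now_up);
}
```

With this shape, an expiry stored on the uptime clock is unaffected by a
later settimeofday(2); only the value reported to userland shifts.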

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
 1.165 17-Jul-2015  ozaki-r Reform use of rt_refcnt

rt_refcnt of rtentry was misused in several ways, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, the "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. To become MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
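
The discipline listed above can be illustrated with a toy model. This is
only a sketch under stated assumptions: struct ex_rtentry and both
functions are hypothetical stand-ins, and the "freed" flag replaces real
memory release so the behaviour can be observed. It is not the route.c
implementation.

```c
#include <stddef.h>

/* Toy stand-in for rtentry; field and function names are illustrative. */
struct ex_rtentry {
	int refcnt;
	int freed;	/* set instead of releasing memory, for demonstration */
};

/* A function returning an entry takes a reference on the caller's behalf. */
struct ex_rtentry *
ex_rt_lookup(struct ex_rtentry *rt)
{
	if (rt != NULL)
		rt->refcnt++;
	return rt;
}

/* Drop one reference; the entry is released when the last one goes away. */
void
ex_rtfree(struct ex_rtentry *rt)
{
	if (--rt->refcnt == 0)
		rt->freed = 1;
}
```

Callers pair every ex_rt_lookup() with exactly one ex_rtfree() and never
touch the counter directly.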
 1.164 15-Jul-2015  ozaki-r Remove unused arguments and the associated code from nd6_nud_hint()

from OpenBSD
 1.163 30-Jun-2015  ozaki-r Use KASSERT for argument NULL checks
 1.162 30-Apr-2015  ozaki-r Don't take KERNEL_LOCK for if_output when NET_MPSAFE
 1.161 30-Mar-2015  ozaki-r Tidy up opt_ipsec.h inclusions
 1.160 25-Feb-2015  roy Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.
 1.159 25-Feb-2015  roy Retire nd6_newaddrmsg and use rt_newaddrmsg directly instead so that
we don't spam route changes when the route hasn't changed.
 1.158 23-Feb-2015  martin Rearrange interface detachment slightly: before we free the INET6-specific
per-interface data, make sure to call nd6_purge() with it to remove
routing entries pointing to the departing interface.
If we happen to call this function again later, with the data
already gone, just return.
Fixes PR kern/49682, ok: christos.
 1.157 17-Feb-2015  christos "something odd happens" is not a useful error message.
 1.156 16-Dec-2014  roy Report route additions/changes/deletions for cached neighbours to userland.
 1.155 03-Dec-2014  christos more debugging info...
 1.154 18-Oct-2014  snj branches: 1.154.2;
src is too big these days to tolerate superfluous apostrophes. It's
"its", people!
 1.153 14-Oct-2014  roy Tests for neighbour now work correctly on bridge(4) and carp(4) interfaces.
 1.152 06-Jun-2014  rmind branches: 1.152.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.
 1.151 05-Jun-2014  roy Add IPV6CTL_AUTO_LINKLOCAL and ND6_IFF_AUTO_LINKLOCAL toggles which
control the automatic creation of IPv6 link-local addresses when an
interface is brought up.

Taken from FreeBSD.
 1.150 20-May-2014  bouyer Sync with the ipv4 code and call ifp->if_output() with KERNEL_LOCK
held.
Problem reported and fix tested by njoly@ on current-users@
 1.149 17-May-2014  rmind - Move IFNET_*() macros under #ifdef _KERNEL.
- Replace TAILQ_FOREACH on ifnet with IFNET_FOREACH().
 1.148 20-Mar-2014  roy branches: 1.148.2;
If IPv6 is disabled for an interface, mark all addresses as tentative.
If enabled, check for a duplicated link-local address and abort enabling
as per RFC 4862, section 5.4.5. If allowed to enable, perform DAD
on the tentative addresses.

Taken from FreeBSD.
 1.147 15-Jan-2014  roy If the address matches a cloning route, it is also a neighbor.
This allows us to use prefixes which userland may have added.
 1.146 17-Dec-2013  martin Instead of voodoo casts, use simple byte pointer arithmetic and memcpy to
create the "packed" binary format we pass out to userland when querying
the router/prefix list.
 1.145 21-May-2013  roy branches: 1.145.2;
For IPv6, emit RTM_NEWADDR once DAD completes and also when address flag
changes. Tentative addresses are not emitted.

Version bumped so userland can detect this behaviour change.
 1.144 24-Jan-2013  joerg Use rt_getkey.
 1.143 23-Jun-2012  christos branches: 1.143.2;
4 new sysctls, from OpenBSD, to avoid IPv6 DoS attacks
 1.142 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.141 03-Feb-2012  christos branches: 1.141.2; 1.141.6; 1.141.8;
PR/45764, PR/45914
Part 1:
nd6_purge can be called after dom_ifdetach, and if_afdata[AF_INET6] is
going to be freed and point to garbage. Make sure we check for NULL, before
taking the pointer offset.
While I am here, add an M_ZERO.
 1.140 02-Feb-2012  christos use FOREACH_SAFE.
 1.139 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.138 19-Nov-2011  tls branches: 1.138.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.137 10-Nov-2011  seanb - Remove unused variable from nd6_timer().
 1.136 15-Jul-2010  dyoung branches: 1.136.8;
To help find the cause of kernel complaints such as "/netbsd:
nd6_storelladdr: sdl_alen == 0, dst=... if=wm1", add printfs for some
"impossible" conditions, and make the nd6_storelladdr() printf more
informative by printing the value of sdl_alen.
 1.135 06-Nov-2009  dyoung branches: 1.135.2; 1.135.4;
Fix net.inet6.ip6.accept_rtadv and 'ndp -i <interface> accept_rtadv':

Add a flag ND6_IFF_OVERRIDE_RTADV that tells the kernel to override
ip6_accept_rtadv (net.inet6.ip6.accept_rtadv) on an interface.

Add a routine nd6_accepts_rtadv(ndi) that evaluates both the flags
on the interface represented by ndi and ip6_accept_rtadv, and
returns 'true' if the given interface should accept Router
Advertisements, and 'false' if not.

Now, ND6_IFF_ACCEPT_RTADV works as it was historically documented:
if it is set, then accept router advertisements iff ip6_accept_rtadv
!= 0. Otherwise, do not accept router advertisements.

If ND6_IFF_OVERRIDE_RTADV is set, then the flag ND6_IFF_ACCEPT_RTADV
overrides ip6_accept_rtadv: if ND6_IFF_ACCEPT_RTADV is set, accept;
otherwise reject. Ignore ip6_accept_rtadv.

If neither ND6_IFF_ACCEPT_RTADV nor ND6_IFF_OVERRIDE_RTADV is set,
reject Router Advertisements.
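
The decision rules above can be written down as a small predicate. This is
a sketch only: the flag values and the function name are made up for
illustration, and the real ND6_IFF_* constants and nd6_accepts_rtadv()
signature may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; the real ND6_IFF_* constants may differ. */
#define EX_IFF_ACCEPT_RTADV	0x1u
#define EX_IFF_OVERRIDE_RTADV	0x2u

/*
 * if_flags: per-interface flags.
 * global_accept_rtadv: stand-in for net.inet6.ip6.accept_rtadv.
 */
bool
ex_accepts_rtadv(uint32_t if_flags, int global_accept_rtadv)
{
	bool accept = (if_flags & EX_IFF_ACCEPT_RTADV) != 0;

	if (if_flags & EX_IFF_OVERRIDE_RTADV)
		return accept;	/* per-interface flag alone decides */
	return accept && global_accept_rtadv != 0;
}
```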
 1.134 31-Aug-2009  yamt nd6_ifattach: fix a missing parens bug in rev.1.132.
 1.133 06-Aug-2009  cegger Check if ndi is valid before use.
ok tonnerre@
 1.132 25-Jul-2009  tonnerre Instead of using the net.inet6.ip6.accept_rtadv sysctl for all devices,
make net.inet6.ip6.accept_rtadv the default for individual per-device
settings so people can use the ndp(8) utility to set per-device whether
or not to accept route advertisements.

rtadvd changes to follow.

(Debated on tech-net@ before but almost two weeks passed by without any
comment on the patch.)
 1.131 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

	switch (...->sa_family) {
	case ...:
		..._init();
		...
		break;
	...
	default:
		..._init();
		...
		break;
	}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

	switch (x & (IFF_UP|IFF_RUNNING)) {
	case 0:
		...
		break;
	case IFF_RUNNING:
		...
		break;
	case IFF_UP:
		...
		break;
	case IFF_UP|IFF_RUNNING:
		...
		break;
	}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
no longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
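
The address check described here might look roughly like the sketch below.
The function name and the length constant are hypothetical; the actual
test in ether_ioctl() may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_ETHER_ADDR_LEN 6

/* Returns true if addr is acceptable as a unicast station address. */
bool
ex_ether_addr_ok(const uint8_t addr[EX_ETHER_ADDR_LEN])
{
	uint8_t any = 0;
	int i;

	/* The group bit (LSB of the first octet) marks multicast/broadcast. */
	if (addr[0] & 0x01)
		return false;
	for (i = 0; i < EX_ETHER_ADDR_LEN; i++)
		any |= addr[i];
	return any != 0;	/* reject 00:00:00:00:00:00 */
}
```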

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.130 24-Oct-2008  dyoung branches: 1.130.2; 1.130.4; 1.130.10; 1.130.14;
Constify the rt_addrinfo argument to the ifa_rtrequest member
function of struct ifaddr.
 1.129 24-Oct-2008  dyoung bzero -> memset. Do not "test truth" of pointers, but compare with
NULL, instead. Do not gratuitously cast to void *. Use NULL
instead of (type *)0.

No functional changes intended.
 1.128 15-May-2008  dyoung branches: 1.128.4;
Simplify RT_DPRINTF() calls.
 1.127 11-May-2008  dyoung Compare route with NULL instead of testing truth. Where applicable,
s/0/NULL/. s/u_char/uint8_t/. Remove superfluous curly braces.
 1.126 24-Apr-2008  ad branches: 1.126.2; 1.126.4;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.125 15-Apr-2008  thorpej branches: 1.125.2;
Make ip6 and icmp6 stats per-cpu.
 1.124 08-Apr-2008  thorpej Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
 1.123 04-Dec-2007  dyoung branches: 1.123.8; 1.123.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().
 1.122 10-Nov-2007  dyoung branches: 1.122.2;
Use sockaddr_in6_init(). Use a static initializer for all1_sa.
Constify a cast (may as well). No functional change intended.
 1.121 01-Nov-2007  dyoung branches: 1.121.2;
De-__P().
 1.120 02-Sep-2007  dyoung branches: 1.120.4;
We cannot sleep in a software interrupt, so do not sockaddr_dl_alloc(...,
M_WAITOK). Instead, sockaddr_dl_init() a sockaddr_dl on the stack.
 1.119 30-Aug-2007  dyoung Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.118 07-Aug-2007  dyoung branches: 1.118.2; 1.118.4;
Avoid writing past the end of the buffer [lldst, lldst + dstsize)
in nd6_storelladdr().

Use sockaddr_dl_setaddr(). Constify some sockaddr_dl's. Constify
a sockaddr argument to nd6_na_output(). Change SDL() to "standard"
satocsdl() or satosdl(). Change SIN6() to satocsin6() or satosin6().

bcmp -> memcmp, bcopy -> memcpy.
 1.117 19-Jul-2007  dyoung branches: 1.117.4;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.116 09-Jul-2007  ad branches: 1.116.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.115 17-May-2007  dyoung Fix the memory leak reported in kern/36337. Thanks Matthias Scheler
for the heads-up. My fix is based on the following patches from
FreeBSD, however, I extracted the code into a subroutine,
nd6_llinfo_release_pkts():

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet6/nd6.c.diff?r1=1.48.2.18;r2=1.48.2.19
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet6/nd6_nbr.c.diff?r1=1.29.2.8;r2=1.29.2.9
 1.114 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.113 17-Mar-2007  dyoung In nd6_rtrequest(), when we lookup/create a route whose destination
is equal to one of the host's IPv6 addresses, do not stop at setting
the route's interface to lo0, but also clear the route's RTF_CLONED
flag, if it is present, so that ip6_input() will accept packets
sent to that destination. This is necessary because ip6_input()
will not accept a packet if it looks up the packet's destination
and finds a route with RTF_CLONED set.

I believe this will help IPv6 networking survive '/etc/rc.d/network
restart'. See the problem report, kern/33279.
 1.112 15-Mar-2007  dyoung In nd6_lookup, shorten a staircase. KNF: change return (expr); to
return expr; throughout. Fix K&R prototypes and parameter type
declarations.
 1.111 04-Mar-2007  christos branches: 1.111.2; 1.111.4; 1.111.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.110 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.109 24-Nov-2006  christos branches: 1.109.4;
fix spelling of accommodate; from Zapher.
 1.108 20-Nov-2006  dyoung Use LIST_/TAILQ_ macros, esp. LIST_FOREACH() and TAILQ_FOREACH().
Use the usual idiom for iterating over a list where we might
_REMOVE() entries,

	for (x = TAILQ_FIRST(...); x != NULL; x = nx) {
		nx = TAILQ_NEXT(x, ...);
		...
	}
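
A filled-in instance of the idiom above, as a sketch: struct ex_item, the
list type, and ex_remove_odd() are hypothetical names chosen for
illustration, and only the <sys/queue.h> macros are real.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Illustrative list element; names are not from the commit. */
struct ex_item {
	int value;
	TAILQ_ENTRY(ex_item) link;
};
TAILQ_HEAD(ex_list, ex_item);

/* Remove and free every item with an odd value; return how many were removed. */
int
ex_remove_odd(struct ex_list *head)
{
	struct ex_item *x, *nx;
	int removed = 0;

	for (x = TAILQ_FIRST(head); x != NULL; x = nx) {
		nx = TAILQ_NEXT(x, link);	/* fetch successor before x may be freed */
		if (x->value % 2 != 0) {
			TAILQ_REMOVE(head, x, link);
			free(x);
			removed++;
		}
	}
	return removed;
}
```

Saving the successor first is what makes it safe to free x inside the loop
body.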
 1.107 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.106 13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
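
The link-local and private-address predicates from item 3 can be sketched
as plain functions. These are illustrative only: the names are
hypothetical, addresses are assumed to be in host byte order, and the real
IN_LINKLOCAL()/IN_PRIVATE() macros may be written differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Addresses are in host byte order. Names are illustrative. */

/* Link-local unicast: 169.254.0.0/16 */
bool
ex_in_linklocal(uint32_t a)
{
	return (a & 0xffff0000u) == 0xa9fe0000u;
}

/* RFC 1918 private space: 10/8, 172.16/12, 192.168/16 */
bool
ex_in_private(uint32_t a)
{
	return (a & 0xff000000u) == 0x0a000000u ||
	    (a & 0xfff00000u) == 0xac100000u ||
	    (a & 0xffff0000u) == 0xc0a80000u;
}
```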
 1.105 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.104 02-Sep-2006  christos branches: 1.104.2; 1.104.4;
- fix initializers
- add const
- remove dead code
 1.103 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.102 18-May-2006  liamjfoy branches: 1.102.2;
Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.101 15-Apr-2006  christos Coverity CID 857: Prevent NULL deref.
 1.100 24-Mar-2006  rpaulo From KAME via SUZUKI Shinsuke:
fixed a memory leak when net.inet6.icmp6.nd6_maxqueuelen is
greater than 1.
 1.99 05-Mar-2006  rpaulo branches: 1.99.2; 1.99.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.98 03-Mar-2006  rpaulo branches: 1.98.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.
 1.97 02-Mar-2006  dyoung In nd6_llinfo_timer, don't duplicate part of nd6_llinfo_settimer's
logic, and then call nd6_llinfo_settimer. Instead, call
nd6_llinfo_settimer immediately.

This should cause no functional change. I've been running this
patch for months.
 1.96 21-Jan-2006  rpaulo branches: 1.96.2; 1.96.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
	s = socket(AF_INET6);
	bind(s, "::1");
	sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.
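
The stricter boundary check for the ::1 case above could be expressed as
in this sketch. The function name is hypothetical and the check covers
only the loopback example from the list; the KAME scope code is
considerably more general.

```c
#include <stdbool.h>
#include <netinet/in.h>

/*
 * Returns true if the src/dst pair passes the loopback scope check.
 * Illustrative only: real scope handling also covers link-local zones.
 */
bool
ex_scope_ok(const struct in6_addr *src, const struct in6_addr *dst)
{
	/* ::1 is meaningful only within a single node. */
	if (IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst))
		return false;
	return true;
}
```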

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.95 11-Dec-2005  christos branches: 1.95.2;
merge ktrace-lwp.
 1.94 29-May-2005  christos branches: 1.94.2;
- avoid shadowed variables
- sprinkle const.
 1.93 27-May-2005  seanb - Arithmetic error when calculating ticks to nd6_llinfo_settimer().
- Reviewed by christos.
 1.92 03-Apr-2005  tron Make sure that prefixes get purged. This fixes PR kern/21189,
PR kern/25968 and PR kern/27873.
 1.91 04-Dec-2004  peter branches: 1.91.4; 1.91.10;
Convert lo(4) to a clonable device.

This also removes the loif array and changes all code to use the new
lo0ifp pointer which points to the lo0 ifnet structure.

Approved by christos.
 1.90 19-May-2004  itojun do not loop on nd6_output() when transmission fails. from kame
 1.89 11-Feb-2004  itojun branches: 1.89.2; 1.89.4;
avoid ugly typecast
 1.88 30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.87 22-Aug-2003  itojun correct missing inclusion of opt_ipsec.h
 1.86 27-Jun-2003  itojun branches: 1.86.2;
split ND6 cache timer management to per-entry. increased accuracy,
no O(N) loop. sync w/ kame
 1.85 24-Jun-2003  itojun remove unneeded checks of accept_rtadv. from kame
 1.84 24-Jun-2003  itojun * kame/sys/netinet6/nd6.c (nd6_rtrequest): made the condition that
decides whether to create an empty llinfo stricter, so that a user
can manually change the link-layer address of an existing neighbor
cache entry.
Pointed out by: KIU Shueng Chuan

from kame
 1.83 24-Jun-2003  itojun use time.tv_sec directly
 1.82 24-Jun-2003  itojun clear ln_hold earlier. from kame
 1.81 04-May-2003  christos print how big the mtu needs to be for ipv6 ppp.
 1.80 25-Feb-2003  he Make sure to initialize callout structs.
 1.79 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.78 17-Jan-2003  itojun switch from kame-based m_aux mbuf auxiliary data, to openbsd m_tag
implementation. it will simplify porting across *bsd (such as kame/altq),
and make us more synchronized. from Joel Wilsson
 1.77 09-Oct-2002  itojun suppress too noisy log by default (can be re-enabled by sysctl). sync w/kame
 1.76 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.75 23-Sep-2002  itojun better fix to PR 18163 ("deprecated" flag manipulation). sync w/kame
 1.74 23-Sep-2002  simonb Remove breaks after returns, unreachable returns and returns after
returns(!).
 1.73 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.72 04-Sep-2002  itojun allow "deprecated" bit to be manually set. PR 18163
 1.71 19-Aug-2002  itojun check error from copyout
 1.70 19-Aug-2002  itojun typo in comment
 1.69 19-Aug-2002  itojun fix copyout() logic. more proper fix to be done on kame tree.
 1.68 19-Aug-2002  itojun copyout only if oldp is non-null
 1.67 19-Aug-2002  itojun need explicit copyout(), apparently
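
The copyout fixes in revisions 1.67 through 1.71 follow the usual sysctl read convention: copy only when the caller supplied a buffer, and report errors. A minimal userland sketch, assuming memcpy() as a stand-in for copyout() and a hypothetical export_value() helper (the real nd6.c code is not shown in this log):

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userland stand-in for a sysctl read handler.  memcpy()
 * plays the role of copyout(); a real copyout() can fail, and its
 * error would have to be returned to the caller.
 */
static int
export_value(const void *src, size_t srclen, void *oldp, size_t *oldlenp)
{
	if (oldp == NULL) {
		/* Size probe: report the space needed, copy nothing. */
		*oldlenp = srclen;
		return 0;
	}
	if (*oldlenp < srclen)
		return ENOMEM;		/* caller's buffer is too small */
	memcpy(oldp, src, srclen);
	*oldlenp = srclen;
	return 0;
}
```

A caller would typically probe with oldp set to NULL first, allocate the reported length, and then call again with a real buffer.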
 1.66 09-Jun-2002  itojun whitespace cleanup
 1.65 08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.64 07-Jun-2002  itojun If there has been no NS for the neighbor after entering the
INCOMPLETE state, send the first solicitation in nd6_output(), regardless
of the timer value.
revised comments about rate-limiting accordingly.

sync w/kame
 1.63 03-Jun-2002  itojun whitespace at EOL
 1.62 03-Jun-2002  itojun do not hardcode if_mtu values in here, except for IFT_{ARC,FDDI} -
they need special handling. makes it possible to take advantage of 9k ether
frames.
 1.61 30-May-2002  itojun improve nd6_setmtu() to warn about a too-small MTU on SIOCSIFMTU. sync w/kame
 1.60 29-May-2002  itojun missing bzero
 1.59 29-May-2002  itojun receivedra field is gone
 1.58 29-May-2002  itojun attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
 1.57 20-Mar-2002  itojun branches: 1.57.4; 1.57.6;
remove obsolete comment
 1.56 18-Dec-2001  itojun reduce white space/cosmetic diffs w/kame.
 1.55 13-Nov-2001  lukem add RCSIDs
 1.54 17-Oct-2001  itojun do not change neighbor cache state on entry timeout,
if the cache entry is for outgoing router.

perform on-linkness check before default router (re-)selection.

do not play with interface direct route on nd6_rtrequest.

sync a lot of cosmetic changes. sync with kame
 1.53 17-Oct-2001  itojun unifdef OLDIP6OUTPUT
 1.52 16-Oct-2001  itojun more whitespace/comment sync with kame
 1.51 25-Jul-2001  itojun ifindex2ifnet could contain NULL after if_detach(). sync with kame
 1.50 20-Jul-2001  itojun sync rt_ifp check with IPv4 counterpart (see sys/net/if_ethersubr.c 1.27).
sync with kame
 1.49 29-Jun-2001  itojun branches: 1.49.2;
call defrouter_select() only if it is autoconfigured host.
 1.48 27-Jun-2001  itojun refresh default router list on nd6_detach(), only if we are an
autoconfigured host. the bug was that we would lose the default route on
"ifconfig gif0 destroy" even if the default was not pointing to gif0.
reported by ume@mahoroba.org. sync with kame
 1.47 22-Jun-2001  itojun select default router again, when L2 address of the router changes
 1.46 24-May-2001  itojun print more diag message on in6_addmulti() failures.
 1.45 30-Mar-2001  itojun enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.
 1.44 21-Mar-2001  itojun in nd6_cache_lladdr(), set nd6_gctimer to ln_expire just after the state
transition to STALE. fixes tahi test breakage. sync with kame.
 1.43 08-Mar-2001  itojun nd6_storelladdr() was not consistent about m_freem() policy.
do not touch RTF_STATIC entries (static ND entries) on ND cache update.
couple of cosmetic sync. sync with kame
 1.42 23-Feb-2001  itojun branches: 1.42.2;
garbage-collect stale ND entries (default: 1 day).
RFC 2461 5.3. sync with kame.
 1.41 23-Feb-2001  itojun remove unnecessary state, ND6_LLINFO_WAITDELETE, from neighbor cache
state machine.
no need for RTF_REJECT on neighbor cache entries, they are leftover from
ARP code.
sync with kame.
 1.40 21-Feb-2001  itojun make validation code more strict for ND6/dest6 variable length headers.
check duplicated nd6_ifinfo table initialization in a better way.
sync with kame
 1.39 21-Feb-2001  itojun style, to make kame sync easier
 1.38 10-Feb-2001  itojun to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.37 08-Feb-2001  itojun when chasing nd6_llinfo chain, make sure we do not touch dangling
pointer (due to RTM_DELETE during default router list management).
from kame
 1.36 07-Feb-2001  itojun during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).
 1.35 05-Feb-2001  chs expose the definitions of MIN() and MAX() in sys/param.h to the kernel
and use those in favor of a dozen copies scattered around the source tree.
 1.34 17-Jan-2001  itojun pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyway).

benefit: the following command now works. previously we needed two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.33 05-Nov-2000  onoe First Prototype implementation of network interface part for IEEE1394 (if_fw).

Current status:
Only OHCI chip is supported (fwohci).
ping (IPv4) works with Sony's implementation (SmartConnect) on Win98.
sometimes works but not stable.
Not implemented yet:
IRM (Isochronous Resource Manager) functionality.
Link layer fragmentation.
Topology map.
More to do:
clean ups
MCAP
character device part
dhcp

There is no entry in GENERIC config file yet.
Follow sys/dev/ieee1394/IMPLEMENTATION to enable if_fw.
 1.32 15-Oct-2000  itojun suppress warning on nd6_storelladdr failure. the failure could happen
easily when we have routing table with too many entries. sync with kame.
 1.31 06-Jul-2000  itojun - do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).
 1.30 19-May-2000  itojun branches: 1.30.4;
do not mistakenly forward link-local scoped packets (the bug was added
with "beyondscope" icmp6 support).
"options FAKE_LOOPBACK_IF" will honor scope on loopback outputs. rcvif will
be real interface, not the loopback, just like when multicast loopback.

(sync with kame)
 1.29 09-May-2000  itojun do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).
 1.28 27-Apr-2000  itojun temporary workaround against GIF NUD issue (when you configure globals
onto GIF, NUD prevents packet from going out)
KAME PR 245. From: Andreas Wrede <andreas@planix.com>
 1.27 19-Apr-2000  itojun add boundary check for nd6_ifinfo (otherwise ndp -i can make out-of-bound
accesses).
 1.26 16-Apr-2000  itojun perform neighbor unreachability detection on p2p links (spec requires
it for bidir p2p links).
improve -i in ndp(8) to allow tweaking per-interface ND flag on.
fix ndp(8) infinite loop on certain routing table setup.
 1.25 16-Apr-2000  itojun better sync with latest kame (cosmetic only).
 1.24 13-Apr-2000  itojun add comment on sdl_alen check (sync with kame)
 1.23 13-Apr-2000  itojun bark if sdl_alen == 0. test code for KAME PR 235.
 1.22 13-Apr-2000  itojun even if nd6_nud_hint is called, do not change a neighbor's status
unless the old status is probably reachable (i.e. the link-layer address
has already been resolved).
KAME PR 235.
 1.21 12-Apr-2000  itojun revisit in6_ifattach().
- be persistent on initializing interfaces, even if there's manually-
assigned linklocal, multicast/whatever initialization is necessary.
- do not cache mac addr in the kernel. grab mac addr from existing cards
(this is important when you swap ethernet cards back and forth)
now ppp6 works just fine!

call in6_ifattach() on ATM PVC interface to assign link-local, using
hardware MAC address as seed.

(the change is in sync with kame tree).
 1.20 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
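
The two properties listed above (client-supplied handle storage, constant-time insertion and removal) can be illustrated with an intrusive doubly linked list. This is a simplified sketch only: struct handle and the function names are hypothetical, and the actual callout implementation is more elaborate than a single list.

```c
#include <stddef.h>

/* Hypothetical client-owned handle; NetBSD's struct callout differs. */
struct handle {
	struct handle *prev, *next;
	int pending;
};

/* Initialise an empty circular list headed by a sentinel. */
static void
list_init(struct handle *head)
{
	head->prev = head->next = head;
	head->pending = 0;
}

/* Constant-time insertion: no allocation, the caller supplied the storage. */
static void
handle_insert(struct handle *head, struct handle *h)
{
	h->next = head->next;
	h->prev = head;
	head->next->prev = h;
	head->next = h;
	h->pending = 1;
}

/* Constant-time removal: unlink through the handle's own pointers. */
static void
handle_remove(struct handle *h)
{
	if (!h->pending)
		return;
	h->prev->next = h->next;
	h->next->prev = h->prev;
	h->prev = h->next = h;
	h->pending = 0;
}
```

Because each handle carries its own links, removal never has to search the list, and insertion can never fail for lack of memory.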
 1.19 28-Feb-2000  itojun remove some of cross-BSD portability #ifdef.
remove xxCTL_VARS, which is BSDI specific.
 1.18 26-Feb-2000  itojun bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
 1.17 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.16 04-Feb-2000  itojun avoid calling in6_control(SIOCDIFADDR_IN6) from interrupt context.
it is not supposed to work.
logging fix: add "\n" to some of log() in in6_prefix.c.

improve in6_ifdetach(). now almost all structures that depend on ifnet
will be cleaned up.
possible loose ends:
- cached route_in6 in static variables needs to be cleared as well
- there are ifaddr manipulation without reference counting,
which should be fixed
we still see panics after card removal, though... not sure what is left.

(sync with kame)
 1.15 03-Feb-2000  itojun remove #if 0'ed code
 1.14 01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.13 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.12 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.11 10-Dec-1999  itojun add missing splx(). a critical bug fix from kame.
 1.10 20-Sep-1999  itojun branches: 1.10.2; 1.10.8;
tiny fix to ARCnet IPv6 support.
- in in6_ifattach_getifid(), we can grab interface id source iff the source
is universally (worldwide) unique. ARCnet hardware address is only 8 bits long and
does not satisfy the condition.
(in6_ifattach_getifid() is for getting interface id usable for pseudo
interfaces like gif*)
- xx_to_eui64() should return EUI64 format, not IPv6 interface id format.
this may seem awkward so I wish to clean these things up.
- in nd6.c, change the if clause into a case clause to make future addition
of IFT_xxx easier.
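
The distinction between EUI64 format and IPv6 interface id format mentioned above comes down to one bit. A minimal sketch, assuming a 48-bit Ethernet-style MAC address as input; mac48_to_eui64() and eui64_to_ifid() are hypothetical names, not the xx_to_eui64() functions from the tree:

```c
#include <stdint.h>
#include <string.h>

/* Expand a 48-bit MAC address into EUI-64 by inserting ff:fe in the middle. */
static void
mac48_to_eui64(const uint8_t mac[6], uint8_t eui64[8])
{
	memcpy(eui64, mac, 3);
	eui64[3] = 0xff;
	eui64[4] = 0xfe;
	memcpy(eui64 + 5, mac + 3, 3);
}

/* Derive the IPv6 interface id from EUI-64 by inverting the universal/local bit. */
static void
eui64_to_ifid(const uint8_t eui64[8], uint8_t ifid[8])
{
	memcpy(ifid, eui64, 8);
	ifid[0] ^= 0x02;
}
```

Keeping the two steps separate makes it explicit which representation a given function returns.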
 1.9 19-Sep-1999  is Zeroth version of IPv6 support for ARCnet. Correct MTU handling still needs
to be done.
 1.8 31-Jul-1999  itojun sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.7 30-Jul-1999  itojun remove reference to in6_systm.h (file itself will be removed afterwards)
 1.6 06-Jul-1999  itojun sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups
 1.5 04-Jul-1999  itojun s/splnet/splsoftnet/ in IPv6/IPsec part.
hope I made no mistake (the kernel works fine but I need a regress test)

Suggested by: thorpej
 1.4 03-Jul-1999  thorpej RCS ID police.
 1.3 02-Jul-1999  itojun expand insque/remque (quick hack). fundamental fix should be done
while clarifying relationship between inpcb and in6pcb.

PR: 7891
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file nd6.c was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file nd6.c was added on branch chs-ubc2 on 1999-07-01 23:48:29 +0000
 1.10.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.10.2.7 21-Apr-2001  bouyer Sync with HEAD
 1.10.2.6 27-Mar-2001  bouyer Sync with HEAD.
 1.10.2.5 12-Mar-2001  bouyer Sync with HEAD.
 1.10.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.10.2.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.10.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.10.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.30.4.3 09-May-2001  he Pull up revision 1.36 (requested by itojun):
Suppress ND6 logs that are too noisy for normal use. Can be
re-enabled by net.inet6.icmp6.nd6_debug.
 1.30.4.2 26-Feb-2001  he Pull up revision 1.40 (via patch, requested by itojun):
Tighten IPv6 ND6/dest6 option chasing bounds check.
 1.30.4.1 20-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)
- add protection mechanism against ND cache corruption due to bad NUD hints.

this is part of:
sys/netinet/icmp6.h 1.9 -> 1.10
sys/netinet/tcp_input.c 1.111 -> 1.112
sys/netinet6/icmp6.c 1.34 -> 1.35
sys/netinet6/nd6.c 1.30 -> 1.31
sys/netinet6/nd6.h 1.14 -> 1.15
 1.42.2.12 17-Jan-2003  thorpej Sync with HEAD.
 1.42.2.11 18-Oct-2002  nathanw Catch up to -current.
 1.42.2.10 17-Sep-2002  nathanw Catch up to -current.
 1.42.2.9 27-Aug-2002  nathanw Catch up to -current.
 1.42.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.42.2.7 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.42.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.42.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.42.2.4 22-Oct-2001  nathanw Catch up to -current.
 1.42.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.42.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.42.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.49.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.49.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.49.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.49.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.49.2.1 03-Aug-2001  lukem update to -current
 1.57.6.1 04-Jun-2002  lukem Pull up revision 1.62 (via manual patch) (requested by itojun in ticket #145):
do not hardcode if_mtu values in here, except for IFT_{ARC,FDDI} -
they need special handling. makes it possible to take advantage of 9k ether
frames.
 1.57.4.3 29-Aug-2002  gehenna catch up with -current.
 1.57.4.2 20-Jun-2002  gehenna catch up with -current.
 1.57.4.1 30-May-2002  gehenna Catch up with -current.
 1.86.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.86.2.4 18-Dec-2004  skrll Sync with HEAD.
 1.86.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.86.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.86.2.1 03-Aug-2004  skrll Sync with HEAD
 1.89.4.1 07-Apr-2005  he Pull up revision 1.92 (requested by tron in ticket #1394):
Make sure that prefixes get purged. Fixes PR#21189,
PR#25968, and PR#27873.
 1.89.2.1 07-Apr-2005  he Pull up revision 1.92 (requested by tron in ticket #1394):
Make sure that prefixes get purged. Fixes PR#21189,
PR#25968, and PR#27873.
 1.91.10.1 07-Apr-2005  jmc Pullup rev 1.92 (requested by tron in ticket #105)

Make sure that prefixes get purged. PR#21189, PR#25968, PR#27873
 1.91.4.1 29-Apr-2005  kent sync with -current
 1.94.2.6 07-Dec-2007  yamt sync with head
 1.94.2.5 15-Nov-2007  yamt sync with head.
 1.94.2.4 03-Sep-2007  yamt sync with head.
 1.94.2.3 26-Feb-2007  yamt sync with head.
 1.94.2.2 30-Dec-2006  yamt sync with head.
 1.94.2.1 21-Jun-2006  yamt sync with head.
 1.95.2.1 01-Feb-2006  yamt sync with head.
 1.96.4.3 01-Jun-2006  kardel Sync with head.
 1.96.4.2 22-Apr-2006  simonb Sync with head.
 1.96.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.96.2.1 09-Sep-2006  rpaulo sync with head
 1.98.2.5 03-Sep-2006  yamt sync with head.
 1.98.2.4 26-Jun-2006  yamt sync with head.
 1.98.2.3 24-May-2006  yamt sync with head.
 1.98.2.2 01-Apr-2006  yamt sync with head.
 1.98.2.1 13-Mar-2006  yamt sync with head.
 1.99.4.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.99.4.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.99.2.1 19-Apr-2006  elad sync with head.
 1.102.2.1 19-Jun-2006  chap Sync with head.
 1.104.4.2 10-Dec-2006  yamt sync with head.
 1.104.4.1 22-Oct-2006  yamt sync with head
 1.104.2.2 12-Jan-2007  ad Sync with head.
 1.104.2.1 18-Nov-2006  ad Sync with head.
 1.109.4.5 17-May-2007  yamt sync with head.
 1.109.4.4 07-May-2007  yamt sync with head.
 1.109.4.3 24-Mar-2007  yamt sync with head.
 1.109.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.109.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.111.6.1 18-Mar-2007  reinoud First attempt to bring branch in sync with HEAD
 1.111.4.1 11-Jul-2007  mjf Sync with head.
 1.111.2.5 09-Oct-2007  ad Sync with head.
 1.111.2.4 20-Aug-2007  ad Sync with HEAD.
 1.111.2.3 01-Jul-2007  ad Adapt to callout API change.
 1.111.2.2 08-Jun-2007  ad Sync with head.
 1.111.2.1 10-Apr-2007  ad Sync with head.
 1.116.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.116.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.117.4.5 09-Dec-2007  jmcneill Sync with HEAD.
 1.117.4.4 11-Nov-2007  joerg Sync with HEAD.
 1.117.4.3 04-Nov-2007  jmcneill Sync with HEAD.
 1.117.4.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.117.4.1 09-Aug-2007  jmcneill Sync with HEAD.
 1.118.4.2 07-Aug-2007  dyoung Avoid writing past the end of the buffer [lldst, lldst + dstsize)
in nd6_storelladdr().

Use sockaddr_dl_setaddr(). Constify some sockaddr_dl's. Constify
a sockaddr argument to nd6_na_output(). Change SDL() to "standard"
satocsdl() or satosdl(). Change SIN6() to satocsin6() or satosin6().

bcmp -> memcmp, bcopy -> memcpy.
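
The overflow fix amounts to checking the source length against the destination size before copying. A minimal sketch, assuming a hypothetical store_lladdr() helper and ENOSPC as the error value; it does not reproduce nd6_storelladdr() or sockaddr_dl_setaddr():

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy a link-layer address of srclen bytes into
 * the half-open buffer [dst, dst + dstsize), refusing to write past its end.
 */
static int
store_lladdr(void *dst, size_t dstsize, const void *src, size_t srclen)
{
	if (srclen > dstsize)
		return ENOSPC;	/* error value chosen for illustration */
	memcpy(dst, src, srclen);
	return 0;
}
```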
 1.118.4.1 07-Aug-2007  dyoung file nd6.c was added on branch matt-mips64 on 2007-08-07 04:35:43 +0000
 1.118.2.2 09-Jan-2008  matt sync with HEAD
 1.118.2.1 06-Nov-2007  matt sync with HEAD
 1.120.4.1 13-Nov-2007  bouyer Sync with HEAD
 1.121.2.2 08-Dec-2007  mjf Sync with HEAD.
 1.121.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.122.2.1 08-Dec-2007  ad Sync with head.
 1.123.12.2 17-Jan-2009  mjf Sync with HEAD.
 1.123.12.1 02-Jun-2008  mjf Sync with HEAD.
 1.123.8.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.125.2.1 18-May-2008  yamt sync with head.
 1.126.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.126.2.6 11-Aug-2010  yamt sync with head.
 1.126.2.5 11-Mar-2010  yamt sync with head
 1.126.2.4 16-Sep-2009  yamt sync with head
 1.126.2.3 19-Aug-2009  yamt sync with head.
 1.126.2.2 04-May-2009  yamt sync with head.
 1.126.2.1 16-May-2008  yamt sync with head.
 1.128.4.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.130.14.1 17-Dec-2013  bouyer Pull up following revision(s) (requested by martin in ticket #1892):
usr.sbin/ndp/ndp.c: revision 1.42
sys/netinet6/nd6.c: revision 1.146
Instead of voodoo casts use simple byte pointer arithmetic and memcpy to
create the "packed" binary format we pass out to userland when querying
the router/prefix list.
Simplify code to print the router/prefix list: use memcpy and local structs
properly aligned on the stack to decode the binary format passed by the
kernel - instead of (bogusly) assuming the format will obey all local
alignment requirements.
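
The technique described in this pull-up can be sketched as follows: fields are written to and read from a byte buffer with memcpy, so no pointer into the buffer is ever dereferenced as a wider type. struct record and both functions are hypothetical; the real router/prefix list structures are different.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout; the real kernel structures differ. */
struct record {
	uint32_t lifetime;
	uint16_t flags;
};

/* Append one record to the packed buffer; returns the new write offset. */
static size_t
pack_record(unsigned char *buf, size_t off, const struct record *r)
{
	memcpy(buf + off, &r->lifetime, sizeof(r->lifetime));
	off += sizeof(r->lifetime);
	memcpy(buf + off, &r->flags, sizeof(r->flags));
	return off + sizeof(r->flags);
}

/* Decode one record into an aligned local struct; returns the new read offset. */
static size_t
unpack_record(const unsigned char *buf, size_t off, struct record *r)
{
	memcpy(&r->lifetime, buf + off, sizeof(r->lifetime));
	off += sizeof(r->lifetime);
	memcpy(&r->flags, buf + off, sizeof(r->flags));
	return off + sizeof(r->flags);
}
```

Because the reader copies into a properly aligned local struct, the packed data may start at any byte offset without risking an unaligned access.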
 1.130.10.1 17-Dec-2013  bouyer Pull up following revision(s) (requested by martin in ticket #1892):
usr.sbin/ndp/ndp.c: revision 1.42
sys/netinet6/nd6.c: revision 1.146
Instead of voodoo casts use simple byte pointer arithmetic and memcpy to
create the "packed" binary format we pass out to userland when querying
the router/prefix list.
Simplify code to print the router/prefix list: use memcpy and local structs
properly aligned on the stack to decode the binary format passed by the
kernel - instead of (bogusly) assuming the format will obey all local
alignment requirements.
 1.130.4.1 17-Dec-2013  bouyer Pull up following revision(s) (requested by martin in ticket #1892):
usr.sbin/ndp/ndp.c: revision 1.42
sys/netinet6/nd6.c: revision 1.146
Instead of voodoo casts use simple byte pointer arithmetic and memcpy to
create the "packed" binary format we pass out to userland when querying
the router/prefix list.
Simplify code to print the router/prefix list: use memcpy and local structs
properly aligned on the stack to decode the binary format passed by the
kernel - instead of (bogusly) assuming the format will obey all local
alignment requirements.
 1.130.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.135.4.1 05-Mar-2011  rmind sync with head
 1.135.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.136.8.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.136.8.2 30-Oct-2012  yamt sync with head
 1.136.8.1 17-Apr-2012  yamt sync with head
 1.138.2.2 05-Apr-2012  mrg sync to latest -current.
 1.138.2.1 18-Feb-2012  mrg merge to -current.
 1.141.8.3 18-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.141.8.2 17-Dec-2013  bouyer Pull up following revision(s) (requested by martin in ticket #998):
usr.sbin/ndp/ndp.c: revision 1.42
sys/netinet6/nd6.c: revision 1.146
Instead of voodoo casts use simple byte pointer arithmetic and memcpy to
create the "packed" binary format we pass out to userland when querying
the router/prefix list.
Simplify code to print the router/prefix list: use memcpy and local structs
properly aligned on the stack to decode the binary format passed by the
kernel - instead of (bogusly) assuming the format will obey all local
alignment requirements.
 1.141.8.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.141.6.3 18-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.141.6.2 17-Dec-2013  bouyer Pull up following revision(s) (requested by martin in ticket #998):
usr.sbin/ndp/ndp.c: revision 1.42
sys/netinet6/nd6.c: revision 1.146
Instead of voodoo casts use simple byte pointer arithmetic and memcpy to
create the "packed" binary format we pass out to userland when querying
the router/prefix list.
Simplify code to print the router/prefix list: use memcpy and local structs
properly aligned on the stack to decode the binary format passed by the
kernel - instead of (bogusly) assuming the format will obey all local
alignment requirements.
 1.141.6.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.141.2.3 03-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.141.2.2 17-Dec-2013  bouyer Pull up following revision(s) (requested by martin in ticket #998):
usr.sbin/ndp/ndp.c: revision 1.42
sys/netinet6/nd6.c: revision 1.146
Instead of voodoo casts use simple byte pointer arithmetic and memcpy to
create the "packed" binary format we pass out to userland when querying
the router/prefix list.
Simplify code to print the router/prefix list: use memcpy and local structs
properly aligned on the stack to decode the binary format passed by the
kernel - instead of (bogusly) assuming the format will obey all local
alignment requirements.
 1.141.2.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.143.2.4 03-Dec-2017  jdolecek update from HEAD
 1.143.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.143.2.2 23-Jun-2013  tls resync from head
 1.143.2.1 25-Feb-2013  tls resync with head
 1.145.2.2 18-May-2014  rmind sync with head
 1.145.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.148.2.1 10-Aug-2014  tls Rebase.
 1.152.2.4 12-May-2017  snj Pull up following revision(s) (requested by skrll/ozaki-r in ticket #1402):
sys/net/route.c: revision 1.170 via patch
sys/netinet/ip_flow.c: revision 1.73 via patch
sys/netinet6/ip6_flow.c: revision 1.28 via patch
sys/netinet6/nd6.c: revision 1.203 via patch
Run timers in workqueue
Timers (such as nd6_timer) typically free/destroy some data in callout
(softint). If we apply psz/psref for such data, we cannot do free/destroy
process in there because synchronization of psz/psref cannot be used in
softint. So run timer callbacks in a workqueue (normal LWP context).
Enqueueing a work twice with workqueue_enqueue (i.e., calling workqueue_enqueue before
a previous task is scheduled) isn't allowed. For nd6_timer and
rt_timer_timer, this doesn't happen because callout_reset is called only
from workqueue's work. OTOH, ip{,6}flow_slowtimo's callout can be called
before its work starts and completes because the callout is periodically
called regardless of completion of the work. To avoid such a situation,
add a flag for each protocol; the flag is set true when a work is
enqueued and set false after the work finished. workqueue_enqueue is
called only if the flag is false.
Proposed on tech-net and tech-kern.
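The per-protocol flag described in this entry can be modelled in plain C as below. This is a single-threaded sketch under stated assumptions: `struct fake_wq`, `struct slowtimo_ctx`, `slowtimo_callout()` and `slowtimo_work()` are hypothetical names, and the real kernel code would also need whatever atomicity or locking the flag requires.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a workqueue that forbids double enqueue. */
struct fake_wq {
	int pending;	/* works currently queued */
	int enqueues;	/* total accepted enqueue calls */
};

struct slowtimo_ctx {
	struct fake_wq wq;
	bool work_enqueued;	/* true while a work is queued or running */
};

/* Periodic callout: enqueue only if no earlier work is outstanding. */
bool
slowtimo_callout(struct slowtimo_ctx *ctx)
{
	if (ctx->work_enqueued)
		return false;	/* previous work not finished; skip */
	ctx->work_enqueued = true;
	ctx->wq.pending++;
	ctx->wq.enqueues++;
	return true;
}

/* Workqueue (LWP) context: do the work, then clear the flag. */
void
slowtimo_work(struct slowtimo_ctx *ctx)
{
	assert(ctx->work_enqueued && ctx->wq.pending == 1);
	ctx->wq.pending--;
	/* ... the actual timer processing would run here ... */
	ctx->work_enqueued = false;
}
```

A callout that fires while the previous work is still outstanding simply returns without enqueueing, so the queue never holds the same work twice.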
 1.152.2.3 06-Apr-2015  snj Pull up following revision(s) (requested by martin in ticket #655):
sys/netinet6/in6.c: revision 1.182 via patch
sys/netinet6/in6_ifattach.c: revision 1.95 via patch
sys/netinet6/nd6.c: revision 1.158 via patch
sys/netinet6/nd6.h: revision 1.62 via patch
sys/netinet6/nd6_nbr.c: revision 1.104 via patch
sys/netinet6/nd6_rtr.c: revision 1.96 via patch
Rearrange interface detachment slightly: before we free the INET6 specific
per-interface data, make sure to call nd6_purge() with it to remove
routing entries pointing to the interface that is going away.
If we happen to call this function again later, with the data
already gone, just return.
Fixes PR kern/49682, ok: christos.
 1.152.2.2 17-Dec-2014  martin Pull up following revision(s) (requested by roy in ticket #332):
sys/netinet6/nd6_nbr.c: revision 1.103
sys/netinet6/nd6_rtr.c: revision 1.95
sys/netinet6/nd6.h: revision 1.61
sys/netinet6/nd6.c: revision 1.156
Report route additions/changes/deletions for cached neighbours to userland.
 1.152.2.1 27-Oct-2014  martin Pull up following revision(s) (requested by roy in ticket #159):
sys/netinet6/nd6.c: revision 1.153
Tests for neighbour now work correctly on bridge(4) and carp(4) interfaces.
 1.154.2.12 28-Aug-2017  skrll Sync with HEAD
 1.154.2.11 05-Feb-2017  skrll Sync with HEAD
 1.154.2.10 05-Dec-2016  skrll Sync with HEAD
 1.154.2.9 05-Oct-2016  skrll Sync with HEAD
 1.154.2.8 09-Jul-2016  skrll Sync with HEAD
 1.154.2.7 29-May-2016  skrll Sync with HEAD
 1.154.2.6 22-Apr-2016  skrll Sync with HEAD
 1.154.2.5 19-Mar-2016  skrll Sync with HEAD
 1.154.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.154.2.3 22-Sep-2015  skrll Sync with HEAD
 1.154.2.2 06-Jun-2015  skrll Sync with HEAD
 1.154.2.1 06-Apr-2015  skrll Sync with HEAD
 1.203.2.5 20-Mar-2017  pgoyette Sync with HEAD
 1.203.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.203.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.203.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.203.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.224.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.232.2.14 20-Aug-2021  martin Pull up following revision(s) (requested by ozaki-r in ticket #1692):

sys/netinet6/nd6.c: revision 1.277

nd6: prevent ln from being freed while releasing held packets
 1.232.2.13 30-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1396):

sys/netinet6/nd6.h: revision 1.88
sys/netinet6/nd6_nbr.c: revision 1.174
sys/netinet6/nd6.c: revision 1.264
sys/netinet/if_arp.c: revision 1.288 (patch)

Initialize DAD components properly

The original code initialized each component in non-init functions such as
arp_dad_start and nd6_dad_find, conditionally based on a global flag for each.
However, it was racy because the flag and the code around it were not
protected by a lock and could cause a kernel panic at worst.

Fix the issue by initializing the components in bootup as usual.
 1.232.2.12 19-Aug-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1340):

sys/netinet6/nd6.c: revision 1.257

Add missing IFNET_LOCK for regen_tmpaddr
Reported by ryo@
 1.232.2.11 26-Jul-2019  martin Pull up following revision(s) (requested by christos in ticket #1307):

sys/netinet6/nd6.c: revision 1.256

Decrease the reference count before freeing, so that the entries actually
get free'd. (Ryota Ozaki)
 1.232.2.10 08-Jul-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1285):

sys/netinet6/nd6.c: revision 1.255
tests/net/ndp/t_ndp.sh: revision 1.32

nd6: restore a missing reachability confirmation

On sending a packet over a STALE cache, a reachability confirmation should be
attempted for the cache, as described in RFC 2461/4861 7.3.3. On the fast path in
nd6_resolve, however, the treatment for STALE caches has been skipped
accidentally. So STALE caches never return to the REACHABLE state.

To fix the issue, branch to the fast path only when the cache entry is in the
REACHABLE state and leave other caches to the slow path that includes the
treatment. To this end we need to allow returning a link-layer address if a
valid address is available on the slow path too, which is the same behavior as
FreeBSD and OpenBSD.

tests: test state transitions of neighbor caches
 1.232.2.9 06-Nov-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #1080):

sys/netinet6/nd6.c: revision 1.251
sys/netinet/if_arp.c: revision 1.276
sys/net/if.c: revision 1.438
sys/net/if.c: revision 1.439
sys/net/route.c: revision 1.214
sys/net/route.c: revision 1.215
sys/net/route.c: revision 1.216
sys/netinet6/in6.c: revision 1.270
sys/net/route.h: revision 1.120
sys/net/if.c: revision 1.440

Remove a wrong assertion in ifaref

-

Doing ifaref on an ifa with IFA_DESTROYING is not a problem; the reference should
be dropped during the destruction of the ifa.

-

Use atomic operations for ifa_refcnt

-

Avoid a dangling pointer during rt_replace_ifa

-

Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.

-

Use rt_update framework on updating a rtentry
 1.232.2.8 07-Jun-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #842):

sys/netinet6/mld6.c: revision 1.93-1.99
sys/netinet6/in6_var.h: revision 1.99,1.100
sys/netinet6/in6.c: revision 1.267,1.268
sys/netinet6/nd6.c: revision 1.249

Don't hold softnet_lock in mld_timeo
Then we can get rid of remaining abuses of mutex_owned(softnet_lock).

Release in6_multilock on callout_halt of mld_timeo to avoid a deadlock
Improve atomicity of in6_leavegroup and in6_delmulti

Avoid NULL pointer dereference on imm->i6mm_maddr

Make a refcount decrement and a removal from a list of an item atomic
in6m_refcount of an in6m can be incremented if the in6m is on the list
(if_multiaddrs) in in6_addmulti or mld_input. So we must avoid such an
increment when we try to destroy an in6m. To this end we must make
an in6m_refcount decrement and a removal of an in6m from if_multiaddrs
atomic.

Make a deletion of in6m in nd6_rtrequest atomic

Move LIST_REMOVE
mld_stoptimer releases in6_multilock temporarily, so we must LIST_REMOVE first.

Avoid double LIST_REMOVE which corrupts lists
Mark in6m as used for non-DIAGNOSTIC builds.
 1.232.2.7 13-Mar-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #622):
sys/netinet/if_arp.c: revision 1.270
sys/net/if_llatbl.c: revision 1.24 (patch)
sys/net/if_llatbl.c: revision 1.25
sys/net/if_llatbl.c: revision 1.26
sys/net/route.c: revision 1.204
sys/netinet6/in6.c: revision 1.261
sys/netinet6/in6.c: revision 1.262 (patch)
sys/netinet6/in6.c: revision 1.263
sys/netinet/in.c: revision 1.216
sys/netinet6/in6.c: revision 1.264
sys/netinet6/nd6.c: revision 1.246 (patch)
sys/netinet/if_arp.c: revision 1.269
sys/net/if_llatbl.h: revision 1.14
sys/netinet6/in6.c: revision 1.259
sys/netinet/in.c: revision 1.220
sys/netinet/in.c: revision 1.221 (patch)
sys/netinet/in.c: revision 1.222
sys/netinet/in.c: revision 1.223

Suppress noisy debugging outputs
Even if DEBUG they are too noisy under load.

Tweak sanity checks

Scheduling a timer of static entries is wrong.

Add assertions

We must not destroy llentries holding mbufs.

Fix reference leaks of llentry
callout_reset and callout_halt can cancel a pending callout without telling us.
Detect a cancel and remove a reference by using callout_pending and
callout_stop (it's a bit tricky though, we can detect it).
While here, we can remove remaining abuses of mutex_owned for softnet_lock.

Fix memory leaks on arp -d and ndp -d for static entries
We have to delete entries on in_lltable_delete and in6_lltable_delete
unconditionally. Note that we don't need to worry about LLE_IFADDR because
there are no such entries now.

Use pool(9) for llentry allocations
llentry is easily leaked, and pool suits it because pool can be used to
detect leaks.

Also sweep unnecessary wrappers for llentry, in_llentry and in6_llentry.
 1.232.2.6 05-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #528):
sys/net/agr/if_agr.c: revision 1.42
sys/netinet6/nd6_rtr.c: revision 1.137
sys/netinet6/nd6_rtr.c: revision 1.138
sys/net/agr/if_agr.c: revision 1.46
sys/net/route.c: revision 1.206
sys/net/if.c: revision 1.419
sys/net/agr/if_agrether.c: revision 1.10
sys/netinet6/nd6.c: revision 1.241
sys/netinet6/nd6.c: revision 1.242
sys/netinet6/nd6.c: revision 1.243
sys/netinet6/nd6.c: revision 1.244
sys/netinet6/nd6.c: revision 1.245
sys/netipsec/ipsec_input.c: revision 1.52
sys/netipsec/ipsec_input.c: revision 1.53
sys/net/agr/if_agrsubr.h: revision 1.5
sys/kern/subr_workqueue.c: revision 1.35
sys/netipsec/ipsec.c: revision 1.124
sys/net/agr/if_agrsubr.c: revision 1.11
sys/net/agr/if_agrsubr.c: revision 1.12
Simplify; share agr_vlan_add and agr_vlan_del (NFCI)
Fix late NULL-checking (CID 1427782: Null pointer dereferences (REVERSE_INULL))
KNF: replace soft tabs with hard tabs
Add missing NULL-checking for m_pullup (CID 1427770: Null pointer dereferences (NULL_RETURNS))
Add locking.
Revert "Get rid of unnecessary splsoftnet" (v1.133)
It's not always true that softnet_lock is held these places.
See PR kern/52947.
Get rid of unnecessary splsoftnet (redo)
Unless NET_MPSAFE, splsoftnet is still needed for rt_* functions.
Use existing fill_[pd]rlist() functions to calculate size of buffer to
allocate, rather than relying on an arbitrary length passed in from
userland.
Allow copyout() of partial results if the user buffer is too small, to
be consistent with the way sysctl(3) is documented.
Garbage-collect now-unused third parameter in the fill_[pd]rlist()
functions.
As discussed on IRC.
OK kamil@ and christos@
XXX Needs pull-up to netbsd-8 branch.
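The "partial copyout" behaviour mentioned above can be sketched in userland C as below. It is a minimal model under stated assumptions, not the nd6.c code: `copy_partial()` is a hypothetical helper, memcpy stands in for copyout(), and both the EINVAL return for a NULL length pointer and the convention of reporting the copied length back through `oldlenp` are choices made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper. With oldp == NULL it only reports the size
 * needed. Otherwise it copies as much of src as fits, stores the
 * number of bytes copied in *oldlenp, and returns ENOMEM if the
 * caller's buffer was too small for the full result.
 */
int
copy_partial(void *oldp, size_t *oldlenp, const void *src, size_t needed)
{
	size_t avail, n;

	if (oldlenp == NULL)
		return EINVAL;		/* assumption: reject NULL length */
	if (oldp == NULL) {
		*oldlenp = needed;	/* size query only */
		return 0;
	}
	avail = *oldlenp;
	n = avail < needed ? avail : needed;
	memcpy(oldp, src, n);		/* partial result if too small */
	*oldlenp = n;
	return avail < needed ? ENOMEM : 0;
}
```

The size is always derived from the data being returned (`needed`), never from a length the caller passed in, which mirrors the intent of computing the buffer size with the fill functions.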
Simplify, from christos@
More simplification, this time from ozaki-r@
No need to break after return.
One more from christos@
No need to initialize fill_func
more cleanup (don't allow oldlenp == NULL)
Destroy ifq_lock at the end of if_detach
It still can be used in if_detach.
Prevent rt_free_global.wk from being enqueued to workqueue doubly
Check whether an already queued work is about to be enqueued again, which is not allowed
 1.232.2.5 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at the
same time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces copy-and-paste code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
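The idea of wrapping the NET_MPSAFE switch in macros can be sketched as below. This is an illustrative model only: the macro names `LOCK_UNLESS_NET_MPSAFE()` and `UNLOCK_UNLESS_NET_MPSAFE()`, the counter `fake_kernel_lock_depth` and the function `protected_operation()` are hypothetical, and the counter merely stands in for the real kernel lock.

```c
#include <assert.h>

/* Hypothetical stand-in for the big kernel lock: a depth counter. */
int fake_kernel_lock_depth;

#define FAKE_KERNEL_LOCK()	(fake_kernel_lock_depth++)
#define FAKE_KERNEL_UNLOCK()	(fake_kernel_lock_depth--)

/*
 * The wrappers hide the NET_MPSAFE switch so that callers need no
 * #ifdef of their own. The macro names are illustrative only.
 */
#ifdef NET_MPSAFE
#define LOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#define UNLOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#else
#define LOCK_UNLESS_NET_MPSAFE()	do { FAKE_KERNEL_LOCK(); } while (0)
#define UNLOCK_UNLESS_NET_MPSAFE()	do { FAKE_KERNEL_UNLOCK(); } while (0)
#endif

/* Returns the lock depth observed inside the protected region. */
int
protected_operation(void)
{
	int depth_inside;

	LOCK_UNLESS_NET_MPSAFE();
	depth_inside = fake_kernel_lock_depth;
	/* ... non-MP-safe work would go here ... */
	UNLOCK_UNLESS_NET_MPSAFE();
	return depth_inside;
}
```

With this shape, a search for the wrapper names finds every place that still depends on the big lock, while call sites stay free of preprocessor conditionals.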
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.
This change requires additions of KERNEL_LOCK to subsequent functions called from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called while holding if_ioctl_lock.
Ensure not to turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in that case the lock isn't needed
because it's guaranteed that nothing else can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point nothing else modifies the list, so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general, but
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init while holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op while holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.232.2.4 17-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #354):
sys/netinet6/in6_ifattach.c: revision 1.113
sys/netinet6/nd6.c: revision 1.238
Use psref instead of pserialize because that code is sleepable
--
Use psref instead of pserialize because that code is sleepable
 1.232.2.3 17-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #353):
sys/net/if_llatbl.c: 1.22
sys/net/if_llatbl.h: 1.13
sys/netinet/if_arp.c: 1.254
sys/netinet/in.c: 1.208-1.209
sys/netinet6/in6.c: 1.249-1.250
sys/netinet6/nd6.c: 1.237
Remove redundant KASSERTMSG
The function is static, has just one caller and the caller does the same check.
--
Fix a deadlock between a route update and lltable
It happens because rtalloc1 is called from lltable while holding
IF_AFDATA_WLOCK.
If a route update is in action, rtalloc1 would wait for its completion while
holding IF_AFDATA_WLOCK. At the same moment, a softint (e.g., arpintr) may try
to take IF_AFDATA_WLOCK and get stuck on it. Unfortunately the stuck softint
prevents the route update from progressing because the route update calls
psref_target_destroy that needs the softint to complete.
A resource allocation graph of the scenario looks like this:
route update =(psref_target_destroy)=> softint => IF_AFDATA_WLOCK
=(rt_update_wait)=> route update
Fix the deadlock by pulling rtalloc1 out of the lltable code inside
IF_AFDATA_WLOCK.
Note that the deadlock happens only if NET_MPSAFE is enabled.
 1.232.2.2 24-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #307):
sys/netinet6/nd6.c: revision 1.236
Add missing NULL check
PR kern/52554
 1.232.2.1 07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproduces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Enable to remove multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries which have an identical
destination on different interfaces. This is normal and can be
reproduced easily by ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.245.2.6 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.245.2.5 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.245.2.4 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.245.2.3 25-Jun-2018  pgoyette Sync with HEAD
 1.245.2.2 02-May-2018  pgoyette Synch with HEAD
 1.245.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.249.2.3 21-Apr-2020  martin Sync with HEAD
 1.249.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.249.2.1 10-Jun-2019  christos Sync with HEAD
 1.256.2.9 08-Aug-2022  martin Apply patch, requested by kim in ticket #1497:

sys/netinet6/nd6.c (apply patch)

PR 55680: avoid duplicate free of link layer entries (code in HEAD is
different)
 1.256.2.8 20-Aug-2021  martin Pull up following revision(s) (requested by ozaki-r in ticket #1338):

sys/netinet6/nd6.c: revision 1.277

nd6: prevent ln from being freed while releasing held packets
 1.256.2.7 30-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #269):

sys/netinet6/nd6.h: revision 1.88
sys/net/rtsock_shared.c: revision 1.10
sys/netinet6/nd6_nbr.c: revision 1.174
sys/netinet6/nd6.c: revision 1.264
sys/netinet/if_arp.c: revision 1.283
sys/netinet/if_arp.c: revision 1.288

Initialize DAD components properly

The original code initialized each component in non-init functions such as
arp_dad_start and nd6_dad_find, conditionally based on a global flag for each.
However, it was racy because the flag and the code around it were not
protected by a lock and could cause a kernel panic at worst.

Fix the issue by initializing the components in bootup as usual.

-

Initialize dom_mowner for MBUFTRACE
 1.256.2.6 05-Sep-2019  martin Pull up following revision(s) (requested by roy in ticket #169):

sys/netinet6/nd6.h: revision 1.87
sys/netinet6/nd6.c: revision 1.263

inet6: Re-introduce ND6_LLINFO_WAITDELETE so we can return EHOSTDOWN

Once we've sent nd6_mmaxtries NS messages, send RTM_MISS and move to the
ND6_LLINFO_WAITDELETE state rather than freeing the llentry right away.
Wait for a probe cycle and then free the llentry.

If a connection attempts to re-use the llentry during ND6_LLINFO_WAITDELETE,
return EHOSTDOWN (or EHOSTUNREACH if a gateway) to match inet behaviour.

Continue to ND6_LLINFO_INCOMPLETE and send another NS probe in hope of a
reply. Rinse and repeat.

This reverts part of nd6.c r1.14 - an 18 year old commit!
 1.256.2.5 05-Sep-2019  martin Pull up following revision(s) (requested by roy in ticket #168):

sys/net/rtsock.c: revision 1.252
sys/netinet6/nd6_nbr.c: revision 1.168 - 1.172
sys/netinet6/nd6.c: revision 1.262

inet6: Send RTM_MISS when we fail to resolve an address.

Takes the same approach as when adding a new address - we no longer
announce the new lladdr right away but we announce the result.

This will either be RTM_ADD or RTM_MISS.
RTM_DELETE is only sent if we have a lladdr assigned OR gc'ed.

This results in fewer messages via route(4) and tells us when a new
lladdr has been added (RTM_ADD), changed (RTM_CHANGE), deleted
(RTM_DELETE) or has failed to be resolved (RTM_MISS).

The latter case can be interpreted as unreachable.

inet6: change rt_announce and llchange to bool in nd6_na_input()
more bool
 1.256.2.4 01-Sep-2019  martin Pull up following revision(s) (requested by roy in ticket #148):

sys/netinet6/nd6.c: revision 1.261

inet6: don't set an invalid lladdr in nd6_free()

We don't want to announce that we've deleted a hwaddr of all zeros.
 1.256.2.3 01-Sep-2019  martin Pull up following revision(s) (requested by roy in ticket #131):

sys/netinet6/nd6.c: revision 1.260

inet6: nd6_free assumes all routers are processed by kernel RA

This hasn't been the case for a long time if you're a dhcpcd
user with a default config. As such, it's possible for the default
IPv6 router as set by dhcpcd to be erroneously gc'ed by nd6_free.

This reduces the scope of the ND6_WLOCK taken as well as fixing an
issue where we write to ln->ln_state without a lock being held.
 1.256.2.2 26-Aug-2019  martin Pull up following revision(s) (requested by roy in ticket #109):

sys/net/route.h: revision 1.124
sys/netinet6/nd6.c: revision 1.258
sys/netinet6/nd6.c: revision 1.259
sys/net/rtsock.c: revision 1.251
sys/netinet/if_arp.c: revision 1.284
sys/netinet6/nd6_nbr.c: revision 1.167

rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9

-

nd6: notify userland of neighbour lla updates once more

XXX pullup -8 -9
 1.256.2.1 19-Aug-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #97):

sys/netinet6/nd6.c: revision 1.257

Add missing IFNET_LOCK for regen_tmpaddr
Reported by ryo@
 1.265.2.1 25-Jan-2020  ad Sync with head.
 1.268.2.1 20-Apr-2020  bouyer Sync with HEAD
 1.274.2.1 03-Jan-2021  thorpej Sync w/ HEAD.
 1.279.4.4 01-Oct-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1164):

sys/net/link_proto.c: revision 1.41
sys/netinet6/in6.c: revision 1.293
sys/net/if.h: revision 1.307
sys/netinet/ip_icmp.c: revision 1.180
sys/dev/vmt/vmt_subr.c: revision 1.11
sys/netinet6/in6_var.h: revision 1.105
sys/netinet6/in6_var.h: revision 1.106
sys/net/if.c: revision 1.532
sys/net/if.c: revision 1.533
sys/netinet6/mld6.c: revision 1.102
sys/netinet/in_var.h: revision 1.104
sys/net/if_spppsubr.c: revision 1.270
sys/net/if_spppsubr.c: revision 1.271
sys/netinet6/nd6.c: revision 1.284

if: introduce if_first_addr() (and psref variant)

It returns a first address (ifa) of a given family on a given interface.

It will replace a bunch of open codes and make their intent clear.
Apply if_first_addr() and if_first_addr_psref()

in6: introduce in6ifa_first_lladdr() (and psref variant)

It returns a first link-local address (ifa) on a given interface.
Apply in6ifa_first_lladdr() and in6ifa_first_lladdr_psref()
 1.279.4.3 12-Apr-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1089):

sys/netinet6/nd6.c: revision 1.283

nd6: send packets through the fast path even if DELAY and PROBE

If there is a valid ND cache, we can send packets for the destination
of the cache. If the state of the cache is STALE, we need to go
through the slow path to change its state. In the other cases
including the DELAY and PROBE states, we can send packets through
the fast path.
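
The rule above can be sketched as a single predicate. This is an illustration only: the enum, the struct and the function name are hypothetical, and "valid" here simply models "the cache holds a usable link-layer address".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical neighbor cache states, mirroring the names in the text. */
enum sketch_nd_state {
    SKETCH_ND_INCOMPLETE,
    SKETCH_ND_REACHABLE,
    SKETCH_ND_STALE,
    SKETCH_ND_DELAY,
    SKETCH_ND_PROBE
};

struct sketch_nd_cache {
    enum sketch_nd_state state;
    bool valid;             /* a link-layer address is known */
};

/*
 * Fast path: any valid cache except STALE, which must take the slow
 * path so that its state can be changed.
 */
bool
sketch_can_use_fast_path(const struct sketch_nd_cache *c)
{
    if (c == NULL || !c->valid)
        return false;
    return c->state != SKETCH_ND_STALE;
}
```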
 1.279.4.2 18-Apr-2024  martin Pull up following revision(s) (requested by knakahara in ticket #659):

sys/netinet6/in6_ifattach.c: revision 1.122
sys/netinet/sctp_asconf.c: revision 1.14
sys/netinet6/nd6.c: revision 1.282

Fix invalid IPv6 route when an ipsecif(4) tunnel is deleted. Pointed out by ohishi@IIJ.
The pointed bug is fixed by modification in nd6_need_cache().
Others are similar bugs.
 1.279.4.1 10-Dec-2023  martin Pull up following revision(s) (requested by pgoyette in ticket #487):

sys/compat/common/compat_90_mod.c: revision 1.5
sys/compat/common/compat_90_mod.c: revision 1.6
sys/netinet6/in6.c: revision 1.290
sys/netinet6/in6.c: revision 1.291
sys/compat/common/files.common: revision 1.11
sys/netinet6/icmp6.c: revision 1.255
sys/compat/common/net_inet6_nd_90.c: revision 1.1
sys/compat/common/net_inet6_nd_90.c: revision 1.2
sys/modules/compat_90/Makefile: revision 1.2
sys/modules/compat_90/Makefile: revision 1.3
sys/netinet6/nd6.c: revision 1.281
sys/compat/common/compat_mod.h: revision 1.10
sys/kern/compat_stub.c: revision 1.23
sys/sys/compat_stub.h: revision 1.27

Identify the need to rework the COMPAT_* code to be more
module-aware.
This is an XXX comment block only, NFCI.

Modularize the COMPAT_90 code that resulted from the removal of
netinet6/nd6 from the kernel. Now, the minimal compat code can
be successfully loaded and unloaded along with the rest of the
COMPAT_90 code.

Allow kernel builds which don't define INET6 to compile compat bits
too.

Default the build of compat_90 module to include IPv6, as is done
for other INET6-sensitive modules (see if_lagg).
 1.282.2.1 02-Aug-2025  perseant Sync with HEAD
 1.91 11-Sep-2020  roy inet6: Use generic Neighbor Detection rather than IPv6 specific

No functional change intended.
 1.90 20-Aug-2020  roy Sprinkle some const
 1.89 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.88 25-Sep-2019  ozaki-r Initialize DAD components properly

The original code initialized each component in non-init functions such as
arp_dad_start and nd6_dad_find, conditionally based on a global flag for each.
However, it was racy because the flag and the code around it were not
protected by a lock and could cause a kernel panic at worst.

Fix the issue by initializing the components in bootup as usual.
 1.87 01-Sep-2019  roy inet6: Re-introduce ND6_LLINFO_WAITDELETE so we can return EHOSTDOWN

Once we've sent nd6_mmaxtries NS messages, send RTM_MISS and move to the
ND6_LLINFO_WAITDELETE state rather than freeing the llentry right away.
Wait for a probe cycle and then free the llentry.

If a connection attempts to re-use the llentry during ND6_LLINFO_WAITDELETE,
return EHOSTDOWN (or EHOSTUNREACH if a gateway) to match inet behaviour.
Continue to ND6_LLINFO_INCOMPLETE and send another NS probe in hope of a
reply. Rinse and repeat.
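
The cycle described above can be sketched as a small state machine. All names are hypothetical; the real code works on llentry structures and timers, which this sketch reduces to two functions and a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum sketch_ll_state { SKETCH_INCOMPLETE, SKETCH_WAITDELETE, SKETCH_FREED };

struct sketch_llentry {
    enum sketch_ll_state state;
    int  ns_sent;           /* NS probes sent in the current cycle */
    bool miss_reported;     /* models sending RTM_MISS */
};

/* Called when the entry's timer fires. */
void
sketch_timer_expired(struct sketch_llentry *e, int max_tries)
{
    switch (e->state) {
    case SKETCH_INCOMPLETE:
        if (e->ns_sent < max_tries) {
            e->ns_sent++;               /* send another NS */
        } else {
            e->miss_reported = true;    /* report the failure */
            e->state = SKETCH_WAITDELETE;
        }
        break;
    case SKETCH_WAITDELETE:
        e->state = SKETCH_FREED;        /* one probe cycle has passed */
        break;
    case SKETCH_FREED:
        break;
    }
}

/* Called when a connection tries to use the entry; returns an errno value. */
int
sketch_output(struct sketch_llentry *e, bool via_gateway)
{
    if (e->state != SKETCH_WAITDELETE)
        return 0;
    e->state = SKETCH_INCOMPLETE;       /* try again in hope of a reply */
    e->ns_sent = 1;
    return via_gateway ? EHOSTUNREACH : EHOSTDOWN;
}
```

The caller sees the error immediately while the entry quietly restarts probing, which is the "rinse and repeat" part.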

This reverts part of nd6.c r1.14 - an 18 year old commit!
 1.86 06-Mar-2018  roy branches: 1.86.2; 1.86.6;
nd6: add a nonce to DAD probes in case they are looped back to us

This implements RFC 7527, based on a similar change in FreeBSD.
 1.85 22-Jun-2017  ozaki-r branches: 1.85.4;
Remove unused function (nd6_rem_ifa_lle)
 1.84 21-Jun-2017  ozaki-r Don't create a permanent L2 cache entry on adding an address to an interface

It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
 1.83 22-Feb-2017  ozaki-r branches: 1.83.6;
Fix prefix invalidation via nd6_timer

We cannot remove a prefix there. Instead just invalidate it; the prefix
will be removed when purging an associated address. This is the same as
the original behavior.
 1.82 14-Feb-2017  ozaki-r Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
  - old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
  - new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced
 1.81 19-Dec-2016  ozaki-r branches: 1.81.2;
Protect IPv6 default router and prefix lists with coarse-grained rwlock

in6_purgeaddr (in6_unlink_ifa) itself unreferences a prefix entry and calls
nd6_prelist_remove if the counter becomes 0, so callers don't need to
handle the reference counting.

Performance-sensitive paths (sending/forwarding packets) call just one
reader lock. This is a trade-off between performance impact vs. the amount
of efforts; if we want to remove the reader lock, we need huge amount of
works including destroying objects with psz/psref in softint, for example.
 1.80 19-Dec-2016  ozaki-r Get rid of extra nd6_purge from in6_ifdetach

There were two nd6_purge calls in in6_ifdetach for some reason, but at least now
we don't need the extra nd6_purge. Remove it and instead add assertions that
check that everything has surely been purged.
 1.79 14-Dec-2016  ozaki-r Make functions static
 1.78 12-Dec-2016  ozaki-r Introduce macros for the prefix list

No functional change.
 1.77 12-Dec-2016  ozaki-r Introduce macros for the default router list

No functional change.
 1.76 11-Dec-2016  ozaki-r Add nd6_ prefix to exported functions
 1.75 11-Dec-2016  ozaki-r Move default interface things from nd6_rtr.c to nd6.c
 1.74 11-Dec-2016  ozaki-r Make some functions static
 1.73 11-Dec-2016  ozaki-r Remove function declarations that have no actual definition
 1.72 04-Apr-2016  ozaki-r branches: 1.72.2;
Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
  - RTF_CONNECTED is added
    - It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.71 01-Apr-2016  ozaki-r Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.
 1.70 01-Apr-2016  ozaki-r Tidy up nd6_timer initialization
 1.69 07-Dec-2015  ozaki-r CID 1341546: Fix integer handling issue (CONSTANT_EXPRESSION_RESULT)

n > INT_MAX where n is a long integer variable can never be true on 32bit
architectures. Use time_t(int64_t) instead of long for the variable.
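
As an illustration of why the wider type matters, here is a minimal sketch (the function name is hypothetical and is not taken from the commit). With a 64-bit operand the comparison against INT_MAX is meaningful on every architecture, whereas with a 32-bit long it is a constant expression.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Clamp a 64-bit value into the range of int before narrowing it. */
int
sketch_clamp_to_int(int64_t n)
{
    if (n > INT_MAX)
        return INT_MAX;
    if (n < INT_MIN)
        return INT_MIN;
    return (int)n;
}
```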
 1.68 25-Nov-2015  ozaki-r Use lltable/llentry for NDP

lltable and llentry were introduced to replace ARP cache data structure
for further restructuring of the routing table: L2 nexthop cache
separation. This change replaces the NDP cache data structure
(llinfo_nd6) with them as well as ARP.

One noticeable change is for neighbor cache GC mechanism that was
introduced to prevent IPv6 DoS attacks. net.inet6.ip6.neighborgcthresh
was the max number of caches that we store in the system. After
introducing lltable/llentry, the value is changed to be per-interface
basis because lltable/llentry stores neighbor caches in each interface
separately. And the change brings one degradation; the old GC mechanism
dropped exceeded packets based on LRU while the new implementation drops
packets in order from the beginning of lltable (a hash table + linked
lists). It would be improved in the future.

Added functions in in6.c come from FreeBSD (as of r286629) and are
tweaked for NetBSD.

Proposed on tech-kern and tech-net.
 1.67 18-Nov-2015  ozaki-r Stop passing llinfo_nd6 to nd6_ns_output

This is a restructuring for coming changes to nd6 (replacing
llinfo_nd6 with llentry). Once we have a lock of llinfo_nd6,
we need to pass it to nd6_ns_output with holding the lock.
However, in a function subsequent to nd6_ns_output, the llinfo_nd6
may be looked up, i.e., its lock would be acquired again.
To avoid such a situation, pass only required data (in6_addr) to
nd6_ns_output instead of passing whole llinfo_nd6.

Inspired by FreeBSD
 1.66 17-Jul-2015  ozaki-r Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
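
The three rules above amount to an ordinary acquire/release discipline. A minimal sketch, with hypothetical names and with destruction modelled as a flag rather than a real free:

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_rtentry {
    int  refcnt;        /* references from the table and from local variables */
    bool destroyed;     /* set once the last reference is dropped */
};

/* Functions that return an entry hand the caller a counted reference. */
struct sketch_rtentry *
sketch_rt_lookup(struct sketch_rtentry *rt)
{
    assert(!rt->destroyed);
    rt->refcnt++;
    return rt;
}

/* Callers always release through this function, never with refcnt--. */
void
sketch_rt_free(struct sketch_rtentry *rt)
{
    assert(rt->refcnt > 0);
    if (--rt->refcnt == 0)
        rt->destroyed = true;
}
```

As the note above says, a count like this only keeps the object alive; it does nothing to stop concurrent updates.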
 1.65 15-Jul-2015  ozaki-r Remove unused arguments and the associated code from nd6_nud_hint()

from OpenBSD
 1.64 25-Feb-2015  roy Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.
 1.63 25-Feb-2015  roy Retire nd6_newaddrmsg and use rt_newaddrmsg directly instead so that
we don't spam route changes when the route hasn't changed.
 1.62 23-Feb-2015  martin Rearrange interface detachment slightly: before we free the INET6 specific
per-interface data, make sure to call nd6_purge() with it to remove
routing entries pointing to the going interface.
When we should happen to call this function again later, with the data
already gone, just return.
Fixes PR kern/49682, ok: christos.
 1.61 16-Dec-2014  roy Report route additions/changes/deletions for cached neighbours to userland.
 1.60 05-Sep-2014  matt branches: 1.60.2;
Don't use C++ keyword as variable.
Use different prefix for nd6_prefixctl members than for nd6_prefix members.
 1.59 05-Jun-2014  roy branches: 1.59.2;
Add IPV6CTL_AUTO_LINKLOCAL and ND6_IFF_AUTO_LINKLOCAL toggles which
control the automatic creation of IPv6 link-local addresses when an
interface is brought up.

Taken from FreeBSD.
 1.58 21-May-2013  roy branches: 1.58.6;
For IPv6, emit RTM_NEWADDR once DAD completes and also when address flag
changes. Tentative addresses are not emitted.

Version bumped so userland can detect this behaviour change.
 1.57 23-Jun-2012  christos branches: 1.57.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.56 19-Nov-2011  tls branches: 1.56.4; 1.56.8; 1.56.10;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.55 11-Nov-2011  gdt Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicate
that the host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.
 1.54 24-May-2011  spz branches: 1.54.4;
RA flood mitigation via a limit on accepted routes:
- introduce a limit for the routes accepted via IPv6 Router Advertisement:
a common 2 interface client will have 6, the default limit is 100 and
can be adjusted via sysctl
- report the current number of routes installed via RA via sysctl
- count discarded route additions. Note that one RA message is two routes.
This is at present only across all interfaces even though per-interface
would be more useful, since the per-interface structure complies with RFC2466
- bump kernel version due to the previous change
- adjust netstat to use the new value (with netstat -p icmp6)
 1.53 06-Nov-2009  dyoung branches: 1.53.4; 1.53.6;
Fix net.inet6.ip6.accept_rtadv and 'ndp -i <interface> accept_rtadv':

Add a flag ND6_IFF_OVERRIDE_RTADV that tells the kernel to override
ip6_accept_rtadv (net.inet6.ip6.accept_rtadv) on an interface.

Add a routine nd6_accepts_rtadv(ndi) that evaluates both the flags
on the interface represented by ndi and ip6_accept_rtadv, and
returns 'true' if the given interface should accept Router
Advertisements, and 'false' if not.

Now, ND6_IFF_ACCEPT_RTADV works as it was historically documented:
if it is set, then accept router advertisements iff ip6_accept_rtadv
!= 0. Otherwise, do not accept router advertisements.

If ND6_IFF_OVERRIDE_RTADV is set, then the flag ND6_IFF_ACCEPT_RTADV
overrides ip6_accept_rtadv: if ND6_IFF_ACCEPT_RTADV is set, accept;
otherwise reject. Ignore ip6_accept_rtadv.

If neither ND6_IFF_ACCEPT_RTADV nor ND6_IFF_OVERRIDE_RTADV is set,
reject Router Advertisements.
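
The decision table above can be written out as follows. This is a sketch under stated assumptions: the flag values and the function name are hypothetical, and the global setting is passed in as a parameter rather than read from ip6_accept_rtadv.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the real ND6_IFF_* constants may differ. */
#define SKETCH_IFF_ACCEPT_RTADV   0x1u
#define SKETCH_IFF_OVERRIDE_RTADV 0x2u

bool
sketch_accepts_rtadv(unsigned int flags, int global_accept_rtadv)
{
    bool accept = (flags & SKETCH_IFF_ACCEPT_RTADV) != 0;
    bool override = (flags & SKETCH_IFF_OVERRIDE_RTADV) != 0;

    if (!accept)
        return false;       /* rejected with or without the override */
    if (override)
        return true;        /* the per-interface flag wins; ignore the global */
    return global_accept_rtadv != 0;
}
```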
 1.52 15-Jan-2009  christos - switch the lifetime struct to time_t and provide compatibility for the
old ioctl.
 1.51 24-Oct-2008  dyoung branches: 1.51.2;
Constify the rt_addrinfo argument to the ifa_rtrequest member
function of struct ifaddr.
 1.50 30-Aug-2007  dyoung branches: 1.50.16; 1.50.20; 1.50.24; 1.50.30;
Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.49 07-Aug-2007  dyoung branches: 1.49.2; 1.49.4;
Avoid writing past the end of the buffer [lldst, lldst + dstsize)
in nd6_storelladdr().

Use sockaddr_dl_setaddr(). Constify some sockaddr_dl's. Constify
a sockaddr argument to nd6_na_output(). Change SDL() to "standard"
satocsdl() or satosdl(). Change SIN6() to satocsin6() or satosin6().

bcmp -> memcmp, bcopy -> memcpy.
 1.48 19-Jul-2007  dyoung branches: 1.48.4;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.47 17-May-2007  dyoung branches: 1.47.2;
Fix the memory leak reported in kern/36337. Thanks Matthias Scheler
for the heads-up. My fix is based on the following patches from
FreeBSD, however, I extracted the code into a subroutine,
nd6_llinfo_release_pkts():

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet6/nd6.c.diff?r1=1.48.2.18;r2=1.48.2.19
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet6/nd6_nbr.c.diff?r1=1.29.2.8;r2=1.29.2.9
 1.46 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.45 15-Mar-2007  dyoung In nd6_lookup, shorten a staircase. KNF: change return (expr); to
return expr; throughout. Fix K&R prototypes and parameter type
declarations.
 1.44 04-Mar-2007  christos branches: 1.44.2; 1.44.4; 1.44.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.43 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.42 20-Nov-2006  dyoung branches: 1.42.4;
Use LIST_/TAILQ_ macros, esp. LIST_FOREACH() and TAILQ_FOREACH().
Use the usual idiom for iterating over a list where we might
_REMOVE() entries,

for (x = TAILQ_FIRST(...); x != NULL; x = nx) {
nx = TAILQ_NEXT(x, ...);
...
}
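
For reference, a complete instance of the idiom quoted above, using the TAILQ macros from <sys/queue.h>. The list type, the element type and the removal criterion (odd values) are made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

struct sketch_item {
    int value;
    TAILQ_ENTRY(sketch_item) link;
};
TAILQ_HEAD(sketch_list, sketch_item);

/* Append a heap-allocated item; returns 0 on success, -1 on failure. */
int
sketch_append(struct sketch_list *head, int value)
{
    struct sketch_item *x = malloc(sizeof(*x));

    if (x == NULL)
        return -1;
    x->value = value;
    TAILQ_INSERT_TAIL(head, x, link);
    return 0;
}

/* Remove and free every item with an odd value; returns how many were removed. */
int
sketch_remove_odd(struct sketch_list *head)
{
    struct sketch_item *x, *nx;
    int removed = 0;

    for (x = TAILQ_FIRST(head); x != NULL; x = nx) {
        nx = TAILQ_NEXT(x, link);   /* fetch the successor before x can go away */
        if (x->value % 2 != 0) {
            TAILQ_REMOVE(head, x, link);
            free(x);
            removed++;
        }
    }
    return removed;
}
```

Saving the successor first is what makes removal inside the loop safe; TAILQ_FOREACH() would read the link field of an element that has already been freed.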
 1.41 05-Mar-2006  rpaulo branches: 1.41.12; 1.41.14;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on a interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.40 10-Dec-2005  elad branches: 1.40.4; 1.40.6; 1.40.8;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.39 28-Feb-2005  itojun branches: 1.39.4;
make ip6_getpmtu back to static
 1.38 23-Mar-2004  martti branches: 1.38.8; 1.38.10;
Make ip6_getpmtu() globally visible. This is needed by IPFilter 4.x.
 1.37 04-Feb-2004  tron Remove outdated prototype for ip6_getpmtu(). The function has a different
signature now and is statically declared in "ip6_output.c".
 1.36 24-Jan-2004  darrenr make ip6_getpmtu() externally visible
 1.35 27-Jun-2003  itojun branches: 1.35.2;
split ND6 cache timer management to per-entry. increased accuracy,
no O(N) loop. sync w/ kame
 1.34 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.33 02-Nov-2002  perry /*CONSTCOND*/ while (0)'ed macros
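
For readers unfamiliar with the idiom, a small sketch of a do/while (0) macro carrying the lint annotation. The macro and function are hypothetical examples, and the assumption here is that the comment is lint's CONSTCOND marker, which silences the "constant in conditional context" warning.

```c
#include <assert.h>

/* Multi-statement macro that behaves like a single statement. */
#define SKETCH_SWAP_INT(a, b)               \
    do {                                    \
        int sketch_tmp_ = (a);              \
        (a) = (b);                          \
        (b) = sketch_tmp_;                  \
    } while (/*CONSTCOND*/ 0)

/* The macro is safe in an unbraced if/else because of the wrapper. */
void
sketch_sort2(int *lo, int *hi)
{
    if (*lo > *hi)
        SKETCH_SWAP_INT(*lo, *hi);
    else
        assert(*lo <= *hi);
}
```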
 1.32 08-Jun-2002  itojun indent cleanup
 1.31 08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.30 07-Jun-2002  itojun cope with cases when maxmtu == 0 (this shouldn't happen!)
 1.29 05-Jun-2002  itojun be sure to use L3 MTU, not L2 MTU, when specified in spec (affects FDDI/ARCnet)
 1.28 30-May-2002  itojun improve nd6_setmtu(), to warn too-small MTU on SIOCSIFMTU. sync w/kame
 1.27 29-May-2002  itojun "receivedra" field name is obsolete.
 1.26 29-May-2002  itojun attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
 1.25 28-May-2002  itojun use arc4random
 1.24 18-Dec-2001  itojun branches: 1.24.8;
reduce white space/cosmetic diffs w/kame.
 1.23 18-Oct-2001  itojun reduce diffs with kame (mostly cosmetic).
move IPV6_CHECKSUM processing to sys/netinet6/raw_ip6.c.
constify a couple of places.
 1.22 17-Oct-2001  itojun do not change neighbor cache state on entry timeout,
if the cache entry is for outgoing router.

perform on-linkness check before default router (re-)selection.

do not play with interface direct route on nd6_rtrequest.

sync a lot of cosmetic changes. sync with kame
 1.21 11-Jun-2001  wiz branches: 1.21.2;
Fix various misspellings of compatible/compatibility.
 1.20 23-Feb-2001  itojun branches: 1.20.2;
garbage-collect stale ND entries (default: 1 day).
RFC 2461 5.3. sync with kame.
 1.19 23-Feb-2001  itojun remove unnecessary state, ND6_LLINFO_WAITDELETE, from neighbor cache
state machine.
no need for RTF_REJECT on neighbor cache entries, they are leftover from
ARP code.
sync with kame.
 1.18 08-Feb-2001  itojun when chasing nd6_llinfo chain, make sure we do not touch dangling
pointer (due to RTM_DELETE during default router list management).
from kame
 1.17 07-Feb-2001  itojun during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).
 1.16 17-Jan-2001  itojun pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyways).

benefit: the following command now works. previously we needed two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding the 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.15 06-Jul-2000  itojun - do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).
 1.14 19-May-2000  itojun branches: 1.14.4;
do not mistakenly forward link-local scoped packet (the bug was added
with "beyondscope" icmp6 support).
"options FAKE_LOOPBACK_IF" will honor scope on loopback outputs. rcvif will
be real interface, not the loopback, just like when multicast loopback.

(sync with kame)
 1.13 09-May-2000  itojun do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).
 1.12 16-Apr-2000  itojun perform neighbor unreachability detection on p2p links (spec requires
it for bidir p2p links).
improve -i in ndp(8) to allow tweaking per-interface ND flag on.
fix ndp(8) infinite loop on certain routing table setup.
 1.11 16-Apr-2000  itojun better sync with latest kame (cosmetic only).
 1.10 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
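
The two properties listed above (client-supplied storage and constant-time insertion and removal) can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel's callout code: every name with an sk_ prefix is hypothetical, and a single list bucket stands in for whatever timing structure the real implementation uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; sk_* names are illustrative, not the kernel API. */
struct sk_callout {
	struct sk_callout *prev, *next;	/* list links live in the client's storage */
	void (*func)(void *);
	void *arg;
	bool pending;
};

/* One bucket: a circular doubly linked list with a sentinel node. */
struct sk_bucket {
	struct sk_callout head;
};

void
sk_bucket_init(struct sk_bucket *b)
{
	b->head.prev = b->head.next = &b->head;
	b->head.func = NULL;
	b->head.arg = NULL;
	b->head.pending = false;
}

void
sk_callout_init(struct sk_callout *c)
{
	c->prev = c->next = NULL;
	c->func = NULL;
	c->arg = NULL;
	c->pending = false;
}

/* Constant-time removal; returns true if the callout was pending. */
bool
sk_callout_stop(struct sk_callout *c)
{
	if (!c->pending)
		return false;
	c->prev->next = c->next;
	c->next->prev = c->prev;
	c->prev = c->next = NULL;
	c->pending = false;
	return true;
}

/* Constant-time insertion at the tail; nothing is allocated here. */
void
sk_callout_schedule(struct sk_bucket *b, struct sk_callout *c,
    void (*func)(void *), void *arg)
{
	assert(func != NULL);
	(void)sk_callout_stop(c);	/* rescheduling moves the entry */
	c->func = func;
	c->arg = arg;
	c->prev = b->head.prev;
	c->next = &b->head;
	b->head.prev->next = c;
	b->head.prev = c;
	c->pending = true;
}

/* Fire every pending callout in the bucket; returns how many ran. */
int
sk_bucket_run(struct sk_bucket *b)
{
	int n = 0;

	while (b->head.next != &b->head) {
		struct sk_callout *c = b->head.next;

		(void)sk_callout_stop(c);
		c->func(c->arg);
		n++;
	}
	return n;
}

/* Example handler: bump an int counter. */
void
sk_count_hit(void *arg)
{
	(*(int *)arg)++;
}
```

Because the caller embeds struct sk_callout in its own data, scheduling can never fail for lack of memory, which is the resource-allocation problem the first bullet refers to.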
 1.9 26-Feb-2000  itojun bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
 1.8 04-Feb-2000  itojun avoid calling in6_control(SIOCDIFADDR_IN6) from interrupt context.
it is not supposed to work.
logging fix: add "\n" to some of log() in in6_prefix.c.

improve in6_ifdetach(). now almost all structures that depend on ifnet
will be cleared up.
possible loose ends:
- cached route_in6 in static variables needs to be cleared as well
- there is ifaddr manipulation without reference counting,
which should be fixed
we still see panics after card removal, though... not sure what is left.

(sync with kame)
 1.7 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.6 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.5 31-Jul-1999  itojun branches: 1.5.2; 1.5.8;
sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.4 06-Jul-1999  itojun sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file nd6.h was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file nd6.h was added on branch chs-ubc2 on 1999-07-01 23:48:30 +0000
 1.5.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.5.2.4 12-Mar-2001  bouyer Sync with HEAD.
 1.5.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.5.2.2 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.5.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.14.4.2 09-May-2001  he Pull up revision 1.17 (via patch, requested by itojun):
Suppress ND6 logs that are too noisy for normal use. Can be
re-enabled by net.inet6.icmp6.nd6_debug.
 1.14.4.1 20-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)
- add protection mechanism against ND cache corruption due to bad NUD hints.

this is part of:
sys/netinet/icmp6.h 1.9 -> 1.10
sys/netinet/tcp_input.c 1.111 -> 1.112
sys/netinet6/icmp6.c 1.34 -> 1.35
sys/netinet6/nd6.c 1.30 -> 1.31
sys/netinet6/nd6.h 1.14 -> 1.15
 1.20.2.5 11-Nov-2002  nathanw Catch up to -current
 1.20.2.4 20-Jun-2002  nathanw Catch up to -current.
 1.20.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.20.2.2 22-Oct-2001  nathanw Catch up to -current.
 1.20.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.21.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.21.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.24.8.2 20-Jun-2002  gehenna catch up with -current.
 1.24.8.1 30-May-2002  gehenna Catch up with -current.
 1.35.2.5 11-Dec-2005  christos Sync with head.
 1.35.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.35.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.35.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.35.2.1 03-Aug-2004  skrll Sync with HEAD
 1.38.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.38.8.1 29-Apr-2005  kent sync with -current
 1.39.4.4 03-Sep-2007  yamt sync with head.
 1.39.4.3 26-Feb-2007  yamt sync with head.
 1.39.4.2 30-Dec-2006  yamt sync with head.
 1.39.4.1 21-Jun-2006  yamt sync with head.
 1.40.8.1 13-Mar-2006  yamt sync with head.
 1.40.6.1 22-Apr-2006  simonb Sync with head.
 1.40.4.1 09-Sep-2006  rpaulo sync with head
 1.41.14.1 10-Dec-2006  yamt sync with head.
 1.41.12.1 12-Jan-2007  ad Sync with head.
 1.42.4.5 17-May-2007  yamt sync with head.
 1.42.4.4 07-May-2007  yamt sync with head.
 1.42.4.3 24-Mar-2007  yamt sync with head.
 1.42.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.42.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.44.6.1 18-Mar-2007  reinoud First attempt to bring branch in sync with HEAD
 1.44.4.1 11-Jul-2007  mjf Sync with head.
 1.44.2.4 09-Oct-2007  ad Sync with head.
 1.44.2.3 20-Aug-2007  ad Sync with HEAD.
 1.44.2.2 08-Jun-2007  ad Sync with head.
 1.44.2.1 10-Apr-2007  ad Sync with head.
 1.47.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.47.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.48.4.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.48.4.1 09-Aug-2007  jmcneill Sync with HEAD.
 1.49.4.2 07-Aug-2007  dyoung Avoid writing past the end of the buffer [lldst, lldst + dstsize)
in nd6_storelladdr().

Use sockaddr_dl_setaddr(). Constify some sockaddr_dl's. Constify
a sockaddr argument to nd6_na_output(). Change SDL() to "standard"
satocsdl() or satosdl(). Change SIN6() to satocsin6() or satosin6().

bcmp -> memcmp, bcopy -> memcpy.
 1.49.4.1 07-Aug-2007  dyoung file nd6.h was added on branch matt-mips64 on 2007-08-07 04:35:44 +0000
 1.49.2.1 06-Nov-2007  matt sync with HEAD
 1.50.30.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.50.24.2 11-Mar-2010  yamt sync with head
 1.50.24.1 04-May-2009  yamt sync with head.
 1.50.20.1 17-Jan-2009  mjf Sync with HEAD.
 1.50.16.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.51.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.53.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.53.4.1 31-May-2011  rmind sync with head
 1.54.4.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.54.4.2 30-Oct-2012  yamt sync with head
 1.54.4.1 17-Apr-2012  yamt sync with head
 1.56.10.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.56.8.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.56.4.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.57.2.3 03-Dec-2017  jdolecek update from HEAD
 1.57.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.57.2.1 23-Jun-2013  tls resync from head
 1.58.6.1 10-Aug-2014  tls Rebase.
 1.59.2.2 06-Apr-2015  snj Pull up following revision(s) (requested by martin in ticket #655):
sys/netinet6/in6.c: revision 1.182 via patch
sys/netinet6/in6_ifattach.c: revision 1.95 via patch
sys/netinet6/nd6.c: revision 1.158 via patch
sys/netinet6/nd6.h: revision 1.62 via patch
sys/netinet6/nd6_nbr.c: revision 1.104 via patch
sys/netinet6/nd6_rtr.c: revision 1.96 via patch
Rearrange interface detachment slightly: before we free the INET6 specific
per-interface data, make sure to call nd6_purge() with it to remove
routing entries pointing to the going interface.
When we should happen to call this function again later, with the data
already gone, just return.
Fixes PR kern/49682, ok: christos.
 1.59.2.1 17-Dec-2014  martin Pull up following revision(s) (requested by roy in ticket #332):
sys/netinet6/nd6_nbr.c: revision 1.103
sys/netinet6/nd6_rtr.c: revision 1.95
sys/netinet6/nd6.h: revision 1.61
sys/netinet6/nd6.c: revision 1.156
Report route additions/changes/deletions for cached neighbours to userland.
 1.60.2.6 28-Aug-2017  skrll Sync with HEAD
 1.60.2.5 05-Feb-2017  skrll Sync with HEAD
 1.60.2.4 22-Apr-2016  skrll Sync with HEAD
 1.60.2.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.60.2.2 22-Sep-2015  skrll Sync with HEAD
 1.60.2.1 06-Apr-2015  skrll Sync with HEAD
 1.72.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.72.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.81.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.83.6.2 30-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1396):

sys/netinet6/nd6.h: revision 1.88
sys/netinet6/nd6_nbr.c: revision 1.174
sys/netinet6/nd6.c: revision 1.264
sys/netinet/if_arp.c: revision 1.288 (patch)

Initialize DAD components properly

The original code initialized each component in non-init functions such as
arp_dad_start and nd6_dad_find, conditionally based on a global flag for each.
However, it was racy because the flag and the code around it were not
protected by a lock and could cause a kernel panic at worst.

Fix the issue by initializing the components in bootup as usual.
 1.83.6.1 07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproduces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Enable to remove multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries which have an identical
destination on different interfaces. This is normal and can be
reproduced easily by ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.85.4.1 15-Mar-2018  pgoyette Synch with HEAD
 1.86.6.2 30-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #269):

sys/netinet6/nd6.h: revision 1.88
sys/net/rtsock_shared.c: revision 1.10
sys/netinet6/nd6_nbr.c: revision 1.174
sys/netinet6/nd6.c: revision 1.264
sys/netinet/if_arp.c: revision 1.283
sys/netinet/if_arp.c: revision 1.288

Initialize DAD components properly

The original code initialized each component in non-init functions such as
arp_dad_start and nd6_dad_find, conditionally based on a global flag for each.
However, it was racy because the flag and the code around it were not
protected by a lock and could cause a kernel panic at worst.

Fix the issue by initializing the components in bootup as usual.

-

Initialize dom_mowner for MBUFTRACE
 1.86.6.1 05-Sep-2019  martin Pull up following revision(s) (requested by roy in ticket #169):

sys/netinet6/nd6.h: revision 1.87
sys/netinet6/nd6.c: revision 1.263

inet6: Re-introduce ND6_LLINFO_WAITDELETE so we can return EHOSTDOWN

Once we've sent nd6_mmaxtries NS messages, send RTM_MISS and move to the
ND6_LLINFO_WAITDELETE state rather than freeing the llentry right away.
Wait for a probe cycle and then free the llentry.

If a connection attempts to re-use the llentry during ND6_LLINFO_WAITDELETE,
return EHOSTDOWN (or EHOSTUNREACH if a gateway) to match inet behaviour.

Continue to ND6_LLINFO_INCOMPLETE and send another NS probe in hope of a
reply. Rinse and repeat.

This reverts part of nd6.c r1.14 - an 18 year old commit!
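
The state handling described in this message can be modelled as a small state machine. The following is a rough user-space sketch, not the nd6.c code: the sk_* names and the struct layout are hypothetical, max_tries stands in for nd6_mmaxtries, and sending NS probes or routing messages is reduced to counters.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the states named in the commit message. */
enum sk_state { SK_INCOMPLETE, SK_WAITDELETE, SK_FREED };

struct sk_entry {
	enum sk_state state;
	int asked;		/* NS probes sent in the current cycle */
	bool is_gateway;	/* entry is for a gateway rather than a host */
	int miss_sent;		/* number of RTM_MISS notifications (modelled) */
};

/* One retransmit-timer tick. */
void
sk_timer(struct sk_entry *e, int max_tries)
{
	assert(max_tries > 0);
	switch (e->state) {
	case SK_INCOMPLETE:
		if (e->asked < max_tries) {
			e->asked++;		/* send another NS probe */
		} else {
			e->miss_sent++;		/* report RTM_MISS */
			e->state = SK_WAITDELETE;
		}
		break;
	case SK_WAITDELETE:
		e->state = SK_FREED;		/* a probe cycle passed unused */
		break;
	case SK_FREED:
		break;
	}
}

/*
 * The output path tries to reuse the entry.  Returns 0 when there is
 * nothing to report in this simplified model, otherwise an errno value.
 */
int
sk_reuse(struct sk_entry *e)
{
	if (e->state != SK_WAITDELETE)
		return 0;
	/* Go back to probing in the hope of a reply, but fail this attempt. */
	e->state = SK_INCOMPLETE;
	e->asked = 1;
	return e->is_gateway ? EHOSTUNREACH : EHOSTDOWN;
}
```

With max_tries set to 3, the entry stays in SK_INCOMPLETE for three ticks, moves to SK_WAITDELETE on the fourth, and a reuse attempt at that point returns EHOSTDOWN (or EHOSTUNREACH for a gateway) while restarting the probe cycle.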
 1.86.2.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.186 28-Apr-2025  joe clear whitespace in IPV6 neighbor solicitation
 1.185 13-Nov-2024  roy ARP/ND6: Revert prior

Turns out some people actually use this behaviour and strictly speaking
it is allowed by RFC5227 2.4 where it says:

At any time, if a host receives
an ARP packet (Request *or* Reply) where the 'sender IP address' is
(one of) the host's own IP address(es) configured on that interface,
but the 'sender hardware address' does not match any of the host's
own interface addresses, then this is a conflicting ARP packet

The key part is "any of the host's own interface addresses".
 1.184 05-Oct-2024  roy ND6: only ignore messages from the receiving interface

Sync with ARP behaviour, reverts r1.163 slightly.
 1.183 29-Mar-2023  kardel branches: 1.183.6;
use carp mac address when replying to neighbor solicitations referring
to carp interface addresses.
unconfuses commercial routers
 1.182 02-Aug-2021  andvar fix various typos in comments and log messages.
 1.181 11-Sep-2020  roy inet6: Use generic Neighbor Detection rather than IPv6 specific

No functional change intended.
 1.180 20-Aug-2020  roy Sprinkle some const
 1.179 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.178 22-Apr-2020  roy inet6: nd6_na_input() now considers ln_state <= ND6_LLINFO_INCOMPLETE

Otherwise if ln_state != ND6_LLINFO_INCOMPLETE and there is no lladdr
and this message was solicited then ln_state is set to ND6_LLINFO_REACHABLE
which could then cause a panic in nd6_resolve().
If ln_state > ND6_LLINFO_INCOMPLETE then it's assumed we have a lladdr.

Potentially this could have been triggered by the introduction of
ND6_LLINFO_PURGE in nd6.c r1.143 but also by the re-introduction of
ND6_LLINFO_INCOMPLETE in nd6.c r1.263.
Depending on the timing, it's technically possible to receive such
a message after the llentry is created with ND6_LLINFO_NOSTATE.
 1.177 09-Mar-2020  roy branches: 1.177.2;
route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes who probe for machines
that don't exist on a subnet / prefix.
 1.176 20-Jan-2020  thorpej Remove FDDI support.
 1.175 13-Nov-2019  ozaki-r branches: 1.175.2;
Get rid of unnecessary NULL checks for rt_ifa and ifa_ifp

They are always non-NULL nowadays.
 1.174 25-Sep-2019  ozaki-r Initialize DAD components properly

The original code initialized each component in non-init functions such as
arp_dad_start and nd6_dad_find, conditionally based on a global flag for each.
However, it was racy because the flag and the code around it were not
protected by a lock and could cause a kernel panic at worst.

Fix the issue by initializing the components in bootup as usual.
 1.173 18-Sep-2019  ozaki-r nd6: remove extra pserialize_read_exit
 1.172 01-Sep-2019  roy inet6: Send RTM_MISS when we fail to resolve an address.

Takes the same approach as when adding a new address - we no longer
announce the new lladdr right away but we announce the result.
This will either be RTM_ADD or RTM_MISS.
RTM_DELETE is only sent if we have a lladdr assigned OR gc'ed.

This results in fewer messages via route(4) and tells us when a new
lladdr has been added (RTM_ADD), changed (RTM_CHANGE), deleted (RTM_DELETE)
or has failed to be resolved (RTM_MISS). The latter case can be
interpreted as unreachable.
 1.171 30-Aug-2019  roy inet6: Revert prior

It's not needed, listening to RA is enough as discussed on tech-net.
 1.170 29-Aug-2019  roy Userland really has no business with NA messages.
However, RFC 4861 6.2.5 only says departing routers
*SHOULD* send RA with lifetime of zero and *MUST*
send all subsequent NA messages with the router flag
unset.

To help userland avoid the expensive process of
parsing NA messages, send RTM_CHANGE without a
lladdr in the gateway.
This is different from the initial RTM_ADD also
without a lladdr in the gateway and RTM_DELETE.
 1.169 29-Aug-2019  roy more bool
 1.168 29-Aug-2019  roy inet6: change rt_announce and llchange to bool in nd6_na_input()
 1.167 22-Aug-2019  roy nd6: notify userland of neighbour lla updates once more

XXX pullup -8 -9
 1.166 29-Apr-2019  roy branches: 1.166.2;
Introduce rt_addrmsg_src which adds RTA_AUTHOR to the message.
Use this when we notify userland of a duplicate address
and set RTA_AUTHOR to the hardware address of the sender.

While here, match the logging diagnostic of INET6 to the simpler one
of INET so it's consistent.
 1.165 29-Apr-2019  roy rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.
 1.164 22-Dec-2018  maxv Replace M_ALIGN and MH_ALIGN by m_align.
 1.163 13-Dec-2018  roy inet6: discard any received NA with a LL address we own

This matches ARP behaviour.
 1.162 07-Dec-2018  roy inet6: match NS nonce to any interface

This allows the same address to exist on many interfaces on the same
prefix, matching the inet behaviour.
 1.161 04-Dec-2018  roy inet6: remove needless ifa_release.
 1.160 04-Dec-2018  roy inet6: use one function for nd6_dad_input

Having different ones for NA and NS is a bit wasteful.
 1.159 04-Dec-2018  roy inet6: simplify NA DaD checking
 1.158 04-Dec-2018  roy inet6: remove unused dad ns/na counters

The current DaD code triggers when either an NS or NA is directly
received, so the counters themselves do nothing of use.
 1.157 29-Nov-2018  ozaki-r Introduce and use ip_dad_enabled() and ip6_dad_enabled() functions
 1.156 19-May-2018  maxv branches: 1.156.2;
Style.
 1.155 17-May-2018  maxv Fix the KASSERTs. It doesn't matter at all since the packet can't be this
big anyway, and there are many other places that have this kind of typo;
but still fix it, for the sake of closing PR/49834.
 1.154 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.153 19-Mar-2018  ozaki-r Pull out a sleepable function (in6_selectsrc) from a pserialize read section
 1.152 08-Mar-2018  ozaki-r Fix a race condition on DAD destructions (again)

The previous fix to DAD timers was wrong; it avoided a use-after-free but
instead introduced a memory leak. The destruction method had delegated
the destruction of a DAD timer to the timer itself and signaled that by setting
dp->dad_ifa to NULL. However, the previous fix made DAD timers do nothing on
seeing that signal.

Fixing the issue with using callout_stop isn't easy. One approach is to have
a refcount on dp but it introduces extra complexity that we want to avoid.

The new fix falls back to using callout_halt, which was abandoned because of
softnet_lock. Fortunately now the network stack is protected by KERNEL_LOCK
so we can remove softnet_lock from DAD timers (callout) and use callout_halt
safely.
 1.151 07-Mar-2018  ozaki-r Avoid passing NULL to nd6_dad_duplicated

Fix PR kern/53075
 1.150 06-Mar-2018  martin Remove unused variables
 1.149 06-Mar-2018  roy nd6: add a nonce to DaD probes in case they are looped back to us

This implements RFC 7527, based on a similar change in FreeBSD.
 1.148 24-Feb-2018  ozaki-r branches: 1.148.2;
Avoid a race condition of DAD timer destructions

When we see dp->dad_ifa == NULL, it means that the ifa is being deleted and also
the callout is scheduled again by someone. We shouldn't rely on a result of
callout_pending to know if the callout is scheduled because it returns false if
the subsequent callout handler is already on the fly.

We have to always delegate the destruction of dp to the subsequent handler
unconditionally if dp->dad_ifa == NULL. Otherwise, the first handler destroys
the dp and the second handler tries to handle destroyed dp.
 1.147 24-Feb-2018  ozaki-r Simplify; pass dp to nd6_dad_duplicated instead of looking it up again in it
 1.146 24-Feb-2018  ozaki-r Use KASSERT for checking a programming error
 1.145 02-Feb-2018  maxv Fix memory leak. Contrary to what the XXX indicates, this place is 100%
reachable remotely.
 1.144 16-Jan-2018  ozaki-r Make DAD destructions (MP-)safe with callout_stop

arp_dad_stoptimer and nd6_dad_stoptimer can be called with or without
softnet_lock held and unfortunately we have no easy way to statically know which.
So it is hard to use callout_halt there.

To address the situation, we use callout_stop to make the code safe. The new
approach copes with the issue by delegating the destruction of a callout to
callout itself, which allows us to not wait for the callout to finish. This is
possible because DAD objects are separate from other data such as ifa.

The approach is suggested by riastradh@
Proposed on tech-kern@ and tech-net@
 1.143 16-Jan-2018  ozaki-r Revert "Work around softnet_lock handling" as per pgoyette@'s request

We should avoid if (mutex_owned(softnet_lock)).
 1.142 10-Jan-2018  ozaki-r Get rid of unnecessary ifdef for IFT_IEEE80211
 1.141 10-Jan-2018  ozaki-r Fix a deadlock on callout_halt of nd6_dad_timer

We must not call callout_halt of nd6_dad_timer while holding nd6_dad_lock because
the lock is taken in nd6_dad_timer. Once softnet_lock goes away, we can pass the
lock to callout_halt, but for now we cannot.
 1.140 26-Dec-2017  ozaki-r Work around softnet_lock handling

nd6_dad_stoptimer can be called with or without softnet_lock held.
callout_halt has to take softnet_lock depending on the situation.
 1.139 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
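
The idea of hiding the NET_MPSAFE switch behind macros can be sketched in
user space as follows. This is only an illustration: the TOY_* macro names
and the counter standing in for KERNEL_LOCK are assumptions made for the
sketch, not the macros the commit added.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the big kernel lock: a counter models the lock depth. */
static int kernel_lock_depth;
#define KERNEL_LOCK(n, l)     (kernel_lock_depth += (n))
#define KERNEL_UNLOCK_ONE(l)  (kernel_lock_depth -= 1)

/*
 * Hypothetical wrapper macros. With NET_MPSAFE defined they expand to
 * nothing, so call sites no longer need their own #ifndef blocks.
 */
#ifdef NET_MPSAFE
#define TOY_KERNEL_LOCK_UNLESS_NET_MPSAFE()    do { } while (0)
#define TOY_KERNEL_UNLOCK_UNLESS_NET_MPSAFE()  do { } while (0)
#else
#define TOY_KERNEL_LOCK_UNLESS_NET_MPSAFE() \
	do { KERNEL_LOCK(1, NULL); } while (0)
#define TOY_KERNEL_UNLOCK_UNLESS_NET_MPSAFE() \
	do { KERNEL_UNLOCK_ONE(NULL); } while (0)
#endif

/* A call site: returns the lock depth observed inside the section. */
static int
toy_forward_packet(void)
{
	int depth;

	TOY_KERNEL_LOCK_UNLESS_NET_MPSAFE();
	depth = kernel_lock_depth;
	TOY_KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
	return depth;
}
```

Grepping for the wrapper name then finds every place that still takes the
lock conditionally, which is the maintenance benefit described above.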
 1.138 14-Mar-2017  ozaki-r branches: 1.138.6;
Replace DIAGNOSTIC + panic with KASSERT
 1.137 21-Feb-2017  ozaki-r Replace malloc for DAD with kmem and move them out of the lock for DAD
 1.136 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.135 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.134 19-Dec-2016  ozaki-r branches: 1.134.2;
Protect IPv6 default router and prefix lists with coarse-grained rwlock

in6_purgeaddr (in6_unlink_ifa) itself unreferences a prefix entry and calls
nd6_prelist_remove if the counter becomes 0, so callers don't need to
handle the reference counting.

Performance-sensitive paths (sending/forwarding packets) take just one
reader lock. This is a trade-off between performance impact and the amount
of effort; if we want to remove the reader lock, we need a huge amount of
work including, for example, destroying objects with psz/psref in softint.
 1.133 14-Dec-2016  ozaki-r Make functions static
 1.132 12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.131 11-Dec-2016  ozaki-r Add nd6_ prefix to exported functions
 1.130 15-Nov-2016  mlelstv nd6_dad_duplicated takes the lock itself. Move it out of the critical
section.
 1.129 31-Oct-2016  ozaki-r Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.
 1.128 18-Oct-2016  ozaki-r Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@
 1.127 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.126 28-Jul-2016  ozaki-r Fix panic on adding/deleting IP addresses under network load

Adding and deleting IP addresses aren't serialized with other network
operations, e.g., forwarding packets. So if we add or delete an IP
address under network load, a kernel panic may happen on manipulating
network-related shared objects such as rtentry and rtcache.

To avoid such panics, we still need to hold softnet_lock in in_control
and in6_control that are called via ioctl and do network-related operations
including IP address additions/deletions.

Fix PR kern/51356
 1.125 25-Jul-2016  ozaki-r Make DAD of ARP/NDP MP-safe with coarse-grained locks

The change also prevents arp_dad_timer/nd6_dad_timer from running if
arp_dad_stop/nd6_dad_stop is called, which makes sure that callout_reset
won't be called during callout_halt.
 1.124 25-Jul-2016  ozaki-r Use KASSERT for checking non-NULL of ifa->ifa_ifp

ifa->ifa_ifp should be always non-NULL, so doing the check only if
DIAGNOSTIC is ok.
 1.123 15-Jul-2016  ozaki-r Use sin6tosa and sin6tocsa macros

No functional change.
 1.122 01-Jul-2016  ozaki-r branches: 1.122.2;
Make sure to free all interface addresses in if_detach

Addresses of an interface (struct ifaddr) have a (reverse) pointer to an
interface object (ifa->ifa_ifp). If the addresses are guaranteed to be freed
when their interface is destroyed, the pointer is always valid and we don't
need to replace the pointer with if_index as was done for mbuf.

To guarantee that assumption, the following changes are required:
- Deactivate the interface near the beginning of if_detach. This prevents
in6_unlink_ifa from saving multicast addresses (wrongly)
- Invalidate rtcache(s) and clear a rtentry referencing an address on
RTM_DELETE. rtcache(s) may delay freeing an address
- Replace callout_stop with callout_halt of DAD timers to ensure stopping
such timers in if_detach
 1.121 21-Jun-2016  ozaki-r Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.
 1.120 21-Jun-2016  ozaki-r Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime differs from that of the ifnet, so an ifnet object can
disappear at any time while we access it via them. Thus, for safety, we
need to look up an ifnet object by if_index every time.
 1.119 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and exists for the
transition period, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
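
The index-instead-of-pointer scheme can be sketched in user space as
follows. All toy_* names are hypothetical, and the sketch leaves out the
psref(9)/pserialize(9) protection that the real APIs provide; it only shows
why a stale index is harmless where a stale pointer is not.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXIF 8

struct toy_ifnet {
	int if_index;
	const char *if_xname;
};

/* Interface collection indexed by if_index; NULL means "no such interface". */
static struct toy_ifnet *toy_ifindex2ifnet[TOY_MAXIF];

struct toy_mbuf {
	int rcvif_index;	/* an index, not a pointer to the ifnet */
};

static void
toy_if_attach(struct toy_ifnet *ifp)
{
	assert(ifp->if_index > 0 && ifp->if_index < TOY_MAXIF);
	toy_ifindex2ifnet[ifp->if_index] = ifp;
}

static void
toy_if_detach(struct toy_ifnet *ifp)
{
	toy_ifindex2ifnet[ifp->if_index] = NULL;
}

static void
toy_m_set_rcvif(struct toy_mbuf *m, const struct toy_ifnet *ifp)
{
	m->rcvif_index = ifp->if_index;
}

/* Returns the receiving interface, or NULL if it has gone away. */
static struct toy_ifnet *
toy_m_get_rcvif(const struct toy_mbuf *m)
{
	int idx = m->rcvif_index;

	if (idx <= 0 || idx >= TOY_MAXIF)
		return NULL;
	return toy_ifindex2ifnet[idx];
}
```

After the interface is detached, a lookup through the mbuf yields NULL
rather than a dangling pointer, so callers must check the result.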
 1.118 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) the receiving interface of a mbuf.
These functions are the counterparts of m_get_rcvif, which will come in
another commit; they hide the internals of rcvif operations and reduce the
diff of the upcoming change.

No functional change.
 1.117 29-Apr-2016  is Let non-neighbor NS/NA debug error message include useful information.
 1.116 11-Apr-2016  ozaki-r Don't call pfxlist_onlink_check while holding llentry lock

From FreeBSD (as of 2016-04-11).

Should fix PR kern/51060.
 1.115 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
  - It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.114 01-Apr-2016  ozaki-r Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.
 1.113 07-Dec-2015  ozaki-r CID 1341546: Fix integer handling issue (CONSTANT_EXPRESSION_RESULT)

n > INT_MAX, where n is a long integer variable, can never be true on 32-bit
architectures. Use time_t(int64_t) instead of long for the variable.
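
A minimal sketch of why the wider type matters, assuming a hypothetical
helper toy_clamp_lifetime() that is not part of the commit: with a 64-bit
operand the comparison against INT_MAX is meaningful on every ABI, whereas
with a 32-bit long it is constant false.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: clamp a lifetime in seconds to what fits in an int.
 * Lifetimes are assumed to be non-negative.
 */
static int
toy_clamp_lifetime(int64_t n)
{
	assert(n >= 0);
	if (n > INT_MAX)	/* n is 64 bits wide, so this can be true */
		return INT_MAX;
	return (int)n;
}
```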
 1.112 25-Nov-2015  ozaki-r Use lltable/llentry for NDP

lltable and llentry were introduced to replace ARP cache data structure
for further restructuring of the routing table: L2 nexthop cache
separation. This change replaces the NDP cache data structure
(llinfo_nd6) with them as well as ARP.

One noticeable change is for neighbor cache GC mechanism that was
introduced to prevent IPv6 DoS attacks. net.inet6.ip6.neighborgcthresh
was the max number of caches that we store in the system. After
introducing lltable/llentry, the value is changed to a per-interface
basis because lltable/llentry stores neighbor caches in each interface
separately. The change also brings one degradation: the old GC mechanism
dropped exceeded packets based on LRU while the new implementation drops
packets in order from the beginning of lltable (a hash table + linked
lists). It would be improved in the future.

Added functions in in6.c come from FreeBSD (as of r286629) and are
tweaked for NetBSD.

Proposed on tech-kern and tech-net.
 1.111 18-Nov-2015  ozaki-r Stop passing llinfo_nd6 to nd6_ns_output

This is a restructuring for coming changes to nd6 (replacing
llinfo_nd6 with llentry). Once we have a lock of llinfo_nd6,
we need to pass it to nd6_ns_output while holding the lock.
However, in a function subsequent to nd6_ns_output, the llinfo_nd6
may be looked up, i.e., its lock would be acquired again.
To avoid such a situation, pass only required data (in6_addr) to
nd6_ns_output instead of passing whole llinfo_nd6.

Inspired by FreeBSD
 1.110 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.109 17-Jul-2015  ozaki-r Reform use of rt_refcnt

rt_refcnt of rtentry was misused in several ways, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. To become MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
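
The reference-counting discipline listed above can be sketched with a toy
structure. The toy_* names and the "freed" flag are assumptions made for the
sketch; the real rtfree has additional conditions that are not modelled
here.

```c
#include <assert.h>

struct toy_rtentry {
	int refcnt;	/* references from the table AND from local variables */
	int freed;	/* set instead of calling free(), so it can be inspected */
};

/* A function returning an entry increments its refcnt for the caller. */
static struct toy_rtentry *
toy_rtlookup(struct toy_rtentry *table_entry)
{
	table_entry->refcnt++;
	return table_entry;
}

/* The only way to drop a reference; never a bare refcnt--. */
static void
toy_rtfree(struct toy_rtentry *rt)
{
	assert(rt->refcnt > 0);
	if (--rt->refcnt == 0)
		rt->freed = 1;
}
```

As the commit message notes, a count like this only keeps the entry from
being freed while in use; it does nothing to stop concurrent updates.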
 1.108 27-Apr-2015  ozaki-r Add missing error checks on rtcache_setdst

It can fail with ENOMEM.
 1.107 30-Mar-2015  ozaki-r Tidy up opt_ipsec.h inclusions
 1.106 25-Feb-2015  roy Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.
 1.105 25-Feb-2015  roy Retire nd6_newaddrmsg and use rt_newaddrmsg directly instead so that
we don't spam route changes when the route hasn't changed.
 1.104 23-Feb-2015  martin Rearrange interface detachment slightly: before we free the INET6 specific
per-interface data, make sure to call nd6_purge() with it to remove
routing entries pointing to the interface that is going away.
If we happen to call this function again later, with the data
already gone, just return.
Fixes PR kern/49682, ok: christos.
 1.103 16-Dec-2014  roy Report route additions/changes/deletions for cached neighbours to userland.
 1.102 12-Oct-2014  roy branches: 1.102.2;
Remove redundant logging.
 1.101 09-Sep-2014  rmind Eliminate IFAREF() and IFAFREE() macros in favour of functions.
 1.100 01-Jul-2014  ozaki-r branches: 1.100.2;
Stop using callout randomly

nd6_dad_start uses callout when xtick > 0 but doesn't when
xtick == 0. So if we pass a random value ranging from 0 to N,
nd6_dad_start uses callout randomly. This behavior makes
debugging difficult.

Discussed in http://mail-index.netbsd.org/tech-kern/2014/06/25/msg017278.html
 1.99 13-Jan-2014  roy branches: 1.99.2;
When handling NS/NA we need to check our prefix list instead of our
address list to work out if it came from a valid neighbor.
 1.98 21-May-2013  roy branches: 1.98.2;
Disable nd6_newaddrmsg debug
 1.97 21-May-2013  roy For IPv6, emit RTM_NEWADDR once DAD completes and also when address flag
changes. Tentative addresses are not emitted.

Version bumped so userland can detect this behaviour change.
 1.96 22-Mar-2012  drochner branches: 1.96.2;
remove KAME IPSEC, replaced by FAST_IPSEC
 1.95 19-Dec-2011  drochner branches: 1.95.2; 1.95.6; 1.95.8;
rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.94 18-Apr-2009  tsutsui branches: 1.94.12; 1.94.16;
Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.93 18-Mar-2009  cegger bcopy -> memcpy
 1.92 18-Mar-2009  cegger bzero -> memset
 1.91 18-Mar-2009  cegger bcmp -> memcmp
 1.90 31-Jul-2008  matt branches: 1.90.2; 1.90.8;
Generalize previous fix so that both NS and NA packets are checked.
 1.89 31-Jul-2008  matt If a neighbor solicitation isn't from the unspecified address, make sure
that the source address matches one of the interface's address prefixes.
 1.88 22-May-2008  dyoung branches: 1.88.4;
Cosmetic: join lines.
 1.87 22-May-2008  dyoung Cosmetic: don't cast NULL unnecessarily.
 1.86 24-Apr-2008  ad branches: 1.86.2; 1.86.4;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.85 15-Apr-2008  thorpej branches: 1.85.2;
Make ip6 and icmp6 stats per-cpu.
 1.84 08-Apr-2008  thorpej Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
 1.83 27-Feb-2008  matt Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.
 1.82 16-Nov-2007  dyoung branches: 1.82.10; 1.82.14;
We might leave nd6_ns_output() really early. Postpone memset()
until after we decide to stay.
 1.81 10-Nov-2007  dyoung Use sockaddr_in6_init().
 1.80 30-Aug-2007  dyoung branches: 1.80.4; 1.80.6;
Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
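
A bounds-checked setter of this kind can be sketched as follows. The
structure layout and the toy_* names are simplified assumptions, not the
kernel's sockaddr_dl or its sockaddr_dl_setaddr(); the point is only that
the caller passes the storage length and the copy is refused when the
address would not fit.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a link-layer sockaddr. */
struct toy_sockaddr_dl {
	unsigned char sdl_len;
	unsigned char sdl_alen;		/* length of the stored address */
	char sdl_data[12];		/* address bytes */
};

/*
 * Copy addr into sdl, whose storage is socklen bytes long.
 * Returns sdl on success, or NULL if the address would overrun the storage.
 */
static struct toy_sockaddr_dl *
toy_sockaddr_dl_setaddr(struct toy_sockaddr_dl *sdl, size_t socklen,
    const void *addr, size_t addrlen)
{
	const size_t hdr = offsetof(struct toy_sockaddr_dl, sdl_data);

	if (socklen < hdr || addrlen > socklen - hdr || addrlen > UCHAR_MAX)
		return NULL;
	memcpy(sdl->sdl_data, addr, addrlen);
	sdl->sdl_alen = (unsigned char)addrlen;
	return sdl;
}
```

Compared with an open-coded memcpy(LLADDR(), ...), the length check is in
one place and cannot be forgotten at individual call sites.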
 1.79 26-Aug-2007  dyoung branches: 1.79.2;
Constify: LLADDR -> CLLADDR. I'm aiming here to make it easier to
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).

Use sockaddr_dl_setaddr() in a few places.
 1.78 07-Aug-2007  dyoung branches: 1.78.2;
Avoid writing past the end of the buffer [lldst, lldst + dstsize)
in nd6_storelladdr().

Use sockaddr_dl_setaddr(). Constify some sockaddr_dl's. Constify
a sockaddr argument to nd6_na_output(). Change SDL() to "standard"
satocsdl() or satosdl(). Change SIN6() to satocsin6() or satosin6().

bcmp -> memcmp, bcopy -> memcpy.
 1.77 19-Jul-2007  dyoung branches: 1.77.4;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.76 09-Jul-2007  ad branches: 1.76.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.75 23-May-2007  christos Ansify + add a few comments, from Karl Sjödahl
 1.74 17-May-2007  dyoung Fix the memory leak reported in kern/36337. Thanks Matthias Scheler
for the heads-up. My fix is based on the following patches from
FreeBSD, however, I extracted the code into a subroutine,
nd6_llinfo_release_pkts():

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet6/nd6.c.diff?r1=1.48.2.18;r2=1.48.2.19
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet6/nd6_nbr.c.diff?r1=1.29.2.8;r2=1.29.2.9
 1.73 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
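
Item 4 can be sketched in user space as below. The toy_* structures and the
hand-rolled singly linked list are assumptions made for the sketch; they
only show the shape of "each domain keeps a list of live caches, and a flush
walks the list and invalidates every entry".

```c
#include <assert.h>
#include <stddef.h>

struct toy_rtentry {
	int dummy;
};

struct toy_route {
	struct toy_rtentry *ro_rt;	/* cached route, NULL when invalidated */
	struct toy_route *ro_next;	/* link in the domain's list of live caches */
};

struct toy_domain {
	struct toy_route *dom_rtcache;	/* head of the list of live caches */
};

/* Record a freshly cached route on its domain's list. */
static void
toy_rtcache(struct toy_domain *dom, struct toy_route *ro,
    struct toy_rtentry *rt)
{
	ro->ro_rt = rt;
	ro->ro_next = dom->dom_rtcache;
	dom->dom_rtcache = ro;
}

/* Invalidate every live cache in the domain; returns how many were flushed. */
static int
toy_rtflushall(struct toy_domain *dom)
{
	struct toy_route *ro;
	int n = 0;

	while ((ro = dom->dom_rtcache) != NULL) {
		dom->dom_rtcache = ro->ro_next;
		ro->ro_next = NULL;
		ro->ro_rt = NULL;
		n++;
	}
	return n;
}
```

Users of a cache must then be prepared to find ro_rt == NULL at any time and
look the route up again.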
 1.72 15-Mar-2007  dyoung Don't open-code TAILQ_FOREACH(). KNF: Fix K&R prototypes and
parameter-type declarations.
 1.71 04-Mar-2007  christos branches: 1.71.2; 1.71.4; 1.71.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.70 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.69 29-Jan-2007  dyoung branches: 1.69.2;
Cosmetic: bzero -> memset. Change a bcopy() to a struct assignment.
 1.68 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.67 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.66 02-Dec-2006  dyoung Use the queue(3) macros instead of open-coding them. Shorten
staircases. Remove unnecessary casts. Where appropriate, s/8/NBBY/.
De-__P(). KNF.

No functional changes intended.
 1.65 28-Jun-2006  drochner branches: 1.65.4; 1.65.6; 1.65.8; 1.65.14;
fix the dad_count logic: if we send a packet successfully, reset the counter
for sent tries -- otherwise it gets confused if dad_count is set to >15
by the sysctl, and addresses get stuck in "tentative" state forever
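
The counter interaction can be modelled roughly as follows. This is a
simplified guess at the logic, not the nd6_nbr.c code: the structure, the
field names, the give-up limit of 15 and the toy_dad_timer() function are
all assumptions made for illustration.

```c
#include <assert.h>

#define TOY_DAD_MAXTRY 15	/* assumed limit on consecutive failed tries */

struct toy_dad {
	int dad_count;		/* probes to send (the sysctl value) */
	int ns_ocount;		/* probes sent successfully */
	int ns_tcount;		/* send attempts since the last success */
};

enum toy_dad_result { TOY_DAD_PENDING, TOY_DAD_DONE, TOY_DAD_GAVE_UP };

/* One timer tick: try to send the next probe if any remain. */
static enum toy_dad_result
toy_dad_timer(struct toy_dad *dp, int link_up)
{
	if (dp->ns_tcount > TOY_DAD_MAXTRY)
		return TOY_DAD_GAVE_UP;
	if (dp->ns_ocount < dp->dad_count) {
		dp->ns_tcount++;
		if (link_up) {
			dp->ns_tcount = 0;	/* the fix: reset on success */
			dp->ns_ocount++;
		}
		return TOY_DAD_PENDING;
	}
	return TOY_DAD_DONE;
}
```

Without the reset, a dad_count above the retry limit would trip the give-up
check even though every probe went out, which matches the symptom described
above.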
 1.64 18-May-2006  liamjfoy branches: 1.64.4;
Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.63 06-Mar-2006  rpaulo branches: 1.63.4;
Rename local variables called delay that shadow the delay() decl.
Pointed out by Robert Swindells.
 1.62 05-Mar-2006  rpaulo NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.61 03-Mar-2006  rpaulo branches: 1.61.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.
 1.60 25-Feb-2006  wiz Fix typos, reported by Alexey Dobriyan ("Gathered from Linux"),
forwarded by jmc@openbsd.
 1.59 21-Jan-2006  rpaulo branches: 1.59.2; 1.59.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
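
The stricter scope boundary check described in this entry can be illustrated
with a minimal user-space sketch. This is not the KAME or NetBSD code:
scope_permits_output() and scope_permits_output_str() are hypothetical
helpers, and only the loopback case from the example above is modelled,
using the standard IN6_IS_ADDR_LOOPBACK macro from <netinet/in.h>.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel function): return true if this
 * source/destination pair is acceptable for output.  Only the loopback
 * case is modelled: ::1 is meaningful within a single node, so a packet
 * involving ::1 is accepted only when both ends are ::1.
 */
static bool
scope_permits_output(const struct in6_addr *src, const struct in6_addr *dst)
{
	assert(src != NULL && dst != NULL);
	if (IN6_IS_ADDR_LOOPBACK(src) || IN6_IS_ADDR_LOOPBACK(dst))
		return IN6_IS_ADDR_LOOPBACK(src) && IN6_IS_ADDR_LOOPBACK(dst);
	return true;
}

/* Convenience wrapper for textual addresses; false on a parse error. */
static bool
scope_permits_output_str(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return false;
	return scope_permits_output(&s, &d);
}
```

With this sketch, the bind("::1") followed by sendto(global address) case
from the list above would be rejected, while ::1 to ::1 and global to global
traffic would pass.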
 1.58 11-Dec-2005  christos branches: 1.58.2;
merge ktrace-lwp.
 1.57 29-May-2005  christos branches: 1.57.2;
- avoid shadowed variables
- sprinkle const.
 1.56 26-Feb-2005  perry branches: 1.56.2; 1.56.4; 1.56.6;
nuke trailing whitespace
 1.55 10-Feb-2005  itojun backout 1.54. heuristic code should never be used. if you experience DAD
failure, suspect your driver, not ND code.
 1.54 02-Feb-2005  drochner Give DAD a chance to succeed even if the network is "slightly broken"
(in my case it was a switch set to "monitor" mode):
If we see an NS request for the address we are just probing for, for
three times the number of DAD packets we are supposed to send (the
"ip6.dad_count" sysctl variable), assume that these are our own packets
and let DAD succeed.
The code for this was mostly there, commented out. Just needed some fixes.
The "three times" is heuristic of course.
Being here, reset the "dad_ns_tcount" variable on a successful send;
otherwise we get strange interdependencies with user-settable variables
(ever tried to set ip6.dad_count to something >15?).
 1.53 10-Feb-2004  itojun branches: 1.53.8; 1.53.10;
reduce useless variables
 1.52 30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.51 05-Sep-2003  itojun u_short -> u_int16_t. sync w/ kame.
don't set ip6_plen where unneeded (i.e. before calling ip6_output)
 1.50 22-Aug-2003  itojun remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.
 1.49 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.48 22-Aug-2003  jonathan Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.
 1.47 27-Jun-2003  itojun branches: 1.47.2;
split ND6 cache timer management to per-entry. increased accuracy,
no O(N) loop. sync w/ kame
 1.46 24-Jun-2003  itojun remove unneeded checks of accept_rtadv. from kame
 1.45 24-Jun-2003  itojun use time.tv_sec directly
 1.44 14-May-2003  itojun always use PULLDOWN_TEST codepath.
 1.43 23-Sep-2002  simonb Remove breaks after returns, unreachable returns and returns after
returns(!).
 1.42 09-Jun-2002  itojun whitespace cleanup
 1.41 08-Jun-2002  itojun KNF
 1.40 08-Jun-2002  itojun gc
 1.39 08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.38 07-Jun-2002  itojun whitespace
 1.37 07-Jun-2002  itojun whitespace
 1.36 29-May-2002  itojun attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
 1.35 28-May-2002  itojun use arc4random() where possible.
XXX is it necessary to do microtime() on tcp syn cache?
 1.34 15-Mar-2002  itojun branches: 1.34.4; 1.34.6;
s/0/NULL/ as ln_hold is a pointer. sync w/ kame
 1.33 13-Nov-2001  lukem add RCSIDs
 1.32 18-Oct-2001  itojun reduce diffs with kame (mostly cosmetic).
move IPV6_CHECKSUM processing to sys/netinet6/raw_ip6.c.
constify a couple of places.
 1.31 17-Oct-2001  itojun do not change neighbor cache state on entry timeout,
if the cache entry is for outgoing router.

perform on-linkness check before default router (re-)selection.

do not play with interface direct route on nd6_rtrequest.

sync a lot of cosmetic changes. sync with kame
 1.30 17-Oct-2001  itojun unifdef OLDIP6OUTPUT
 1.29 16-Oct-2001  itojun more whitespace/comment sync with kame
 1.28 23-Feb-2001  itojun branches: 1.28.2; 1.28.4;
garbage-collect stale ND entries (default: 1 day).
RFC 2461 5.3. sync with kame.
 1.27 11-Feb-2001  itojun make sure to clean ln_byhint on reachability confirmation.
 1.26 07-Feb-2001  itojun during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).
 1.25 24-Jan-2001  itojun - record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.24 17-Jan-2001  itojun wrap noisy ND6 debugging messages with ND6_DEBUG. sync with kame
 1.23 05-Nov-2000  onoe First Prototype implementation of network interface part for IEEE1394 (if_fw).

Current status:
Only OHCI chip is supported (fwohci).
ping (IPv4) works with Sony's implementation (SmartConnect) on Win98.
sometimes works but not stable.
Not implemented yet:
IRM (Isochronous Resource Manager) functionality.
Link layer fragmentation.
Topology map.
More to do:
clean ups
MCAP
character device part
dhcp

There is no entry in GENERIC config file yet.
Follow sys/dev/ieee1394/IMPLEMENTATION to enable if_fw.
 1.22 19-May-2000  itojun branches: 1.22.4;
do not mistakenly forward link-local scoped packet (the bug was added
with "beyondscope" icmp6 support).
"options FAKE_LOOPBACK_IF" will honor scope on loopback outputs. rcvif will
be real interface, not the loopback, just like when multicast loopback.

(sync with kame)
 1.21 24-Mar-2000  itojun move ia6->ia6_dad_ch to dp->dad_timer_ch, to ease KAME code sharing.
now in6_var.h does not need to pull sys/callout.h in.
 1.20 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
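
The two properties listed in this entry (client-supplied handle storage and
constant-time insertion and removal) can be sketched in user space as
follows. This is an illustration only, not the kernel callout
implementation: struct timer, wheel_init(), timer_init(), timer_reset(),
timer_stop(), wheel_tick(), bump() and WHEEL_SIZE are hypothetical names,
and a small hashed wheel of circular doubly linked lists is assumed as the
underlying structure.

```c
#include <assert.h>
#include <stddef.h>

#define WHEEL_SIZE 8	/* hypothetical bucket count, kept tiny on purpose */

/* Handle embedded in the client's own data, so arming never allocates. */
struct timer {
	struct timer *next, *prev;	/* circular list links; NULL if idle */
	unsigned long expire;		/* absolute tick at which to fire */
	void (*func)(void *);
	void *arg;
};

static struct timer wheel[WHEEL_SIZE];	/* one list head per bucket */

static void
wheel_init(void)
{
	for (int i = 0; i < WHEEL_SIZE; i++)
		wheel[i].next = wheel[i].prev = &wheel[i];
}

static void
timer_init(struct timer *t)
{
	t->next = t->prev = NULL;
}

/* O(1) removal: unlink from whichever bucket currently holds the handle. */
static void
timer_stop(struct timer *t)
{
	if (t->next == NULL)
		return;
	t->prev->next = t->next;
	t->next->prev = t->prev;
	t->next = t->prev = NULL;
}

/* O(1) insertion: link at the head of the bucket chosen by expiry tick. */
static void
timer_reset(struct timer *t, unsigned long expire, void (*func)(void *),
    void *arg)
{
	assert(func != NULL);
	timer_stop(t);
	struct timer *head = &wheel[expire % WHEEL_SIZE];
	t->expire = expire;
	t->func = func;
	t->arg = arg;
	t->next = head->next;
	t->prev = head;
	head->next->prev = t;
	head->next = t;
}

/*
 * Fire every timer in the current bucket whose tick has arrived.
 * Limitation of this sketch: a handler must not stop a different
 * timer that sits in the same bucket.
 */
static void
wheel_tick(unsigned long now)
{
	struct timer *head = &wheel[now % WHEEL_SIZE], *t, *n;

	for (t = head->next; t != head; t = n) {
		n = t->next;
		if (t->expire == now) {
			timer_stop(t);
			t->func(t->arg);
		}
	}
}

/* Example handler: count how often it ran. */
static void
bump(void *arg)
{
	++*(int *)arg;
}
```

A caller would embed struct timer in its own structure, call timer_init()
once, and then use timer_reset() and timer_stop() freely; neither call
searches a list or allocates memory.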
 1.19 16-Mar-2000  thorpej Quiet down the DAD messages a little more.
 1.18 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.17 28-Feb-2000  itojun remove some of cross-BSD portability #ifdef.
remove xxCTL_VARS, which is BSDI specific.
 1.16 26-Feb-2000  itojun bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
 1.15 07-Feb-2000  itojun add more sanity check against mbuf length.
use log() for DAD related kernel message.
 1.14 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.13 01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.12 28-Jan-2000  itojun wrap "DAD start" message into #ifdef DIAGNOSTIC.
From: thorpej, "Soren S. Jorvang" <soren@wheel.dk>
 1.11 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.10 15-Dec-1999  itojun do not overwrite traffic class field when we write IPv6 version field.
 1.9 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.8 19-Sep-1999  is branches: 1.8.2; 1.8.8;
Zeroth version of IPv6 support for ARCnet. Correct MTU handling still needs
to be done.
 1.7 31-Jul-1999  itojun sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.6 10-Jul-1999  thorpej Clean up some printfs(), and mark a few for possible later nuking,
since they appear to be for debugging purposes only.
 1.5 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.4 04-Jul-1999  itojun s/splnet/splsoftnet/ in IPv6/IPsec part.
hope I made no mistake (the kernel works fine but I need a regress test)

Suggested by: thorpej
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file nd6_nbr.c was initially added on branch kame.
 1.1.2.4 30-Nov-1999  itojun avoid panic due to uninitialized pointer (on ipsec policy check in ip6_output).
(critical fix sync from KAME)
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file nd6_nbr.c was added on branch chs-ubc2 on 1999-07-01 23:48:30 +0000
 1.8.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.8.2.4 12-Mar-2001  bouyer Sync with HEAD.
 1.8.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.8.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.8.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.22.4.2 09-May-2001  he Pull up revision 1.26 (requested by itojun):
Suppress ND6 logs that are too noisy for normal use. Can be
re-enabled by net.inet6.icmp6.nd6_debug.
 1.22.4.1 06-Apr-2001  he Pull up revision 1.25 (requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.28.4.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.28.4.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.28.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.28.2.5 18-Oct-2002  nathanw Catch up to -current.
 1.28.2.4 20-Jun-2002  nathanw Catch up to -current.
 1.28.2.3 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.28.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.28.2.1 22-Oct-2001  nathanw Catch up to -current.
 1.34.6.1 02-Oct-2003  tron Pull up revision 1.39 via patch (requested by itojun in ticket #1491):
sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.34.4.2 20-Jun-2002  gehenna catch up with -current.
 1.34.4.1 30-May-2002  gehenna Catch up with -current.
 1.47.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.47.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.47.2.5 15-Feb-2005  skrll Sync with HEAD.
 1.47.2.4 04-Feb-2005  skrll Sync with HEAD.
 1.47.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.47.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.47.2.1 03-Aug-2004  skrll Sync with HEAD
 1.53.10.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.53.10.1 12-Feb-2005  yamt sync with head.
 1.53.8.1 29-Apr-2005  kent sync with -current
 1.56.6.1 03-Oct-2008  jdc Pull up revisions:
src/sys/netinet6/in6.c 1.141 via patch
src/sys/netinet6/in6_var.h 1.59 via patch
src/sys/netinet6/nd6_nbr.c 1.89-1.90 via patch
(requested by adrianp in ticket #1967).

If a neighbor solicitation isn't from the unspecified address, make sure
that the source address matches one of the interface's address prefixes.

Generalize previous fix so that both NS and NA packets are checked.
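
The source-address check described in this pull-up can be sketched in user
space as below. This is an assumption-laden illustration, not the code from
the referenced revisions: struct if_prefix, prefix_match() and
ns_source_ok() are hypothetical, and the caller is assumed to supply every
prefix configured on the receiving interface (including its link-local
prefix).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one prefix configured on an interface. */
struct if_prefix {
	struct in6_addr	addr;	/* prefix bits; trailing bits are ignored */
	unsigned	plen;	/* prefix length in bits, 0..128 */
};

/* Return true if the first plen bits of a and b are identical. */
static bool
prefix_match(const struct in6_addr *a, const struct in6_addr *b, unsigned plen)
{
	assert(plen <= 128);
	size_t bytes = plen / 8;	/* whole bytes covered by the prefix */
	unsigned bits = plen % 8;	/* leftover bits in the next byte */

	if (memcmp(a->s6_addr, b->s6_addr, bytes) != 0)
		return false;
	if (bits == 0)
		return true;
	unsigned char mask = (unsigned char)(0xff << (8 - bits));
	return ((a->s6_addr[bytes] ^ b->s6_addr[bytes]) & mask) == 0;
}

/*
 * Accept the source of an NS/NA packet if it is the unspecified address
 * (as used by DAD probes) or if it falls inside one of the interface's
 * prefixes; reject it otherwise.
 */
static bool
ns_source_ok(const struct in6_addr *src, const struct if_prefix *pfx, size_t n)
{
	if (IN6_IS_ADDR_UNSPECIFIED(src))
		return true;
	for (size_t i = 0; i < n; i++)
		if (prefix_match(src, &pfx[i].addr, pfx[i].plen))
			return true;
	return false;
}
```

For an interface holding 2001:db8:1::/64, a source of 2001:db8:1::5 would be
accepted and 2001:db8:2::5 would be dropped under this sketch.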
 1.56.4.1 03-Oct-2008  jdc Pull up revisions:
src/sys/netinet6/in6.c 1.141 via patch
src/sys/netinet6/in6_var.h 1.59 via patch
src/sys/netinet6/nd6_nbr.c 1.89-1.90 via patch
(requested by adrianp in ticket #1967).

If a neighbor solicitation isn't from the unspecified address, make sure
that the source address matches one of the interface's address prefixes.

Generalize previous fix so that both NS and NA packets are checked.
 1.56.2.1 03-Oct-2008  jdc Pull up revisions:
src/sys/netinet6/in6.c 1.141 via patch
src/sys/netinet6/in6_var.h 1.59 via patch
src/sys/netinet6/nd6_nbr.c 1.89-1.90 via patch
(requested by adrianp in ticket #1967).

If a neighbor solicitation isn't from the unspecified address, make sure
that the source address matches one of the interface's address prefixes.

Generalize previous fix so that both NS and NA packets are checked.
 1.57.2.7 17-Mar-2008  yamt sync with head.
 1.57.2.6 07-Dec-2007  yamt sync with head
 1.57.2.5 15-Nov-2007  yamt sync with head.
 1.57.2.4 03-Sep-2007  yamt sync with head.
 1.57.2.3 26-Feb-2007  yamt sync with head.
 1.57.2.2 30-Dec-2006  yamt sync with head.
 1.57.2.1 21-Jun-2006  yamt sync with head.
 1.58.2.2 01-Mar-2006  yamt sync with head.
 1.58.2.1 01-Feb-2006  yamt sync with head.
 1.59.4.2 01-Jun-2006  kardel Sync with head.
 1.59.4.1 22-Apr-2006  simonb Sync with head.
 1.59.2.1 09-Sep-2006  rpaulo sync with head
 1.61.2.3 11-Aug-2006  yamt sync with head
 1.61.2.2 24-May-2006  yamt sync with head.
 1.61.2.1 13-Mar-2006  yamt sync with head.
 1.63.4.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.64.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.65.14.1 03-Oct-2008  jdc Pull up revisions:
src/sys/netinet6/in6.c 1.141 via patch
src/sys/netinet6/in6_var.h 1.59 via patch
src/sys/netinet6/nd6_nbr.c 1.89-1.90 via patch
(requested by adrianp in ticket #1210).

If a neighbor solicitation isn't from the unspecified address, make sure
that the source address matches one of the interface's address prefixes.

Generalize previous fix so that both NS and NA packets are checked.
 1.65.8.1 03-Oct-2008  jdc Pull up revisions:
src/sys/netinet6/in6.c 1.141 via patch
src/sys/netinet6/in6_var.h 1.59 via patch
src/sys/netinet6/nd6_nbr.c 1.89-1.90 via patch
(requested by adrianp in ticket #1210).

If a neighbor solicitation isn't from the unspecified address, make sure
that the source address matches one of the interface's address prefixes.

Generalize previous fix so that both NS and NA packets are checked.
 1.65.6.2 18-Dec-2006  yamt sync with head.
 1.65.6.1 10-Dec-2006  yamt sync with head.
 1.65.4.2 01-Feb-2007  ad Sync with head.
 1.65.4.1 12-Jan-2007  ad Sync with head.
 1.69.2.5 17-May-2007  yamt sync with head.
 1.69.2.4 07-May-2007  yamt sync with head.
 1.69.2.3 24-Mar-2007  yamt sync with head.
 1.69.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.69.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.71.6.1 18-Mar-2007  reinoud First attempt to bring branch in sync with HEAD
 1.71.4.1 11-Jul-2007  mjf Sync with head.
 1.71.2.5 09-Oct-2007  ad Sync with head.
 1.71.2.4 20-Aug-2007  ad Sync with HEAD.
 1.71.2.3 01-Jul-2007  ad Adapt to callout API change.
 1.71.2.2 08-Jun-2007  ad Sync with head.
 1.71.2.1 10-Apr-2007  ad Sync with head.
 1.76.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.76.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.77.4.4 21-Nov-2007  joerg Sync with HEAD.
 1.77.4.3 11-Nov-2007  joerg Sync with HEAD.
 1.77.4.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.77.4.1 09-Aug-2007  jmcneill Sync with HEAD.
 1.78.2.2 07-Aug-2007  dyoung Avoid writing past the end of the buffer [lldst, lldst + dstsize)
in nd6_storelladdr().

Use sockaddr_dl_setaddr(). Constify some sockaddr_dl's. Constify
a sockaddr argument to nd6_na_output(). Change SDL() to "standard"
satocsdl() or satosdl(). Change SIN6() to satocsin6() or satosin6().

bcmp -> memcmp, bcopy -> memcpy.
 1.78.2.1 07-Aug-2007  dyoung file nd6_nbr.c was added on branch matt-mips64 on 2007-08-07 04:35:44 +0000
 1.79.2.3 23-Mar-2008  matt sync with HEAD
 1.79.2.2 09-Jan-2008  matt sync with HEAD
 1.79.2.1 06-Nov-2007  matt sync with HEAD
 1.80.6.1 19-Nov-2007  mjf Sync with HEAD.
 1.80.4.2 18-Nov-2007  bouyer Sync with HEAD
 1.80.4.1 13-Nov-2007  bouyer Sync with HEAD
 1.82.14.3 28-Sep-2008  mjf Sync with HEAD.
 1.82.14.2 02-Jun-2008  mjf Sync with HEAD.
 1.82.14.1 03-Apr-2008  mjf Sync with HEAD.
 1.82.10.2 24-Mar-2008  keiichi sync with head.
 1.82.10.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.85.2.2 04-Jun-2008  yamt sync with head
 1.85.2.1 18-May-2008  yamt sync with head.
 1.86.4.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.86.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.86.2.1 04-May-2009  yamt sync with head.
 1.88.4.1 19-Oct-2008  haad Sync with HEAD.
 1.90.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.90.2.1 28-Apr-2009  skrll Sync with HEAD.
 1.94.16.2 05-Apr-2012  mrg sync to latest -current.
 1.94.16.1 18-Feb-2012  mrg merge to -current.
 1.94.12.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.94.12.1 17-Apr-2012  yamt sync with head
 1.95.8.1 02-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1525):
sys/netinet6/nd6_nbr.c: revision 1.145 (patch)

Fix memory leak. Contrary to what the XXX indicates, this place is 100%
reachable remotely.
 1.95.6.1 02-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1525):
sys/netinet6/nd6_nbr.c: revision 1.145 (patch)

Fix memory leak. Contrary to what the XXX indicates, this place is 100%
reachable remotely.
 1.95.2.1 02-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1525):
sys/netinet6/nd6_nbr.c: revision 1.145 (patch)

Fix memory leak. Contrary to what the XXX indicates, this place is 100%
reachable remotely.
 1.96.2.3 03-Dec-2017  jdolecek update from HEAD
 1.96.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.96.2.1 23-Jun-2013  tls resync from head
 1.98.2.1 18-May-2014  rmind sync with head
 1.99.2.1 10-Aug-2014  tls Rebase.
 1.100.2.3 02-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1562):
sys/netinet6/nd6_nbr.c: revision 1.145
Fix memory leak. Contrary to what the XXX indicates, this place is 100%
reachable remotely.
 1.100.2.2 06-Apr-2015  snj branches: 1.100.2.2.2; 1.100.2.2.6;
Pull up following revision(s) (requested by martin in ticket #655):
sys/netinet6/in6.c: revision 1.182 via patch
sys/netinet6/in6_ifattach.c: revision 1.95 via patch
sys/netinet6/nd6.c: revision 1.158 via patch
sys/netinet6/nd6.h: revision 1.62 via patch
sys/netinet6/nd6_nbr.c: revision 1.104 via patch
sys/netinet6/nd6_rtr.c: revision 1.96 via patch
Rearrange interface detachment slightly: before we free the INET6 specific
per-interface data, make sure to call nd6_purge() with it to remove
routing entries pointing to the going interface.
When we should happen to call this function again later, with the data
already gone, just return.
Fixes PR kern/49682, ok: christos.
 1.100.2.1 17-Dec-2014  martin Pull up following revision(s) (requested by roy in ticket #332):
sys/netinet6/nd6_nbr.c: revision 1.103
sys/netinet6/nd6_rtr.c: revision 1.95
sys/netinet6/nd6.h: revision 1.61
sys/netinet6/nd6.c: revision 1.156
Report route additions/changes/deletions for cached neighbours to userland.
 1.100.2.2.6.1 02-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1562):
sys/netinet6/nd6_nbr.c: revision 1.145 (patch)

Fix memory leak. Contrary to what the XXX indicates, this place is 100%
reachable remotely.
 1.100.2.2.2.1 02-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1562):
sys/netinet6/nd6_nbr.c: revision 1.145 (patch)

Fix memory leak. Contrary to what the XXX indicates, this place is 100%
reachable remotely.
 1.102.2.11 28-Aug-2017  skrll Sync with HEAD
 1.102.2.10 05-Feb-2017  skrll Sync with HEAD
 1.102.2.9 05-Dec-2016  skrll Sync with HEAD
 1.102.2.8 05-Oct-2016  skrll Sync with HEAD
 1.102.2.7 09-Jul-2016  skrll Sync with HEAD
 1.102.2.6 29-May-2016  skrll Sync with HEAD
 1.102.2.5 22-Apr-2016  skrll Sync with HEAD
 1.102.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.102.2.3 22-Sep-2015  skrll Sync with HEAD
 1.102.2.2 06-Jun-2015  skrll Sync with HEAD
 1.102.2.1 06-Apr-2015  skrll Sync with HEAD
 1.122.2.5 20-Mar-2017  pgoyette Sync with HEAD
 1.122.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.122.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.122.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.122.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.134.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.138.6.9 30-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1396):

sys/netinet6/nd6.h: revision 1.88
sys/netinet6/nd6_nbr.c: revision 1.174
sys/netinet6/nd6.c: revision 1.264
sys/netinet/if_arp.c: revision 1.288 (patch)

Initialize DAD components properly

The original code initialized each component in non-init functions such as
arp_dad_start and nd6_dad_find, conditionally based on a global flag for each.
However, it was racy because the flag and the code around it were not
protected by a lock and could cause a kernel panic at worst.

Fix the issue by initializing the components in bootup as usual.
 1.138.6.8 23-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1383):

sys/netinet6/nd6_nbr.c: revision 1.173

nd6: remove extra pserialize_read_exit
 1.138.6.7 13-May-2019  martin Pull up following revision(s) (requested by roy in ticket #1262):

sys/netinet6/nd6_nbr.c: revision 1.163

inet6: discard any received NA with a LL address we own

This matches ARP behaviour.
 1.138.6.6 02-Apr-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #686):

sys/netinet/if_arp.c: revision 1.271
sys/netinet6/nd6_nbr.c: revision 1.151,1.152

Avoid passing NULL to nd6_dad_duplicated
Fix PR kern/53075

Fix a race condition on DAD destructions (again)

The previous fix to DAD timers was wrong; it avoided a use-after-free but
instead introduced a memory leak. The destruction method had delegated
a destruction of a DAD timer to the timer itself and told that by setting NULL
to dp->dad_ifa. However, the previous fix made DAD timers do nothing on
the sign.

Fixing the issue with using callout_stop isn't easy. One approach is to have
a refcount on dp but it introduces extra complexity that we want to avoid.
The new fix falls back to using callout_halt, which was abandoned because of
softnet_lock. Fortunately now the network stack is protected by KERNEL_LOCK
so we can remove softnet_lock from DAD timers (callout) and use callout_halt
safely.
 1.138.6.5 20-Mar-2018  bouyer Pull up following revision(s) (requested by ozaki-r in ticket #645):
sys/netinet6/nd6_nbr.c: revision 1.153
Pull out a sleepable function (in6_selectsrc) from a pserialize read section
 1.138.6.4 26-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #589):
sys/netinet/if_arp.c: revision 1.267
sys/netinet6/nd6_nbr.c: revision 1.146-1.148

Use KASSERT for checking a programming error

Simplify; pass dp to nd6_dad_duplicated instead of looking it up again in it

Avoid a race condition of DAD timer destructions

When we see dp->dad_ifa == NULL, it means that the ifa is being deleted and also
the callout is scheduled again by someone. We shouldn't rely on a result of
callout_pending to know if the callout is scheduled because it returns false if
the subsequent callout handler is already on the fly.
We have to always delegate the destruction of dp to the subsequent handler
unconditionally if dp->dad_ifa == NULL. Otherwise, the first handler destroys
the dp and the second handler tries to handle destroyed dp.
 1.138.6.3 02-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #531):
sys/netinet6/nd6_nbr.c: revision 1.145
Fix memory leak. Contrary to what the XXX indicates, this place is 100%
reachable remotely.
 1.138.6.2 26-Jan-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #511):
sys/kern/kern_timeout.c: revision 1.54
sys/netinet6/nd6_nbr.c: revision 1.141
sys/netinet6/nd6_nbr.c: revision 1.144
sys/netinet/if_arp.c: revision 1.256
Fix a deadlock on callout_halt of nd6_dad_timer
We must not call callout_halt of nd6_dad_timer with holding nd6_dad_lock because
the lock is taken in nd6_dad_timer. Once softnet_lock goes away, we can pass the
lock to callout_halt, but for now we cannot.
Make DAD destructions (MP-)safe with callout_stop
arp_dad_stoptimer and nd6_dad_stoptimer can be called with or without
softnet_lock held and unfortunately we have no easy way to statically know which.
So it is hard to use callout_halt there.
To address the situation, we use callout_stop to make the code safe. The new
approach copes with the issue by delegating the destruction of a callout to
callout itself, which allows us to not wait for the callout to finish. This can be
done because DAD objects are separated from other data such as ifa.
The approach is suggested by riastradh@
Proposed on tech-kern@ and tech-net@
Sanity-check if interlock is held when it's passed
 1.138.6.1 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
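
A minimal userland sketch of the wrapper-macro idea described above. The
macro names follow the pattern the message implies and should be treated as
illustrative. KERNEL_LOCK and KERNEL_UNLOCK_ONE are replaced here by stand-ins
that only bump a counter (kernel_lock_depth is a hypothetical variable, not a
kernel symbol), so the example compiles outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel primitives; they only track nesting depth. */
static int kernel_lock_depth;	/* hypothetical, for illustration only */
#define KERNEL_LOCK(n, l)	(kernel_lock_depth += (n))
#define KERNEL_UNLOCK_ONE(l)	(kernel_lock_depth -= 1)

/*
 * The NET_MPSAFE switch lives in one place instead of being repeated
 * as "#ifndef NET_MPSAFE ... #endif" at every call site.
 */
#ifdef NET_MPSAFE
#define KERNEL_LOCK_UNLESS_NET_MPSAFE()		do { } while (0)
#define KERNEL_UNLOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#else
#define KERNEL_LOCK_UNLESS_NET_MPSAFE()		KERNEL_LOCK(1, NULL)
#define KERNEL_UNLOCK_UNLESS_NET_MPSAFE()	KERNEL_UNLOCK_ONE(NULL)
#endif

/* Returns the lock depth observed while the (possibly no-op) lock is held. */
static int
example_non_mpsafe_work(void)
{
	int depth_inside;

	KERNEL_LOCK_UNLESS_NET_MPSAFE();
	depth_inside = kernel_lock_depth;
	KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
	return depth_inside;
}
```

Built without NET_MPSAFE, example_non_mpsafe_work() sees a depth of 1; built
with -DNET_MPSAFE, both macros expand to nothing and it sees 0. Searching for
the wrapper names then finds every lock that is dropped under NET_MPSAFE.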
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general, but
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easy.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.148.2.5 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.148.2.4 21-May-2018  pgoyette Sync with HEAD
 1.148.2.3 02-May-2018  pgoyette Synch with HEAD
 1.148.2.2 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.148.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.156.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.156.2.1 10-Jun-2019  christos Sync with HEAD
 1.166.2.5 23-Apr-2020  martin Pull up following revision(s) (requested by roy in ticket #845):

sys/netinet6/nd6_nbr.c: revision 1.178

inet6: nd6_na_input() now considers ln_state <= ND6_LLINFO_INCOMPLETE

Otherwise if ln_state != ND6_LLINFO_INCOMPLETE and there is no lladdr
and this message was solicited then ln_state is set to ND6_LLINFO_REACHABLE
which could then cause a panic in nd6_resolve().

If ln_state > ND6_LLINFO_INCOMPLETE then it's assumed we have a lladdr.
Potentially this could have been triggered by the introduction of
ND6_LLINFO_PURGE in nd6.c r1.143 but also by the re-introduction of
ND6_LLINFO_INCOMPLETE in nd6.c r1.263.

Depending on the timing, it's technically possible to receive such
a message after the llentry is created with ND6_LLINFO_NOSTATE.
 1.166.2.4 30-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #269):

sys/netinet6/nd6.h: revision 1.88
sys/net/rtsock_shared.c: revision 1.10
sys/netinet6/nd6_nbr.c: revision 1.174
sys/netinet6/nd6.c: revision 1.264
sys/netinet/if_arp.c: revision 1.283
sys/netinet/if_arp.c: revision 1.288

Initialize DAD components properly

The original code initialized each component in non-init functions such as
arp_dad_start and nd6_dad_find, conditionally based on a global flag for each.
However, it was racy because the flag and the code around it were not
protected by a lock and could cause a kernel panic at worst.

Fix the issue by initializing the components in bootup as usual.

-

Initialize dom_mowner for MBUFTRACE
 1.166.2.3 22-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #212):

sys/netinet6/nd6_nbr.c: revision 1.173

nd6: remove extra pserialize_read_exit
 1.166.2.2 05-Sep-2019  martin Pull up following revision(s) (requested by roy in ticket #168):

sys/net/rtsock.c: revision 1.252
sys/netinet6/nd6_nbr.c: revision 1.168 - 1.172
sys/netinet6/nd6.c: revision 1.262

inet6: Send RTM_MISS when we fail to resolve an address.

Takes the same approach as when adding a new address - we no longer
announce the new lladdr right away but we announce the result.

This will either be RTM_ADD or RTM_MISS.
RTM_DELETE is only sent if we have a lladdr assigned OR gc'ed.

This results in fewer messages via route(4) and tells us when a new
lladdr has been added (RTM_ADD), changed (RTM_CHANGE), deleted
(RTM_DELETE) or has failed to be resolved (RTM_MISS).

The latter case can be interpreted as unreachable.

inet6: change rt_announce and llchange to bool in nd6_na_input()
more bool
 1.166.2.1 26-Aug-2019  martin Pull up following revision(s) (requested by roy in ticket #109):

sys/net/route.h: revision 1.124
sys/netinet6/nd6.c: revision 1.258
sys/netinet6/nd6.c: revision 1.259
sys/net/rtsock.c: revision 1.251
sys/netinet/if_arp.c: revision 1.284
sys/netinet6/nd6_nbr.c: revision 1.167

rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9

-

nd6: notify userland of neighbour lla updates once more

XXX pullup -8 -9
 1.175.2.1 25-Jan-2020  ad Sync with head.
 1.177.2.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.183.6.1 02-Aug-2025  perseant Sync with HEAD
 1.149 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.148 13-Apr-2020  kim Fix default route selection

The primary issue was that in revision 1.79 a check was added in the
nd6_defrouter_select() search loop to ignore the entry if RA processing
is enabled on its interface. In practice this results in all entries
being ignored.

This fix reverses the condition, so that an entry is ignored when RA
processing is NOT enabled on its interface. Further, the entry is
only ignored for being selected as the default router. The currently
installed router must be identified regardless of the (current) status
of its interface, so that we can delete the route before installing a
new one.

I also added error logging when adding or deleting a route fails. This
should help the administrator (or kernel developer) in noticing possible
problems.

Finally, if deleting a route fails, the corresponding default route
entry no longer has its "installed" flag cleared, so that deletion will
be retried. At a minimum, this will cause repeated messages about the
failed deletion as opposed to only getting repeated messages about the
installation of a new default route failing.

Fixes PR kern/55091 and also PR bin/54997 as far as the behaviour
observed with ndp(8).
 1.147 27-Dec-2019  msaitoh branches: 1.147.6;
s/referece/reference/ in comment.
 1.146 25-Sep-2019  ozaki-r Make panic messages more informative
 1.145 29-Apr-2019  roy branches: 1.145.2;
rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.
 1.144 14-Aug-2018  ozaki-r Don't call find_pfxlist_reachable_router, which may sleep, in a pserialize read section

Found by knakahara@
 1.143 19-May-2018  maxv branches: 1.143.2;
Style.
 1.142 18-May-2018  maxv Add missing m_put_rcvif_psref.
 1.141 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.140 24-Apr-2018  maxv Remove nullcheck, m is not allowed to be null.
 1.139 24-Apr-2018  maxv Remove the M_AUTHIPDGM flag. It is equivalent to M_AUTHIPHDR, both
are set in IPsec-AH, and they are always handled together.
 1.138 26-Jan-2018  ozaki-r branches: 1.138.2;
Get rid of unnecessary splsoftnet (redo)

Unless NET_MPSAFE, splsoftnet is still needed for rt_* functions.
 1.137 26-Jan-2018  ozaki-r Revert "Get rid of unnecessary splsoftnet" (v1.133)

It's not always true that softnet_lock is held these places.
See PR kern/52947.
 1.136 15-Dec-2017  ozaki-r Ensure to call if_mcast_op with holding IFNET_LOCK

Note that CARP doesn't deal with IFNET_LOCK yet.
 1.135 14-Mar-2017  ozaki-r branches: 1.135.6;
Remove unnecessary NULL check
 1.134 03-Mar-2017  msaitoh Add missing opt_net_mpsafe.h.
 1.133 22-Feb-2017  ozaki-r Get rid of unnecessary splsoftnet
 1.132 22-Feb-2017  ozaki-r Fix prefix invalidation via nd6_timer

We cannot remove a prefix there. Instead just invalidate it; the prefix
will be removed when purging an associated address. This is the same as
the original behavior.
 1.131 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.130 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.129 04-Jan-2017  christos branches: 1.129.2;
- kill NULL argument from in6_update_ifa
- amend in6_update_ifa1 to return the ia, so that we can use it in pfil hooks
to avoid NULL pointer crash.
 1.128 19-Dec-2016  ozaki-r Protect IPv6 default router and prefix lists with coarse-grained rwlock

in6_purgeaddr (in6_unlink_ifa) itself unreferences a prefix entry and calls
nd6_prelist_remove if the counter becomes 0, so callers don't need to
handle the reference counting.

Performance-sensitive paths (sending/forwarding packets) call just one
reader lock. This is a trade-off between performance impact and the amount
of effort; if we want to remove the reader lock, we need a huge amount of
work including destroying objects with psz/psref in softint, for example.
 1.127 14-Dec-2016  ozaki-r Reduce return points

No functional change intended.
 1.126 14-Dec-2016  ozaki-r Use macro to iterate on the nd_prefix list
 1.125 12-Dec-2016  ozaki-r Introduce macros for the prefix list

No functional change.
 1.124 12-Dec-2016  ozaki-r Introduce macros for the default router list

No functional change.
 1.123 11-Dec-2016  ozaki-r Add nd6_ prefix to exported functions
 1.122 11-Dec-2016  ozaki-r Move default interface things from nd6_rtr.c to nd6.c
 1.121 11-Dec-2016  ozaki-r Make some functions static
 1.120 15-Nov-2016  ozaki-r Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete a matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.
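
The search-then-delete pattern described above can be sketched in userland
as follows. This is only an illustration under stated assumptions: a singly
linked list stands in for the radix tree, and every name (table,
table_search_matched, table_delete, and so on) is hypothetical rather than
the actual rn_search_matched interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the routing table: a plain linked list. */
struct entry {
	int key;
	struct entry *next;
};
struct table {
	struct entry *head;
};

typedef int (*match_fn)(const struct entry *, void *);

static void
table_insert(struct table *t, int key)
{
	struct entry *e = malloc(sizeof(*e));

	assert(e != NULL);
	e->key = key;
	e->next = t->head;
	t->head = e;
}

/* Return the first entry accepted by the callback; never modifies the table. */
static struct entry *
table_search_matched(struct table *t, match_fn match, void *arg)
{
	for (struct entry *e = t->head; e != NULL; e = e->next)
		if (match(e, arg))
			return e;
	return NULL;
}

static void
table_delete(struct table *t, struct entry *victim)
{
	for (struct entry **pp = &t->head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			free(victim);
			return;
		}
	}
}

/* The caller deletes outside the search, so no walk is active during removal. */
static int
table_delete_matched(struct table *t, match_fn match, void *arg)
{
	struct entry *e;
	int n = 0;

	while ((e = table_search_matched(t, match, arg)) != NULL) {
		table_delete(t, e);
		n++;
	}
	return n;
}

static int
table_count(const struct table *t)
{
	int n = 0;

	for (const struct entry *e = t->head; e != NULL; e = e->next)
		n++;
	return n;
}

/* Example callback: match entries with an even key. */
static int
is_even(const struct entry *e, void *arg)
{
	(void)arg;
	return e->key % 2 == 0;
}
```

The point of the design is that the callback only selects an entry; the
removal happens after the search has returned, so the table's own walker is
never re-entered from inside a walk.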
 1.119 16-Aug-2016  roy Separate ioctl address prefix management from RA prefix management
as we have no API for controlling the latter.

This fixes a long standing problem where addresses added with non /128
prefixes and non infinite address lifetimes would register a prefix route
which would expire. Subsequent calls to set new lifetimes for the same address
would not affect the prefix route management, so once expired, the
prefix route would be impossible to add back as the kernel would remove it.
 1.118 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.117 20-Jul-2016  ozaki-r Apply pserialize to some iterations of IP address lists
 1.116 15-Jul-2016  ozaki-r Use sin6tosa and sin6tocsa macros

No functional change.
 1.115 07-Jul-2016  ozaki-r branches: 1.115.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.114 05-Jul-2016  ozaki-r Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.
 1.113 04-Jul-2016  ozaki-r Use pslist(9) for the global in6_ifaddr list

psz and psref will be applied in another commit.

No functional change intended.
 1.112 15-Jun-2016  ozaki-r Protect if_byindex by pserialize
 1.111 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.110 26-Apr-2016  ozaki-r Sweep unnecessary route.h inclusions
 1.109 11-Apr-2016  ozaki-r Sweep unnecessary radix.h inclusions
 1.108 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

See the diffs under tests/ for details of the behavior changes.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.107 01-Apr-2016  ozaki-r Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.
 1.106 01-Apr-2016  ozaki-r Use __func__ in log messages
 1.105 25-Nov-2015  ozaki-r Use lltable/llentry for NDP

lltable and llentry were introduced to replace ARP cache data structure
for further restructuring of the routing table: L2 nexthop cache
separation. This change replaces the NDP cache data structure
(llinfo_nd6) with them as well as ARP.

One noticeable change is for neighbor cache GC mechanism that was
introduced to prevent IPv6 DoS attacks. net.inet6.ip6.neighborgcthresh
was the max number of caches that we store in the system. After
introducing lltable/llentry, the value is changed to be per-interface
basis because lltable/llentry stores neighbor caches in each interface
separately. And the change brings one degradation; the old GC mechanism
dropped exceeded packets based on LRU while the new implementation drops
packets in order from the beginning of lltable (a hash table + linked
lists). It would be improved in the future.

Added functions in in6.c come from FreeBSD (as of r286629) and are
tweaked for NetBSD.

Proposed on tech-kern and tech-net.
 1.104 05-Oct-2015  ozaki-r Use satosin6 instead of its own macro
 1.103 24-Aug-2015  ozaki-r Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)
 1.102 07-Aug-2015  ozaki-r Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.
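
The conversion between the two time bases can be sketched as below. The
helper names and signatures are hypothetical; the current uptime and
wall-clock values are passed in explicitly so the arithmetic is easy to
check, whereas kernel code would read them from time_uptime and time_second.

```c
#include <assert.h>
#include <time.h>

/*
 * Convert a timestamp kept on the monotonic (uptime) base to wall-clock
 * time for export to userland. now_wall - now_mono is the current offset
 * between the two bases.
 */
static time_t
sketch_mono_to_wall(time_t t_mono, time_t now_mono, time_t now_wall)
{
	return t_mono + (now_wall - now_mono);
}

/* Inverse: convert a wall-clock value supplied by userland to the uptime base. */
static time_t
sketch_wall_to_mono(time_t t_wall, time_t now_mono, time_t now_wall)
{
	return t_wall - (now_wall - now_mono);
}
```

Because only the offset at the moment of conversion is used, a later
settimeofday(2) changes what userland is shown but does not disturb expiry
times stored internally on the uptime base.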

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
 1.101 17-Jul-2015  ozaki-r Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
 1.100 30-Jun-2015  ozaki-r Fix nd6_numroutes counting

nd6_numroutes is intended to be incremented when a route is added via RA
and decremented when a RA route is deleted. However, a decrement of a RA
route was skipped when there remained references to the RA route.
 1.99 02-May-2015  roy Mitigate Local Denial of Service with IPv6 Router Advertisements and
log attack attempts.

Fixes CVE-2015-2923, taken from FreeBSD.
 1.98 25-Feb-2015  roy Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.
 1.97 25-Feb-2015  roy Retire nd6_newaddrmsg and use rt_newaddrmsg directly instead so that
we don't spam route changes when the route hasn't changed.
 1.96 23-Feb-2015  martin Rearrange interface detachment slightly: before we free the INET6 specific
per-interface data, make sure to call nd6_purge() with it to remove
routing entries pointing to the going interface.
When we should happen to call this function again later, with the data
already gone, just return.
Fixes PR kern/49682, ok: christos.
 1.95 16-Dec-2014  roy Report route additions/changes/deletions for cached neighbours to userland.
 1.94 05-Sep-2014  matt branches: 1.94.2;
Don't use C++ keyword as variable.
Use different prefix for nd6_prefixctl members than for nd6_prefix members.
 1.93 31-Jul-2014  ozaki-r branches: 1.93.2;
Define IFADDR_FOREACH_SAFE for on-the-fly element removal in a loop

We have to use it when we purge an address element in an ifaddr loop.

This change restores the original behavior that was accidentally degraded.
 1.92 25-Jul-2014  ozaki-r Use IFADDR_FOREACH for iterating if_addrlist of ifnet
 1.91 17-May-2014  rmind Replace open-coded access (and boundary checking) of ifindex2ifnet with
if_byindex() function.
 1.90 14-Sep-2013  martin branches: 1.90.2;
Remove unused variable
 1.89 20-Jun-2013  roy branches: 1.89.2;
Move the detaching and making tentative addresses out of in6_if_up
and into in6_if_link_up.

This fixes a possible panic where link is up but not the interface.
Note that a better solution would be to listen to the routing socket
in the kernel, but I don't know how to do that.

Reachable Router tests for IFF_UP as well.
 1.88 11-Jun-2013  roy When an interface link state changes to down, mark all attached IPv6
addresses as detached.
Likewise, when the link state changes to up, mark all detached IPv6
as tentative and start DAD on them.

Advertised router reachability now checks that link state is not down.
This means that when an interface link state changes, the default IPv6
router may change as well.
 1.87 21-May-2013  roy For IPv6, emit RTM_NEWADDR once DAD completes and also when address flag
changes. Tentative addresses are not emitted.

Version bumped so userland can detect this behaviour change.
 1.86 18-Feb-2013  christos PR/47576: Takahiro HAYASHI: Avoid crash destroying tap0 after deleting
its link-local address.
 1.85 28-Jan-2013  joerg Set the socket family for the network mask.
 1.84 25-Jun-2012  abs branches: 1.84.2;
Some fun in trying to work out what was broken with gcc-4.1 to
trigger the following warning when gcc-4.5 was silent:
nd6_rtr.c: In function 'nd6_ra_input':
nd6_rtr.c:788: warning: 'ext' may be used uninitialized in this function
Eventually determined that it was not unreasonable for gcc-4.1 to
bleat in this case as there is a nasty 'goto insert' which could
indeed have resulted in an uninitialised variable use. Yay gcc 4.1.
 1.83 23-Jun-2012  christos 4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.82 19-Nov-2011  tls branches: 1.82.4; 1.82.8; 1.82.10;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.81 24-May-2011  spz branches: 1.81.4;
RA flood mitigation via a limit on accepted routes:
- introduce a limit for the routes accepted via IPv6 Router Advertisement:
a common 2 interface client will have 6, the default limit is 100 and
can be adjusted via sysctl
- report the current number of routes installed via RA via sysctl
- count discarded route additions. Note that one RA message is two routes.
This is at present only across all interfaces even though per-interface
would be more useful, since the per-interface structure complies to RFC2466
- bump kernel version due to the previous change
- adjust netstat to use the new value (with netstat -p icmp6)
 1.80 06-Nov-2009  dyoung branches: 1.80.4; 1.80.6;
Fix net.inet6.ip6.accept_rtadv and 'ndp -i <interface> accept_rtadv':

Add a flag ND6_IFF_OVERRIDE_RTADV that tells the kernel to override
ip6_accept_rtadv (net.inet6.ip6.accept_rtadv) on an interface.

Add a routine nd6_accepts_rtadv(ndi) that evaluates both the flags
on the interface represented by ndi and ip6_accept_rtadv, and
returns 'true' if the given interface should accept Router
Advertisements, and 'false' if not.

Now, ND6_IFF_ACCEPT_RTADV works as it was historically documented:
if it is set, then accept router advertisements iff ip6_accept_rtadv
!= 0. Otherwise, do not accept router advertisements.

If ND6_IFF_OVERRIDE_RTADV is set, then the flag ND6_IFF_ACCEPT_RTADV
overrides ip6_accept_rtadv: if ND6_IFF_ACCEPT_RTADV is set, accept;
otherwise reject. Ignore ip6_accept_rtadv.

If neither ND6_IFF_ACCEPT_RTADV nor ND6_IFF_OVERRIDE_RTADV is set,
reject Router Advertisements.
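
The decision rules above reduce to a small predicate, sketched here. The
flag values and the function name are placeholders chosen for the example
(the real ND6_IFF_* values are not given in this log), and the sysctl value
is passed as a parameter instead of being read from a global.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values; the kernel's actual ND6_IFF_* values may differ. */
#define SKETCH_IFF_ACCEPT_RTADV		0x1u
#define SKETCH_IFF_OVERRIDE_RTADV	0x2u

/*
 * Decide whether an interface accepts Router Advertisements.
 * flags:            per-interface flags
 * ip6_accept_rtadv: value of the global net.inet6.ip6.accept_rtadv knob
 */
static bool
sketch_accepts_rtadv(uint32_t flags, int ip6_accept_rtadv)
{
	bool accept = (flags & SKETCH_IFF_ACCEPT_RTADV) != 0;

	/* With OVERRIDE set, the per-interface ACCEPT flag decides alone. */
	if ((flags & SKETCH_IFF_OVERRIDE_RTADV) != 0)
		return accept;

	/* Without OVERRIDE, ACCEPT only takes effect if the global knob is on. */
	return accept && ip6_accept_rtadv != 0;
}
```

With neither flag set the function returns false for any value of the
global knob, which matches the last rule in the message.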
 1.79 25-Jul-2009  tonnerre Instead of using the net.inet6.ip6.accept_rtadv sysctl for all devices,
make net.inet6.ip6.accept_rtadv the default for individual per-device
settings so people can use the ndp(8) utility to set per-device whether
or not to accept route advertisements.

rtadvd changes to follow.

(Debated on tech-net@ before but almost two weeks passed by without any
comment on the patch.)
 1.78 18-Mar-2009  cegger bzero -> memset
 1.77 19-Dec-2008  cegger branches: 1.77.2;
use M_ZERO on malloc() and remove subsequent bzero().
 1.76 24-Oct-2008  dyoung branches: 1.76.2;
bzero -> memset. Avoid some messy casts to sockaddr by using a
union of sockaddr_in6 and sockaddr. No functional change intended.
 1.75 15-Apr-2008  thorpej branches: 1.75.4; 1.75.10;
Make ip6 and icmp6 stats per-cpu.
 1.74 08-Apr-2008  thorpej Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
 1.73 27-Feb-2008  matt Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.
 1.72 20-Dec-2007  dyoung branches: 1.72.2; 1.72.6;
Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.71 05-Dec-2007  dyoung branches: 1.71.4;
Use IFADDR_FIRST(), IFADDR_NEXT().
 1.70 04-Dec-2007  dyoung Use IFNET_FOREACH() and IFADDR_FOREACH().
 1.69 10-Nov-2007  dyoung branches: 1.69.2;
Use sockaddr_in6_init().
 1.68 01-Nov-2007  dyoung branches: 1.68.2;
De-__P().
 1.67 07-Aug-2007  dyoung branches: 1.67.2; 1.67.6; 1.67.8;
Remove dead code.
 1.66 19-Jul-2007  dyoung branches: 1.66.4;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.65 09-Jun-2007  dyoung branches: 1.65.2;
Convert from rn_walktree() to rt_walktree(). While I am here,
de-__P().
 1.64 23-May-2007  christos Ansify + add a few comments, from Karl Sjödahl
 1.63 04-Mar-2007  christos branches: 1.63.2; 1.63.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.62 20-Nov-2006  dyoung branches: 1.62.4;
Use LIST_/TAILQ_ macros, esp. LIST_FOREACH() and TAILQ_FOREACH().
Use the usual idiom for iterating over a list where we might
_REMOVE() entries,

for (x = TAILQ_FIRST(...); x != NULL; x = nx) {
nx = TAILQ_NEXT(x, ...);
...
}
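
The removal-safe idiom above can be sketched in userland C with <sys/queue.h>. This is only an illustration: struct item, item_append(), item_remove_odd() and item_count() are hypothetical names, not code from this commit. The point is that the successor is saved before the current element may be unlinked and freed.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical list element; all names here are illustrative only. */
struct item {
	int value;
	TAILQ_ENTRY(item) link;
};
TAILQ_HEAD(item_list, item);

/* Append a heap-allocated item; returns 0 on success, -1 on failure. */
static int
item_append(struct item_list *head, int value)
{
	struct item *it = malloc(sizeof(*it));

	if (it == NULL)
		return -1;
	it->value = value;
	TAILQ_INSERT_TAIL(head, it, link);
	return 0;
}

/* Remove and free every odd-valued item; returns the number removed. */
static int
item_remove_odd(struct item_list *head)
{
	struct item *x, *nx;
	int removed = 0;

	for (x = TAILQ_FIRST(head); x != NULL; x = nx) {
		/* Fetch the successor first: x may be freed below. */
		nx = TAILQ_NEXT(x, link);
		if (x->value % 2 != 0) {
			TAILQ_REMOVE(head, x, link);
			free(x);
			removed++;
		}
	}
	return removed;
}

/* Count the items; plain TAILQ_FOREACH is fine when nothing is removed. */
static int
item_count(struct item_list *head)
{
	struct item *x;
	int n = 0;

	TAILQ_FOREACH(x, head, link)
		n++;
	return n;
}
```

Using TAILQ_FOREACH() in item_remove_odd() would read x's link after x had been freed, which is why the explicit nx variable is needed there.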
 1.61 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.60 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.59 07-Jun-2006  kardel branches: 1.59.6; 1.59.8;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.58 20-Mar-2006  rpaulo branches: 1.58.2;
RFC 4191 changed the meaning of the "Reserved" Router Preference
value. Previously the router should treat the received router
advertisement as having a 0 router lifetime. The RFC now says that the
router should treat the "Reserved" field the same way as if it was the
medium (default) preference.

From the KAME project via SUZUKI Shinsuke.
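
A minimal sketch of the behaviour described above, assuming the RFC 4191 layout in which the 2-bit preference occupies mask 0x18 of the Router Advertisement flags octet. The constant, enum and function names are made up for this illustration and are not the kernel's identifiers.

```c
#include <stdint.h>

/* Assumed layout: the 2-bit Prf field sits at mask 0x18 of the RA flags. */
#define SK_RA_PRF_MASK		0x18
#define SK_RA_PRF_HIGH		0x08	/* binary 01 */
#define SK_RA_PRF_MEDIUM	0x00	/* binary 00 */
#define SK_RA_PRF_RESERVED	0x10	/* binary 10 */
#define SK_RA_PRF_LOW		0x18	/* binary 11 */

enum sk_rtpref {
	SK_RTPREF_LOW = -1,
	SK_RTPREF_MEDIUM = 0,
	SK_RTPREF_HIGH = 1
};

/*
 * Decode the router preference from the RA flags octet.  The reserved
 * encoding is treated as medium rather than as a reason to distrust the
 * advertisement.
 */
static enum sk_rtpref
sk_ra_flags_to_rtpref(uint8_t flags)
{
	switch (flags & SK_RA_PRF_MASK) {
	case SK_RA_PRF_HIGH:
		return SK_RTPREF_HIGH;
	case SK_RA_PRF_LOW:
		return SK_RTPREF_LOW;
	case SK_RA_PRF_RESERVED:	/* reserved: same as medium */
	case SK_RA_PRF_MEDIUM:
	default:
		return SK_RTPREF_MEDIUM;
	}
}
```

Bits outside the mask (for example the M and O flags) do not affect the result.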
 1.57 06-Mar-2006  rpaulo branches: 1.57.2; 1.57.4;
Rename local variables called delay that shadow the delay() decl.
Pointed out by Robert Swindells.
 1.56 05-Mar-2006  rpaulo NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.55 03-Mar-2006  rpaulo branches: 1.55.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.
 1.54 21-Jan-2006  rpaulo branches: 1.54.2; 1.54.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
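
The stricter scope boundary check can be pictured with a small userland sketch. It assumes only the rule stated above (a ::1 source is meaningful only within the node); loopback_src_leaks() is a hypothetical helper, not a kernel function, and real scope handling covers more cases than loopback.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: returns 1 if sending from src to dst would carry
 * a loopback source address off-node, 0 if the pair is acceptable, and
 * -1 if either string is not a valid IPv6 address.
 */
static int
loopback_src_leaks(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return -1;

	/* ::1 is node-local: only a loopback destination is in scope. */
	if (IN6_IS_ADDR_LOOPBACK(&s) && !IN6_IS_ADDR_LOOPBACK(&d))
		return 1;
	return 0;
}
```

With this rule, the bind("::1") plus sendto(global address) sequence from the commit message would be refused instead of putting a packet with src=::1 on the wire.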
 1.53 11-Dec-2005  christos branches: 1.53.2;
merge ktrace-lwp.
 1.52 29-May-2005  christos branches: 1.52.2;
- avoid shadowed variables
- sprinkle const.
 1.51 17-Nov-2004  itojun wrong paren. Patrick Latifi
 1.50 26-Oct-2004  itojun no need to call defrouter_select() here any more; jinmei
 1.49 26-Oct-2004  itojun more cleanup on onlink assumption; jinmei
 1.48 26-Oct-2004  itojun remove onlink assumption behavior (consider destination on-link if default
router list is empty) based on recent IETF ipv6 discussion (RFC2461 5.2).

fix "ndp -I delete".
 1.47 10-Dec-2003  itojun use if_indexlim (instead of if_index) and ifindex2ifnet[x] != NULL
to check if interface exists, as (1) if_index has different meaning
(2) ifindex2ifnet could become NULL when interface gets destroyed,
since we introduced dynamically-created interfaces. from kame
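
A rough sketch of the existence check described above, written against a stand-in table rather than the kernel's globals. struct ifnet_sketch and if_index_exists() are hypothetical, and treating index 0 as never valid is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct ifnet; only the name is kept for illustration. */
struct ifnet_sketch {
	const char *if_xname;
};

/*
 * An interface index refers to a live interface only if it lies below
 * the table limit AND its slot is non-NULL.  A slot can be NULL because
 * a dynamically created interface was destroyed after the index was
 * handed out.
 */
static bool
if_index_exists(struct ifnet_sketch *const *table, size_t indexlim,
    size_t idx)
{
	return idx > 0 && idx < indexlim && table[idx] != NULL;
}
```

Checking only against the highest index ever allocated would accept slots whose interface has since been destroyed.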
 1.46 30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.45 26-Sep-2003  wiz Process has only one c. From miod@openbsd.
 1.44 24-Jun-2003  itojun branches: 1.44.2;
remove unneeded checks of accept_rtadv. from kame
 1.43 24-Jun-2003  itojun use time.tv_sec directly
 1.42 16-May-2003  itojun backout previous. (sys/net/if.c fixed)
 1.41 16-May-2003  itojun nd6_rtmsg: If called during if_detach(), TAILQ_FIRST(if_addrlist)
could be NULL. This is not a common case, but as nd6_rtmsg()
will be called during if_detach(), we need to check for the
case. reported by kanaoka-san.
 1.40 16-May-2003  itojun remove duplicate. masanori kanaoka
 1.39 15-May-2003  itojun rt->rt_ifp may not always be available. masanori kanaoka via kame
 1.38 14-May-2003  itojun always use PULLDOWN_TEST codepath.
 1.37 08-May-2003  itojun fix invalid pointer setting on RA reception. from kiu shueng chuan via kame
 1.36 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.35 30-Jul-2002  itojun no need to handle NULL argument in defrouter_delreq.
From: tedu <grendel@zeitbombe.org>
 1.34 13-Jul-2002  itojun no need to bzero() twice. from he@netbsd
 1.33 09-Jun-2002  itojun whitespace cleanup
 1.32 08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.31 08-Jun-2002  itojun in6_len2mask is a duplicate of in6_prefixlen2mask. unify. sync w/kame
 1.30 07-Jun-2002  itojun cope with ndi->maxmtu == 0 case. sync w/kame
 1.29 29-May-2002  itojun attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
 1.28 18-Dec-2001  itojun branches: 1.28.8;
reduce white space/cosmetic diffs w/kame.
 1.27 13-Nov-2001  lukem add RCSIDs
 1.26 17-Oct-2001  itojun do not change neighbor cache state on entry timeout,
if the cache entry is for outgoing router.

perform on-linkness check before default router (re-)selection.

do not play with interface direct route on nd6_rtrequest.

sync a lot of cosmetic changes. sync with kame
 1.25 16-Oct-2001  itojun more whitespace/comment sync with kame
 1.24 24-May-2001  itojun branches: 1.24.2;
print more diag message on in6_addmulti() failures.
 1.23 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.22 04-Apr-2001  itojun suppress RS/RA log messages (can be re-enabled by net.inet6.icmp6.nd6_debug),
as they may fill up /var. sync with kame.
 1.21 11-Feb-2001  itojun branches: 1.21.2;
protect router list management by splsoftnet properly. sync with kame
 1.20 07-Feb-2001  itojun during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).
 1.19 17-Jan-2001  itojun wrap noisy ND6 debugging messages with ND6_DEBUG. sync with kame
 1.18 13-Aug-2000  itojun suppress warning (LOG_ERR -> LOG_DEBUG) which occurs in the following situation:
- manually configure an address from prefix P (like P::1)
- autoconfigure additional address from the same prefix P (like P::ifid).
- rtrequest fails due to P/plen already exists

more fundamental solution should appear later, when kame side stabilizes it.
from thorpej.
 1.17 13-Jun-2000  itojun branches: 1.17.2;
add sanity check on in6_ifaddr.
 1.16 13-Jun-2000  itojun make sure to link new in6_ifaddr to if_addrlist.
 1.15 21-Mar-2000  itojun branches: 1.15.2;
s/ND6DEBUG/ND6_DEBUG/ (just to meet nd6_nbr.c)
 1.14 04-Mar-2000  thorpej Quiet a noisy diagnostic.
 1.13 02-Mar-2000  itojun don't configure ifa_dstaddr for non-pointopoint interface,
so that we won't be returning them from routing socket manipulation.
 1.12 26-Feb-2000  itojun bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
 1.11 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.10 03-Feb-2000  itojun remove old #if 0'ed portion
 1.9 01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.8 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.7 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.6 31-Jul-1999  itojun branches: 1.6.2; 1.6.8;
sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.5 30-Jul-1999  itojun remove reference to in6_systm.h (file itself will be removed afterwards)
 1.4 04-Jul-1999  itojun s/splnet/splsoftnet/ in IPv6/IPsec part.
hope I made no mistake (the kernel works fine but I need a regress test)

Suggested by: thorpej
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch them up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file nd6_rtr.c was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file nd6_rtr.c was added on branch chs-ubc2 on 1999-07-01 23:48:30 +0000
 1.6.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.6.2.3 21-Apr-2001  bouyer Sync with HEAD
 1.6.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.6.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.15.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.17.2.1 09-May-2001  he Pull up revision 1.20 (via patch, requested by itojun):
Suppress ND6 logs that are too noisy for normal use. Can be
re-enabled by net.inet6.icmp6.nd6_debug.
 1.21.2.8 17-Sep-2002  nathanw Catch up to -current.
 1.21.2.7 01-Aug-2002  nathanw Catch up to -current.
 1.21.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.21.2.5 08-Jan-2002  nathanw Catch up to -current.
 1.21.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.21.2.3 22-Oct-2001  nathanw Catch up to -current.
 1.21.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.21.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.24.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.24.2.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.24.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.24.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.28.8.4 29-Aug-2002  gehenna catch up with -current.
 1.28.8.3 15-Jul-2002  gehenna catch up with -current.
 1.28.8.2 20-Jun-2002  gehenna catch up with -current.
 1.28.8.1 30-May-2002  gehenna Catch up with -current.
 1.44.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.44.2.5 29-Nov-2004  skrll Sync with HEAD.
 1.44.2.4 02-Nov-2004  skrll Sync with HEAD.
 1.44.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.44.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.44.2.1 03-Aug-2004  skrll Sync with HEAD
 1.52.2.7 17-Mar-2008  yamt sync with head.
 1.52.2.6 21-Jan-2008  yamt sync with head
 1.52.2.5 07-Dec-2007  yamt sync with head
 1.52.2.4 15-Nov-2007  yamt sync with head.
 1.52.2.3 03-Sep-2007  yamt sync with head.
 1.52.2.2 30-Dec-2006  yamt sync with head.
 1.52.2.1 21-Jun-2006  yamt sync with head.
 1.53.2.1 01-Feb-2006  yamt sync with head.
 1.54.4.3 22-Apr-2006  simonb Update for timecounters - use getnanotime() and time_second variable.
 1.54.4.2 22-Apr-2006  simonb Sync with head.
 1.54.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.54.2.1 09-Sep-2006  rpaulo sync with head
 1.55.2.3 26-Jun-2006  yamt sync with head.
 1.55.2.2 01-Apr-2006  yamt sync with head.
 1.55.2.1 13-Mar-2006  yamt sync with head.
 1.57.4.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.57.2.1 19-Apr-2006  elad sync with head.
 1.58.2.1 19-Jun-2006  chap Sync with head.
 1.59.8.2 10-Dec-2006  yamt sync with head.
 1.59.8.1 22-Oct-2006  yamt sync with head
 1.59.6.2 12-Jan-2007  ad Sync with head.
 1.59.6.1 18-Nov-2006  ad Sync with head.
 1.62.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.63.4.1 11-Jul-2007  mjf Sync with head.
 1.63.2.3 20-Aug-2007  ad Sync with HEAD.
 1.63.2.2 15-Jul-2007  ad Sync with head.
 1.63.2.1 08-Jun-2007  ad Sync with head.
 1.65.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.66.4.4 09-Dec-2007  jmcneill Sync with HEAD.
 1.66.4.3 11-Nov-2007  joerg Sync with HEAD.
 1.66.4.2 04-Nov-2007  jmcneill Sync with HEAD.
 1.66.4.1 09-Aug-2007  jmcneill Sync with HEAD.
 1.67.8.2 07-Aug-2007  dyoung Remove dead code.
 1.67.8.1 07-Aug-2007  dyoung file nd6_rtr.c was added on branch matt-mips64 on 2007-08-07 02:17:22 +0000
 1.67.6.1 13-Nov-2007  bouyer Sync with HEAD
 1.67.2.3 23-Mar-2008  matt sync with HEAD
 1.67.2.2 09-Jan-2008  matt sync with HEAD
 1.67.2.1 06-Nov-2007  matt sync with HEAD
 1.68.2.3 27-Dec-2007  mjf Sync with HEAD.
 1.68.2.2 08-Dec-2007  mjf Sync with HEAD.
 1.68.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.69.2.2 26-Dec-2007  ad Sync with head.
 1.69.2.1 08-Dec-2007  ad Sync with head.
 1.71.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.72.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.72.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.72.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.72.2.2 24-Mar-2008  keiichi sync with head.
 1.72.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.75.10.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.75.4.3 11-Mar-2010  yamt sync with head
 1.75.4.2 19-Aug-2009  yamt sync with head.
 1.75.4.1 04-May-2009  yamt sync with head.
 1.76.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.76.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.77.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.80.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.80.4.1 31-May-2011  rmind sync with head
 1.81.4.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.81.4.2 30-Oct-2012  yamt sync with head
 1.81.4.1 17-Apr-2012  yamt sync with head
 1.82.10.3 08-Aug-2013  snj Pull up following revision(s) (requested by msaitoh in ticket #926):
sys/netinet6/nd6_rtr.c: revision 1.86
PR/47576: Takahiro HAYASHI: Avoid crash destroying tap0 after deleting
its link-local address.
 1.82.10.2 12-Jul-2013  jdc Pull up revision 1.84 via patch to fix gcc 4.1 compilation error
(uninitialised variable):

Some fun in trying to work out what was broken with gcc-4.1 to
trigger the following warning when gcc-4.5 was silent:
nd6_rtr.c: In function 'nd6_ra_input':
nd6_rtr.c:788: warning: 'ext' may be used uninitialized in this function
Eventually determined that it was not unreasonable for gcc-4.1 to
bleat in this case as there is a nasty 'goto insert' which could
indeed have resulted in an uninitialised variable use. Yay gcc 4.1.
 1.82.10.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.82.8.3 08-Aug-2013  snj Pull up following revision(s) (requested by msaitoh in ticket #926):
sys/netinet6/nd6_rtr.c: revision 1.86
PR/47576: Takahiro HAYASHI: Avoid crash destroying tap0 after deleting
its link-local address.
 1.82.8.2 12-Jul-2013  jdc Pull up revision 1.84 via patch to fix gcc 4.1 compilation error
(uninitialised variable):

Some fun in trying to work out what was broken with gcc-4.1 to
trigger the following warning when gcc-4.5 was silent:
nd6_rtr.c: In function 'nd6_ra_input':
nd6_rtr.c:788: warning: 'ext' may be used uninitialized in this function
Eventually determined that it was not unreasonable for gcc-4.1 to
bleat in this case as there is a nasty 'goto insert' which could
indeed have resulted in an uninitialised variable use. Yay gcc 4.1.
 1.82.8.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.82.4.3 08-Aug-2013  snj Pull up following revision(s) (requested by msaitoh in ticket #926):
sys/netinet6/nd6_rtr.c: revision 1.86
PR/47576: Takahiro HAYASHI: Avoid crash destroying tap0 after deleting
its link-local address.
 1.82.4.2 12-Jul-2013  jdc Pull up revision 1.84 via patch to fix gcc 4.1 compilation error
(uninitialised variable):

Some fun in trying to work out what was broken with gcc-4.1 to
trigger the following warning when gcc-4.5 was silent:
nd6_rtr.c: In function 'nd6_ra_input':
nd6_rtr.c:788: warning: 'ext' may be used uninitialized in this function
Eventually determined that it was not unreasonable for gcc-4.1 to
bleat in this case as there is a nasty 'goto insert' which could
indeed have resulted in an uninitialised variable use. Yay gcc 4.1.
 1.82.4.1 08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.84.2.4 03-Dec-2017  jdolecek update from HEAD
 1.84.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.84.2.2 23-Jun-2013  tls resync from head
 1.84.2.1 25-Feb-2013  tls resync with head
 1.89.2.2 18-May-2014  rmind sync with head
 1.89.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.90.2.1 10-Aug-2014  tls Rebase.
 1.93.2.4 15-Apr-2020  martin Pull up following revision(s) (requested by kim in ticket #1727):

sys/netinet6/nd6_rtr.c: revision 1.148 (via patch)

Fix default route selection

The primary issue was that in revision 1.79 a check was added in the
nd6_defrouter_select() search loop to ignore the entry if RA processing
is enabled on its interface. In practice this results in all entries
being ignored.

This fix reverses the condition, so that an entry is ignored when RA
processing is NOT enabled on its interface. Further, the entry is
only ignored for being selected as the default router. The currently
installed router must be identified regardless of the (current) status
of its interface, so that we can delete the route before installing a
new one.

I also added error logging when adding or deleting a route fails. This
should help the administrator (or kernel developer) in noticing possible
problems.

Finally, if deleting a route fails, the corresponding default route
entry no longer has its "installed" flag cleared, so that deletion will
be retried. At a minimum, this will cause repeated messages about the
failed deletion as opposed to only getting repeated messages about the
installation of a new default route failing.

Fixes PR kern/55091 and also PR bin/54997 as far as the behaviour
observed with ndp(8).
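
The corrected search loop described above might look roughly like the following sketch. It is not the code from revision 1.148: the structure, its fields and select_defrouter_sketch() are invented for illustration, and reachability and preference comparisons are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a default router list entry. */
struct defrouter_sketch {
	bool installed;		/* route currently installed for this entry */
	bool if_accepts_rtadv;	/* RA processing enabled on its interface */
};

struct defrouter_choice {
	int installed_idx;	/* entry whose route must be deleted, or -1 */
	int selected_idx;	/* entry to install next, or -1 */
};

static struct defrouter_choice
select_defrouter_sketch(const struct defrouter_sketch *list, size_t n)
{
	struct defrouter_choice c = { -1, -1 };

	for (size_t i = 0; i < n; i++) {
		/*
		 * Identify the installed router regardless of the state of
		 * its interface, so that its route can still be deleted.
		 */
		if (list[i].installed && c.installed_idx < 0)
			c.installed_idx = (int)i;

		/*
		 * Corrected condition: skip an entry as a candidate only
		 * when RA processing is NOT enabled on its interface.
		 */
		if (!list[i].if_accepts_rtadv)
			continue;

		if (c.selected_idx < 0)
			c.selected_idx = (int)i;
	}
	return c;
}
```

With the inverted test that the commit message attributes to revision 1.79, every entry on an RA-enabled interface would hit the continue, so selected_idx would stay at -1 on a normally configured host.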
 1.93.2.3 02-May-2015  martin branches: 1.93.2.3.2; 1.93.2.3.6;
Pull up following revision(s) (requested by roy in ticket #731):
sys/netinet6/nd6_rtr.c: revision 1.99
Mitigate Local Denial of Service with IPv6 Router Advertisements and
log attack attempts.
Fixes CVE-2015-2923, taken from FreeBSD.
 1.93.2.2 06-Apr-2015  snj Pull up following revision(s) (requested by martin in ticket #655):
sys/netinet6/in6.c: revision 1.182 via patch
sys/netinet6/in6_ifattach.c: revision 1.95 via patch
sys/netinet6/nd6.c: revision 1.158 via patch
sys/netinet6/nd6.h: revision 1.62 via patch
sys/netinet6/nd6_nbr.c: revision 1.104 via patch
sys/netinet6/nd6_rtr.c: revision 1.96 via patch
Rearrange interface detachment slightly: before we free the INET6 specific
per-interface data, make sure to call nd6_purge() with it to remove
routing entries pointing to the going interface.
When we should happen to call this function again later, with the data
already gone, just return.
Fixes PR kern/49682, ok: christos.
 1.93.2.1 17-Dec-2014  martin Pull up following revision(s) (requested by roy in ticket #332):
sys/netinet6/nd6_nbr.c: revision 1.103
sys/netinet6/nd6_rtr.c: revision 1.95
sys/netinet6/nd6.h: revision 1.61
sys/netinet6/nd6.c: revision 1.156
Report route additions/changes/deletions for cached neighbours to userland.
 1.93.2.3.6.1 15-Apr-2020  martin Pull up following revision(s) (requested by kim in ticket #1727):

sys/netinet6/nd6_rtr.c: revision 1.148 (via patch)

Fix default route selection

The primary issue was that in revision 1.79 a check was added in the
nd6_defrouter_select() search loop to ignore the entry if RA processing
is enabled on its interface. In practice this results in all entries
being ignored.

This fix reverses the condition, so that an entry is ignored when RA
processing is NOT enabled on its interface. Further, the entry is
only ignored for being selected as the default router. The currently
installed router must be identified regardless of the (current) status
of its interface, so that we can delete the route before installing a
new one.

I also added error logging when adding or deleting a route fails. This
should help the administrator (or kernel developer) in noticing possible
problems.

Finally, if deleting a route fails, the corresponding default route
entry no longer has its "installed" flag cleared, so that deletion will
be retried. At a minimum, this will cause repeated messages about the
failed deletion as opposed to only getting repeated messages about the
installation of a new default route failing.

Fixes PR kern/55091 and also PR bin/54997 as far as the behaviour
observed with ndp(8).
 1.93.2.3.2.1 15-Apr-2020  martin Pull up following revision(s) (requested by kim in ticket #1727):

sys/netinet6/nd6_rtr.c: revision 1.148 (via patch)

Fix default route selection

The primary issue was that in revision 1.79 a check was added in the
nd6_defrouter_select() search loop to ignore the entry if RA processing
is enabled on its interface. In practice this results in all entries
being ignored.

This fix reverses the condition, so that an entry is ignored when RA
processing is NOT enabled on its interface. Further, the entry is
only ignored for being selected as the default router. The currently
installed router must be identified regardless of the (current) status
of its interface, so that we can delete the route before installing a
new one.

I also added error logging when adding or deleting a route fails. This
should help the administrator (or kernel developer) in noticing possible
problems.

Finally, if deleting a route fails, the corresponding default route
entry no longer has its "installed" flag cleared, so that deletion will
be retried. At a minimum, this will cause repeated messages about the
failed deletion as opposed to only getting repeated messages about the
installation of a new default route failing.

Fixes PR kern/55091 and also PR bin/54997 as far as the behaviour
observed with ndp(8).
 1.94.2.11 28-Aug-2017  skrll Sync with HEAD
 1.94.2.10 05-Feb-2017  skrll Sync with HEAD
 1.94.2.9 05-Dec-2016  skrll Sync with HEAD
 1.94.2.8 05-Oct-2016  skrll Sync with HEAD
 1.94.2.7 09-Jul-2016  skrll Sync with HEAD
 1.94.2.6 29-May-2016  skrll Sync with HEAD
 1.94.2.5 22-Apr-2016  skrll Sync with HEAD
 1.94.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.94.2.3 22-Sep-2015  skrll Sync with HEAD
 1.94.2.2 06-Jun-2015  skrll Sync with HEAD
 1.94.2.1 06-Apr-2015  skrll Sync with HEAD
 1.115.2.4 20-Mar-2017  pgoyette Sync with HEAD
 1.115.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.115.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.115.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.129.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.135.6.5 15-Apr-2020  martin Pull up following revision(s) (requested by kim in ticket #1531):

sys/netinet6/nd6_rtr.c: revision 1.148

Fix default route selection

The primary issue was that in revision 1.79 a check was added in the
nd6_defrouter_select() search loop to ignore the entry if RA processing
is enabled on its interface. In practice this results in all entries
being ignored.

This fix reverses the condition, so that an entry is ignored when RA
processing is NOT enabled on its interface. Further, the entry is
only ignored for being selected as the default router. The currently
installed router must be identified regardless of the (current) status
of its interface, so that we can delete the route before installing a
new one.

I also added error logging when adding or deleting a route fails. This
should help the administrator (or kernel developer) in noticing possible
problems.

Finally, if deleting a route fails, the corresponding default route
entry no longer has its "installed" flag cleared, so that deletion will
be retried. At a minimum, this will cause repeated messages about the
failed deletion as opposed to only getting repeated messages about the
installation of a new default route failing.

Fixes PR kern/55091 and also PR bin/54997 as far as the behaviour
observed with ndp(8).
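
The corrected loop described above can be sketched roughly as follows. This is a simplified userland illustration, not the code in nd6_rtr.c: the toy_* types and field names are hypothetical, and router preference and reachability are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's real structures. */
struct toy_defrouter {
	bool accepts_rtadv;	/* RA processing enabled on the entry's interface */
	bool installed;		/* entry is the currently installed default route */
};

struct toy_selection {
	struct toy_defrouter *installed;	/* route to delete, if any */
	struct toy_defrouter *selected;		/* candidate for the new default route */
};

static struct toy_selection
toy_defrouter_select(struct toy_defrouter *list, size_t n)
{
	struct toy_selection s = { NULL, NULL };

	for (size_t i = 0; i < n; i++) {
		struct toy_defrouter *dr = &list[i];

		/* Always identify the installed entry, whatever its interface state. */
		if (dr->installed && s.installed == NULL)
			s.installed = dr;

		/* Skip the entry as a candidate only when RA processing is NOT enabled. */
		if (!dr->accepts_rtadv)
			continue;
		if (s.selected == NULL)
			s.selected = dr;
	}
	return s;
}
```

With the pre-fix condition (skipping entries whose interface does accept RAs), the loop above would never pick a candidate, which matches the "all entries being ignored" symptom.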
 1.135.6.4 15-Aug-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #975):

sys/netinet6/nd6_rtr.c: revision 1.144

Don't call find_pfxlist_reachable_router, which may sleep, in a
pserialize read section

Found by knakahara@
 1.135.6.3 22-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #830):

sys/netinet6/nd6_rtr.c: revision 1.142

Add missing m_put_rcvif_psref.
 1.135.6.2 05-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #528):
sys/net/agr/if_agr.c: revision 1.42
sys/netinet6/nd6_rtr.c: revision 1.137
sys/netinet6/nd6_rtr.c: revision 1.138
sys/net/agr/if_agr.c: revision 1.46
sys/net/route.c: revision 1.206
sys/net/if.c: revision 1.419
sys/net/agr/if_agrether.c: revision 1.10
sys/netinet6/nd6.c: revision 1.241
sys/netinet6/nd6.c: revision 1.242
sys/netinet6/nd6.c: revision 1.243
sys/netinet6/nd6.c: revision 1.244
sys/netinet6/nd6.c: revision 1.245
sys/netipsec/ipsec_input.c: revision 1.52
sys/netipsec/ipsec_input.c: revision 1.53
sys/net/agr/if_agrsubr.h: revision 1.5
sys/kern/subr_workqueue.c: revision 1.35
sys/netipsec/ipsec.c: revision 1.124
sys/net/agr/if_agrsubr.c: revision 1.11
sys/net/agr/if_agrsubr.c: revision 1.12
Simplify; share agr_vlan_add and agr_vlan_del (NFCI)
Fix late NULL-checking (CID 1427782: Null pointer dereferences (REVERSE_INULL))
KNF: replace soft tabs with hard tabs
Add missing NULL-checking for m_pullup (CID 1427770: Null pointer dereferences (NULL_RETURNS))
Add locking.
Revert "Get rid of unnecessary splsoftnet" (v1.133)
It's not always true that softnet_lock is held these places.
See PR kern/52947.
Get rid of unnecessary splsoftnet (redo)
Unless NET_MPSAFE, splsoftnet is still needed for rt_* functions.
Use existing fill_[pd]rlist() functions to calculate size of buffer to
allocate, rather than relying on an arbitrary length passed in from
userland.
Allow copyout() of partial results if the user buffer is too small, to
be consistent with the way sysctl(3) is documented.
Garbage-collect now-unused third parameter in the fill_[pd]rlist()
functions.
As discussed on IRC.
OK kamil@ and christos@
XXX Needs pull-up to netbsd-8 branch.
Simplify, from christos@
More simplification, this time from ozaki-r@
No need to break after return.
One more from christos@
No need to initialize fill_func
more cleanup (don't allow oldlenp == NULL)
Destroy ifq_lock at the end of if_detach
It still can be used in if_detach.
Prevent rt_free_global.wk from being enqueued to workqueue doubly
Check if a queued work is tried to be enqueued again, which is not allowed
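
The partial-copyout behaviour mentioned above for the fill_[pd]rlist() users can be illustrated with a small userland sketch. It assumes the sysctl(3) convention that a too-small buffer receives as much data as fits and the call fails with ENOMEM; toy_sysctl_copyout is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy up to *oldlenp bytes of src into oldp,
 * report how many bytes were copied, and return ENOMEM when truncated.
 */
static int
toy_sysctl_copyout(const void *src, size_t needed, void *oldp, size_t *oldlenp)
{
	assert(oldlenp != NULL);	/* mirrors "don't allow oldlenp == NULL" */

	if (oldp == NULL) {		/* size query only */
		*oldlenp = needed;
		return 0;
	}

	size_t n = *oldlenp < needed ? *oldlenp : needed;
	memcpy(oldp, src, n);
	*oldlenp = n;
	return n < needed ? ENOMEM : 0;
}
```

The required size comes from the data itself (the `needed` argument) rather than from a length supplied by the caller, which is the point of the change.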
 1.135.6.1 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces copy-and-paste code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general; however,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
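
The idea of hiding the NET_MPSAFE switch behind locking macros can be sketched in userland as below. All TOY_* names are hypothetical and the "lock" is only a counter; the real macros and their names live in the kernel headers.

```c
#include <assert.h>

/* Toy stand-in for the big kernel lock: a simple hold counter. */
static int toy_kernel_lock_depth;
#define TOY_KERNEL_LOCK()	(toy_kernel_lock_depth++)
#define TOY_KERNEL_UNLOCK()	(toy_kernel_lock_depth--)

/* Hypothetical wrappers: compile to nothing when TOY_NET_MPSAFE is defined. */
#ifdef TOY_NET_MPSAFE
#define TOY_LOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#define TOY_UNLOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#else
#define TOY_LOCK_UNLESS_NET_MPSAFE()	do { TOY_KERNEL_LOCK(); } while (0)
#define TOY_UNLOCK_UNLESS_NET_MPSAFE()	do { TOY_KERNEL_UNLOCK(); } while (0)
#endif

/* A caller no longer needs its own #ifndef block around the lock calls. */
static int
toy_locked_depth(void)
{
	int depth;

	TOY_LOCK_UNLESS_NET_MPSAFE();
	depth = toy_kernel_lock_depth;	/* work done under the (optional) lock */
	TOY_UNLOCK_UNLESS_NET_MPSAFE();
	return depth;
}
```

Searching for the wrapper names then finds every place that still depends on the big lock, which is the maintenance benefit the commit message mentions.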
 1.138.2.3 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.138.2.2 21-May-2018  pgoyette Sync with HEAD
 1.138.2.1 02-May-2018  pgoyette Synch with HEAD
 1.143.2.4 21-Apr-2020  martin Sync with HEAD
 1.143.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.143.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.143.2.1 10-Jun-2019  christos Sync with HEAD
 1.145.2.1 14-Apr-2020  martin Pull up following revision(s) (requested by kim in ticket #834):

sys/netinet6/nd6_rtr.c: revision 1.148

Fix default route selection

The primary issue was that in revision 1.79 a check was added in the
nd6_defrouter_select() search loop to ignore the entry if RA processing
is enabled on its interface. In practice this results in all entries
being ignored.

This fix reverses the condition, so that an entry is ignored when RA
processing is NOT enabled on its interface. Further, the entry is
only ignored for being selected as the default router. The currently
installed router must be identified regardless of the (current) status
of its interface, so that we can delete the route before installing a
new one.

I also added error logging when adding or deleting a route fails. This
should help the administrator (or kernel developer) in noticing possible
problems.

Finally, if deleting a route fails, the corresponding default route
entry no longer has its "installed" flag cleared, so that deletion will
be retried. At a minimum, this will cause repeated messages about the
failed deletion as opposed to only getting repeated messages about the
installation of a new default route failing.

Fixes PR kern/55091 and also PR bin/54997 as far as the behaviour
observed with ndp(8).
 1.147.6.1 20-Apr-2020  bouyer Sync with HEAD
 1.5 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.4 10-Feb-2001  itojun branches: 1.4.24; 1.4.40;
to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.3 03-Jul-1999  thorpej branches: 1.3.2;
RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file pim6.h was initially added on branch kame.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file pim6.h was added on branch chs-ubc2 on 1999-07-01 23:48:30 +0000
 1.3.2.1 11-Feb-2001  bouyer Sync with HEAD.
 1.4.40.1 21-Jun-2006  yamt sync with head.
 1.4.24.1 11-Dec-2005  christos Sync with head.
 1.15 22-Aug-2018  msaitoh - Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.14 15-Apr-2008  thorpej branches: 1.14.90; 1.14.92;
Make pim6 stats per-cpu.
 1.13 10-Dec-2005  elad branches: 1.13.70;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.12 28-Aug-2005  rpaulo Implement net.inet6.pim6.stats sysctl.

Reviewed by Elad Efrat.
 1.11 04-Sep-2004  manu branches: 1.11.12;
IPv4 PIM support, based on a submission from Pavlin Radoslavov posted on
tech-net@
 1.10 23-Sep-2002  simonb branches: 1.10.6;
Remove an extern declaration for the "pim6stat" variable; the only other
occurrence of this is a static variable in ip6_mroute.c.
 1.9 10-Feb-2001  itojun branches: 1.9.2; 1.9.4;
to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.8 07-Jun-2000  itojun s/PIMCTL/PIM6CTL/ to avoid future confusion.
 1.7 06-Jan-2000  itojun branches: 1.7.2;
remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.6 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.5 19-Nov-1999  bouyer Update protocol and interface stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
 1.4 06-Jul-1999  itojun branches: 1.4.2; 1.4.8;
sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file pim6_var.h was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file pim6_var.h was added on branch chs-ubc2 on 1999-07-01 23:48:30 +0000
 1.4.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.4.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.4.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.7.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.9.4.1 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.9.2.1 18-Oct-2002  nathanw Catch up to -current.
 1.10.6.4 11-Dec-2005  christos Sync with head.
 1.10.6.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.10.6.2 21-Sep-2004  skrll Fix the sync with head I botched.
 1.10.6.1 18-Sep-2004  skrll Sync with HEAD.
 1.11.12.1 21-Jun-2006  yamt sync with head.
 1.13.70.1 02-Jun-2008  mjf Sync with HEAD.
 1.14.92.1 10-Jun-2019  christos Sync with HEAD
 1.14.90.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.185 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
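
Why the NULL check is redundant can be seen from a toy chain-freeing routine written in the same style. This is a userland sketch with hypothetical toy_* names, not the m_freem(9) source.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_mbuf {
	struct toy_mbuf *m_next;
};

static int toy_freed;	/* counts freed buffers, for illustration only */

/* Free a whole chain; a NULL chain simply skips the loop. */
static void
toy_freem(struct toy_mbuf *m)
{
	while (m != NULL) {
		struct toy_mbuf *next = m->m_next;

		free(m);
		toy_freed++;
		m = next;
	}
}
```

A caller can therefore write `toy_freem(m);` directly instead of `if (m != NULL) toy_freem(m);`.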
 1.184 24-Feb-2024  mlelstv branches: 1.184.2;
Deliver timestamps also to raw sockets.
Fixes PR 57955
 1.183 22-Mar-2023  ozaki-r in6: make sure a user-specified checksum field is within a packet

From OpenBSD
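
A bounds check of the kind this commit describes might look like the sketch below. It assumes a 16-bit checksum field and a known payload length; toy_cksum_offset_ok is a hypothetical helper, and the special "checksumming disabled" offset value is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: a 16-bit checksum stored at byte offset 'off'
 * must lie entirely inside a packet of 'plen' bytes.
 */
static int
toy_cksum_offset_ok(long off, size_t plen)
{
	if (off < 0)
		return 0;
	return (size_t)off + sizeof(uint16_t) <= plen;
}
```

Rejecting the packet when this returns 0 keeps the checksum write from landing past the end of the data.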
 1.182 04-Nov-2022  ozaki-r branches: 1.182.2;
inpcb: rename functions to in6pcb_*
 1.181 04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.180 28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
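
The embedding scheme can be sketched as follows. The toy_* structures and the accessor macro are hypothetical and far smaller than the real ones; they only show how a shared header plus per-family structs keeps the IPv4 PCB small while callers keep passing a pointer to the shared part.

```c
#include <assert.h>
#include <stdint.h>

/* Shared part, embedded as the first member of each family-specific PCB. */
struct toy_inpcb {
	int inp_af;
};

struct toy_in4pcb {
	struct toy_inpcb in4p_pcb;
	uint32_t in4p_laddr_;		/* IPv4-only data */
};

struct toy_in6pcb {
	struct toy_inpcb in6p_pcb;
	uint8_t in6p_laddr_[16];	/* IPv6-only data */
};

/*
 * Accessor in the spirit of in4p_laddr(inp): valid only when 'inp'
 * really points at the header embedded in a toy_in4pcb.
 */
#define toy_in4p_laddr(inp)	(((struct toy_in4pcb *)(inp))->in4p_laddr_)
```

Because the shared header is the first member, a pointer to it can be converted back to the enclosing structure, so IPv4 sockets no longer pay for IPv6-only fields.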
 1.179 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.178 28-May-2022  andvar fix various typos, mainly in comments.
 1.177 23-Feb-2022  andvar fix various typos in comments, mainly immediatly/immediately/,
as well shared and recently fixed typos in OpenBSD code by Jonathan Grey.
 1.176 21-Sep-2021  christos don't opencode kauth_cred_get()
 1.175 25-Feb-2019  maxv branches: 1.175.4;
RIP6, CAN, SCTP and SCTP6 lack a length check in their _send() functions.
Fix RIP6 and CAN, add a big XXX in the SCTP ones.

Found by KASAN, triggered by SyzKaller.

Reported-by: syzbot+0b9692ae0f49f93b7dc7@syzkaller.appspotmail.com
 1.174 24-Feb-2019  maxv RIP, RIP6, DDP, SCTP and SCTP6 lack a length check in their _connect()
functions. Fix the first three, and add a big XXX in the SCTP ones.

Found by KASAN, triggered by SyzKaller.

Reported-by: syzbot+9eaf98dad6ca738c250d@syzkaller.appspotmail.com
 1.173 28-Jan-2019  martin Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.172 11-May-2018  maxv branches: 1.172.2;
Dedup: introduce rip6_sbappendaddr. Same as IPv4.
 1.171 29-Apr-2018  maxv Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is obvious that 'm' has M_PKTHDR set.
 1.170 28-Apr-2018  maxv Remove unused ipsec_var.h includes.
 1.169 26-Apr-2018  maxv Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.
 1.168 12-Apr-2018  maxv Synchronize the code between raw_ip6.c<->icmp6.c<->raw_ip.c, so that it is
the same everywhere.
 1.167 12-Apr-2018  maxv Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.
 1.166 21-Mar-2018  roy Sprinkle more soroverflow().
 1.165 28-Feb-2018  maxv branches: 1.165.2;
Remove unused ipsec_private.h includes.
 1.164 26-Feb-2018  maxv Remove redundant condition (harmless). PR/53030.
 1.163 26-Feb-2018  maxv Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@
 1.162 08-Feb-2018  maxv Remove the IN6_IS_ADDR_V4MAPPED checks in the protocol functions. They
are useless, because the IPv6 entry point (ip6_input) already performs
them.

The checks were first added in the protocol functions:

Wed Dec 22 04:03:02 1999 UTC (18 years, 1 month ago) by itojun

"drop IPv6 packets with v4 mapped address on src/dst. they are illegal
and may be used to fool IPv6 implementations (by using ::ffff:127.0.0.1 as
source you may be able to pretend the packet is from local node)"

Shortly afterwards they were also added in the IPv6 entry point, but
were not removed from the protocol functions:

Mon Jan 31 10:33:22 2000 UTC (18 years ago) by itojun

"be proactive about malicious packet on the wire. we fear that v4 mapped
address to be used as a tool to hose security filters (like bypassing
"local host only" filter by using ::ffff:127.0.0.1)."

OpenBSD did the same a few months ago. FreeBSD has never had these checks.
 1.161 01-Feb-2018  maxv Fix use-after-free, the first m_copyback_cow may have freed the mbuf, so
it is wrong to read ip6->ip6_nxt.
 1.160 30-Jan-2018  maxv Fix a buffer overflow in ip6_get_prevhdr. Doing

mtod(m, char *) + len

is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.

The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.

But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.

However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.

As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.

Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.

Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.

This place is still fragile.
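
The difference between "first buffer plus len" and "offset within the chain" can be shown with a toy chain walker. This is a userland sketch with hypothetical toy_* names; the actual fix uses IP6_EXTHDR_GET as stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_mbuf {
	struct toy_mbuf *m_next;
	uint8_t *m_data;
	size_t m_len;
};

/* Return a pointer to the byte at chain offset 'off', or NULL if out of range. */
static uint8_t *
toy_chain_byte(struct toy_mbuf *m, size_t off)
{
	for (; m != NULL; m = m->m_next) {
		if (off < m->m_len)
			return m->m_data + off;
		off -= m->m_len;	/* skip this buffer and keep walking */
	}
	return NULL;
}
```

Computing `first->m_data + off` directly is only correct while `off < first->m_len`; beyond that it points outside the first buffer, which is the overflow described in this commit.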
 1.159 23-Jan-2018  maxv Fix twice the same mistake: 'last' can't be null, so there's no point in
having this misleading branch.
 1.158 05-Nov-2017  ozaki-r Fix usages of ipsec_used

If IPsec isn't used, we must go back to the normal path.

PR kern/52659
 1.157 01-Jun-2017  chs branches: 1.157.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.156 03-Mar-2017  ozaki-r Pass inpcb/in6pcb instead of socket to ip_output/ip6_output

- Passing a socket to Layer 3 is layer violation and even unnecessary
- The change makes the code of callers and IPsec a bit simpler
 1.155 24-Jan-2017  ozaki-r Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work
 1.154 13-Dec-2016  ozaki-r branches: 1.154.2;
Remove unnecessary inclusions of nd6.h
 1.153 18-Nov-2016  knakahara fix: "ifconfig destroy" can stall when "ifconfig" is run in parallel.
This problem occurs only if NET_MPSAFE is on.

ifconfig destroy side:
kernel entry point is ifioctl => if_clone_destroy.
pr_purgeif() acquires softnet_lock, and then ifa_remove() calls
pserialize_perform() holding softnet_lock.
ifconfig side:
kernel entry point is socreate.
pr_attach()(udp_attach_wrapper()) calls sosetlock(). In this call path,
sosetlock() tries to acquire softnet_lock.
These can cause a deadlock.
 1.152 31-Oct-2016  ozaki-r Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.
 1.151 29-Sep-2016  roy Now that we disallow sending or receiving from invalid addresses,
allow binding to tentative addresses.
 1.150 26-Aug-2016  roy Allow explicit binding to detached addresses.
Fixes PR kern/51435.
 1.149 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.148 15-Jul-2016  ozaki-r Use sin6tosa and sin6tocsa macros

No functional change.
 1.147 15-Jul-2016  ozaki-r Use ifatoia6 macro

No functional change.
 1.146 21-Jun-2016  ozaki-r branches: 1.146.2;
Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.
 1.145 16-Jun-2016  ozaki-r Use curlwp_bind and curlwp_bindx instead of open-coding LP_BOUND
 1.144 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and meant for the
transition period, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
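
The index-based lookup can be sketched as below. This toy version shows only the indirection through a table; it has none of the psref(9) or pserialize(9) protection that the real m_{get,put}_rcvif* APIs provide, and every toy_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_IF 4

struct toy_ifnet {
	int if_index;
};

struct toy_pkt {
	int rcvif_index;	/* stored instead of a struct toy_ifnet pointer */
};

/* Interface collection indexed by if_index; NULL means "gone". */
static struct toy_ifnet *toy_ifindex2ifnet[TOY_MAX_IF];

/* Resolve the receiving interface at use time; may return NULL. */
static struct toy_ifnet *
toy_get_rcvif(const struct toy_pkt *p)
{
	if (p->rcvif_index < 0 || p->rcvif_index >= TOY_MAX_IF)
		return NULL;
	return toy_ifindex2ifnet[p->rcvif_index];
}
```

A packet that outlives its interface then resolves to NULL instead of dereferencing a stale pointer, so callers must handle the NULL case.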
 1.143 12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.142 26-Apr-2016  ozaki-r Sweep unnecessary route.h inclusions
 1.141 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.140 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.139 26-Apr-2015  rtr remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@
 1.138 24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.137 03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.136 09-Aug-2014  rtr branches: 1.136.2; 1.136.4; 1.136.6; 1.136.8;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@
 1.135 08-Aug-2014  rtr split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()
 1.134 05-Aug-2014  rtr split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind
 1.133 05-Aug-2014  rtr revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
 1.132 31-Jul-2014  rtr split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind
 1.131 31-Jul-2014  ozaki-r Define IFNET_EMPTY() and replace !IFNET_FIRST() with it

No functional change.
 1.130 30-Jul-2014  rtr split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind
 1.129 24-Jul-2014  rtr split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48
 1.128 23-Jul-2014  rtr split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind
 1.127 09-Jul-2014  rtr * split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind
 1.126 09-Jul-2014  rtr * split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47
 1.125 07-Jul-2014  rtr * sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.
 1.124 07-Jul-2014  rtr backout change that made pr_stat return EOPNOTSUPP for protocols that
were not filling in struct stat.

decision made after further discussion with rmind and investigation of
how other operating systems behave. soo_stat() is doing just enough to
be able to call what gets returned valid and thus justifies a return of
success.

additional review will be done to determine if the pr_stat functions
that were already returning EOPNOTSUPP can be considered successful with
what soo_stat() is doing.
 1.123 07-Jul-2014  rtr * have pr_stat return EOPNOTSUPP consistently for all protocols that do
not fill in struct stat instead of returning success.

* in pr_stat remove all checks for non-NULL so->so_pcb except where the
pcb is actually used (i.e. cases where we don't return EOPNOTSUPP).

proposed on tech-net@
 1.122 06-Jul-2014  rtr * split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind
 1.121 01-Jul-2014  rtr fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@
 1.120 23-Jun-2014  rtr where appropriate rename xxx_ioctl() struct mbuf * parameters from
`control' to `ifp' after split from xxx_usrreq().

sys_socket.c
fix wrapping of arguments to be consistent with other function calls
in the file after replacing pr_usrreq() call with pr_ioctl() which
required one less argument.

link_proto.c
fix indentation of parameters in link_ioctl() prototype to be
consistent with the rest of the file.

discussed with rmind@
 1.119 22-Jun-2014  rtr * split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@
 1.118 30-May-2014  christos Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.117 20-May-2014  rmind Adjust PR_WRAP_USRREQS() to include the attach/detach functions.
We still need the kernel-lock for some corner cases.
 1.116 19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
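The pattern this series of revisions follows, moving one request at a time out of a generic switch and into its own function pointer, can be sketched roughly as below. This is a simplified userland illustration, not the kernel's actual struct pr_usrreqs: the names `struct sock`, `struct usrreqs`, `REQ_*` and the `demo_*` functions are all hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical request codes standing in for the PRU_* constants. */
enum { REQ_ATTACH, REQ_DETACH, REQ_SEND };

/* Hypothetical stand-in for struct socket. */
struct sock {
	int attached;
};

/* One pointer per split-out request, plus a catch-all for the rest. */
struct usrreqs {
	int (*attach)(struct sock *);
	int (*detach)(struct sock *);
	int (*generic)(struct sock *, int);
};

static int
demo_attach(struct sock *so)
{
	so->attached = 1;
	return 0;
}

static int
demo_detach(struct sock *so)
{
	so->attached = 0;
	return 0;
}

static int
demo_generic(struct sock *so, int req)
{
	/* Split-out requests must no longer reach the generic switch. */
	assert(req != REQ_ATTACH && req != REQ_DETACH);
	(void)so;

	switch (req) {
	case REQ_SEND:
		return 0;
	default:
		return EOPNOTSUPP;
	}
}

static const struct usrreqs demo_usrreqs = {
	.attach = demo_attach,
	.detach = demo_detach,
	.generic = demo_generic,
};
```

A caller then invokes `demo_usrreqs.attach(&so)` directly instead of passing a request code through the generic entry point, so each request can carry its own parameter types.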
 1.115 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.114 18-May-2014  rmind Use IFNET_FIRST() rather than open coding ifnet access.
 1.113 25-Feb-2014  pooka branches: 1.113.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.112 23-Nov-2013  christos convert from CIRCLEQ to TAILQ.
 1.111 05-Jun-2013  christos branches: 1.111.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.110 22-Mar-2012  drochner branches: 1.110.2;
remove KAME IPSEC, replaced by FAST_IPSEC
 1.109 19-Dec-2011  drochner branches: 1.109.2; 1.109.6; 1.109.8;
rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.108 03-May-2011  dyoung branches: 1.108.4; 1.108.8;
Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
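
The class selection described above can be sketched as follows. This is a minimal illustration only: the enum, table, and function names are hypothetical and not taken from the kernel sources, and the two boolean inputs stand in for whatever test the kernel actually uses to decide peer nearness. The default MSL values are the ones quoted in this commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the real kernel identifiers may differ. */
enum mslt_class { MSLT_LOOPBACK, MSLT_LOCAL, MSLT_REMOTE };

/* Default MSLs in seconds, indexed by class. */
static const int mslt_default_msl[] = { 2, 10, 60 };

/* Pick the class from the nearness of the peer. */
static enum mslt_class
mslt_classify(bool same_host, bool same_link)
{
	if (same_host)
		return MSLT_LOOPBACK;
	if (same_link)
		return MSLT_LOCAL;
	return MSLT_REMOTE;
}

/* A session uses the MSL of its class. */
static int
mslt_msl(bool same_host, bool same_link)
{
	return mslt_default_msl[mslt_classify(same_host, same_link)];
}
```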

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable comes from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
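
The index-instead-of-pointer idea and the oldest-first eviction can be sketched as below. This is a toy model under stated assumptions, not the VTW implementation: all names are hypothetical, a single integer `key` stands in for the connection identity, and a ring cursor is used as the simplest way to reuse slots in age order.

```c
#include <assert.h>
#include <stdint.h>

#define NENT	4		/* pool size, tiny for illustration */
#define NBUCKET	2		/* hash buckets */
#define NIL	UINT16_MAX	/* "null" link */

struct vpcb {
	uint32_t key;		/* stand-in for the connection identity */
	uint16_t next;		/* 16-bit pool index instead of a pointer */
	uint8_t used;
};

struct vtw {
	struct vpcb pool[NENT];		/* fixed-size pool */
	uint16_t bucket[NBUCKET];	/* chain heads, as pool indices */
	uint16_t oldest;		/* next slot to reuse (ring order) */
};

static void
vtw_init(struct vtw *v)
{
	for (int i = 0; i < NENT; i++)
		v->pool[i].used = 0;
	for (int i = 0; i < NBUCKET; i++)
		v->bucket[i] = NIL;
	v->oldest = 0;
}

/* Remove entry idx from its hash chain and mark the slot free. */
static void
vtw_unlink(struct vtw *v, uint16_t idx)
{
	uint16_t *pp = &v->bucket[v->pool[idx].key % NBUCKET];

	while (*pp != NIL && *pp != idx)
		pp = &v->pool[*pp].next;
	if (*pp == idx)
		*pp = v->pool[idx].next;
	v->pool[idx].used = 0;
}

/* Insert a key, discarding the oldest entry when the pool is full. */
static void
vtw_insert(struct vtw *v, uint32_t key)
{
	uint16_t idx = v->oldest;

	if (v->pool[idx].used)
		vtw_unlink(v, idx);
	v->pool[idx].key = key;
	v->pool[idx].used = 1;
	v->pool[idx].next = v->bucket[key % NBUCKET];
	v->bucket[key % NBUCKET] = idx;
	v->oldest = (uint16_t)((idx + 1) % NENT);
}

static int
vtw_lookup(const struct vtw *v, uint32_t key)
{
	for (uint16_t i = v->bucket[key % NBUCKET]; i != NIL;
	    i = v->pool[i].next)
		if (v->pool[i].key == key)
			return 1;
	return 0;
}
```

With a pool of four entries, inserting a fifth key evicts the first one inserted while the other three remain reachable through their hash chains.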

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.107 08-Jul-2010  dyoung branches: 1.107.2;
Remove unnecessary casts from struct route * to struct route *.
 1.106 08-Jul-2010  dyoung Sprinkle const to prevent rip6_output() from re-assigning all but one of
its arguments.
 1.105 16-Sep-2009  pooka branches: 1.105.2; 1.105.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.104 06-May-2009  elad Remove some usage of "priv" and "privileged" variables and instead pass
around credentials. Also push down kauth(9) calls closer to where the
operation is done.

Mailing list reference:

http://mail-index.netbsd.org/tech-net/2009/04/30/msg001270.html
 1.103 15-Mar-2009  cegger ansify function definitions
 1.102 03-Jan-2009  yamt branches: 1.102.2;
remove extra semicolons.
 1.101 17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.100 06-Aug-2008  plunky branches: 1.100.2;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.99 04-May-2008  thorpej branches: 1.99.2; 1.99.6;
Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577
 1.98 24-Apr-2008  ad branches: 1.98.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.97 23-Apr-2008  thorpej Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.96 15-Apr-2008  thorpej branches: 1.96.2;
Explicitly include <sys/percpu.h>.
 1.95 15-Apr-2008  thorpej Make raw6 stats per-cpu.
 1.94 15-Apr-2008  thorpej Make ip6 and icmp6 stats per-cpu.
 1.93 08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
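Why a struct of 64-bit counters and an array of uint64_t can be ABI-compatible, and how the array form lends itself to the per-CPU counters of the later revisions, can be sketched as follows. The counter names, `DEMO_NCPU`, and the helper functions are hypothetical; the kernel uses its own statistics indices and per-CPU storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter indices. */
enum { STAT_TOTAL, STAT_TOOSHORT, STAT_DELIVERED, NSTATS };

/* Hypothetical "old" layout: one 64-bit field per counter. */
struct old_stats {
	uint64_t total;
	uint64_t tooshort;
	uint64_t delivered;
};

/* Same size and field offsets, so old readers still see the same bytes. */
_Static_assert(sizeof(struct old_stats) == NSTATS * sizeof(uint64_t),
    "array and struct must have the same size");
_Static_assert(offsetof(struct old_stats, delivered) ==
    STAT_DELIVERED * sizeof(uint64_t), "field offset must match index");

#define DEMO_NCPU 2	/* illustration only */

static uint64_t percpu_stats[DEMO_NCPU][NSTATS];

/* Each CPU bumps only its own row. */
static void
stat_inc(int cpu, int idx)
{
	percpu_stats[cpu][idx]++;
}

/* Sum the per-CPU rows into one array for export. */
static void
stats_collate(uint64_t out[NSTATS])
{
	for (int i = 0; i < NSTATS; i++) {
		out[i] = 0;
		for (int c = 0; c < DEMO_NCPU; c++)
			out[i] += percpu_stats[c][i];
	}
}
```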
 1.92 08-Apr-2008  thorpej Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
 1.91 27-Nov-2007  christos branches: 1.91.10; 1.91.14;
require that the options argument is the right size, not that it is greater
or equal to the requested size. Suggested by Matt Thomas.
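The stricter size check can be illustrated with a small userland sketch. The helper `get_int_opt()` is hypothetical and only models the idea of requiring an exact length match rather than "at least as large".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy an int-sized option value out of a
 * caller-supplied buffer, insisting on an exact size match.
 */
static int
get_int_opt(const void *optval, size_t optlen, int *out)
{
	if (optlen != sizeof(*out))	/* exact match, not >= */
		return EINVAL;
	memcpy(out, optval, sizeof(*out));
	return 0;
}
```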
 1.90 06-Nov-2007  dyoung Delete dead code that I accidentally introduced before. Thanks
Arnaud Lacombe for pointing out to me Coverity CID 4562.
 1.89 06-Nov-2007  dyoung Use sockaddr_in6_init().
 1.88 01-Nov-2007  dyoung branches: 1.88.2;
De-__P().
 1.87 19-Sep-2007  dyoung branches: 1.87.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.
 1.86 19-Jul-2007  dyoung branches: 1.86.4; 1.86.6; 1.86.8;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.85 23-May-2007  christos branches: 1.85.2;
Ansify + add a few comments, from Karl Sjödahl
 1.84 04-Mar-2007  christos branches: 1.84.2; 1.84.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.83 22-Feb-2007  dyoung Cosmetic: remove extraneous () on return statements, break a line
in two, join lines, compare pointers with NULL instead of testing
their "truth."
 1.82 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.81 10-Feb-2007  degroote branches: 1.81.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic
 1.80 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.79 02-Dec-2006  dyoung Use the queue(3) macros instead of open-coding them. Shorten
staircases. Remove unnecessary casts. Where appropriate, s/8/NBBY/.
De-__P(). KNF.

No functional changes intended.
 1.78 23-Jul-2006  ad branches: 1.78.4; 1.78.6; 1.78.8; 1.78.10;
Use the LWP cached credentials where sane.
 1.77 14-May-2006  elad integrate kauth.
 1.76 05-May-2006  rpaulo Add support for RFC 3542 Adv. Socket API for IPv6 (which obsoletes 2292).
* RFC 3542 isn't binary compatible with RFC 2292.
* RFC 2292 support is on by default but can be disabled.
* update ping6, telnet and traceroute6 to the new API.

From the KAME project (www.kame.net).
Reviewed by core.
 1.75 21-Jan-2006  rpaulo branches: 1.75.2; 1.75.4; 1.75.6; 1.75.8; 1.75.10;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
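The loopback case from the scope boundary bullet above (src=::1, dst=some global address) can be expressed as a small userland predicate. This is only a sketch of the rule, not the kernel's scope-checking code: `scope_violation()` is a hypothetical helper, and it covers only the loopback case rather than full scope zone handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: true if a packet with this source and
 * destination must not leave the node, because the source is the
 * loopback address while the destination is not.
 */
static bool
scope_violation(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return true;	/* treat unparsable input as a rejection */

	return IN6_IS_ADDR_LOOPBACK(&s) && !IN6_IS_ADDR_LOOPBACK(&d);
}
```

The addresses used to exercise it would typically come from the 2001:db8::/32 documentation prefix, for example `scope_violation("::1", "2001:db8::1")`.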
 1.74 11-Dec-2005  christos branches: 1.74.2;
merge ktrace-lwp.
 1.73 28-Aug-2005  rpaulo Implement net.inet6.raw6.stats sysctl.

Reviewed by Elad Efrat.
 1.72 29-May-2005  christos branches: 1.72.2;
- avoid shadowed variables
- sprinkle const.
 1.71 11-Mar-2005  atatat Revert the change that made kern.file2 and net.*.*.pcblist into nodes
instead of structs. It had other deleterious side-effects that are
rather nasty. Another solution must be found.
 1.70 10-Mar-2005  atatat Change types of kern.file2 and net.*.*.pcblist to NODE
 1.69 09-Mar-2005  atatat Add the following nodes to the sysctl tree:

net.local.stream.pcblist
net.local.dgram.pcblist
net.inet.tcp.pcblist
net.inet.udp.pcblist
net.inet.raw.pcblist
net.inet6.tcp6.pcblist
net.inet6.udp6.pcblist
net.inet6.raw6.pcblist

which allow retrieval of the pcbs in use for those protocols. The
struct involved is 32/64 bit clean and incorporates parts of struct
inpcb, struct unpcb, a bit of struct tcpcb, and two socket addresses.
 1.68 06-Sep-2004  yamt branches: 1.68.4; 1.68.6;
rip6_output: redo raw_ip6.c 1.66-1.67, using m_copyback_cow.
 1.67 23-Jul-2004  yamt rip6_output: redo the previous (raw_ip6.c 1.66)
with fewer assumptions about alignment.
 1.66 22-Jul-2004  yamt rip6_output: make sure that the mbuf is writable
before writing a checksum into it.
otherwise "ping6 -s50000" causes a panic.

ok'ed by itojun.
 1.65 11-Jun-2004  itojun implement IPV6_USE_MIN_MTU sockopt. needed by bind9 + EDNS0 + big receive buffer.
 1.64 22-Apr-2004  itojun correct parameter to in6_cksum. keiichi@kame
 1.63 30-Oct-2003  simonb branches: 1.63.2;
Remove some assigned-to but otherwise unused variables.
 1.62 25-Oct-2003  christos fix uninitialized variables
 1.61 06-Sep-2003  itojun clarify flowlabel handling
 1.60 05-Sep-2003  itojun u_short -> u_int16_t. sync w/ kame.
don't set ip6_plen where unneeded (i.e. before calling ip6_output)
 1.59 04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.58 22-Aug-2003  itojun remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.
 1.57 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.56 22-Aug-2003  jonathan Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.
 1.55 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.54 29-Jun-2003  fvdl branches: 1.54.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.53 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.52 27-May-2003  itojun can't use M_WAIT here, i believe.
 1.51 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.50 11-Sep-2002  itojun correct signedness mixup in pointer passing. sync w/kame
 1.49 20-Jul-2002  itojun remove unneeded extern decl (commented out). sync w/kame
 1.48 09-Jun-2002  itojun whitespace cleanup
 1.47 07-Jun-2002  itojun some KNF
 1.46 07-Jun-2002  itojun some KNF
 1.45 07-Jun-2002  itojun no need for offsetof()
 1.44 07-Jun-2002  itojun typo
 1.43 07-Jun-2002  itojun sync IPV6_CHECKSUM handling with kame.
 1.42 19-Mar-2002  itojun branches: 1.42.4; 1.42.6;
check sa_len and sa_family strictly. (NOTE: rtsol/rtsold older than Nov2001
will stop working, upgrade them first)
 1.41 20-Dec-2001  itojun centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame
 1.40 18-Dec-2001  itojun reduce white space/cosmetic diffs w/kame.
 1.39 13-Nov-2001  lukem add RCSIDs
 1.38 24-Oct-2001  itojun more whitespace sync with kame
 1.37 18-Oct-2001  itojun branches: 1.37.2;
gather stats on raw ip6 socket. sync with kame
 1.36 18-Oct-2001  itojun reduce diffs with kame (mostly cosmetic).
move IPV6_CHECKSUM processing to sys/netinet6/raw_ip6.c.
constify a couple of places.
 1.35 25-Jul-2001  itojun allocate ipsec policy buffer attached to pcb in in*_pcballoc, before
giving anyone access to the pcb (do not reveal inconsistent ones).
sync with kame
 1.34 23-Jul-2001  itojun repair scoped address handling in PRU_BIND. sync with kame.
 1.33 03-Jul-2001  itojun branches: 1.33.2;
call in{,6}_pcbpurgeif0() before in{,6}_purgeif().
 1.32 08-May-2001  itojun correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)
 1.31 04-Mar-2001  itojun branches: 1.31.2;
avoid possible alignment issue. sync with kame
 1.30 26-Feb-2001  itojun make sure to validate packet against ipsec policy.
 1.29 11-Feb-2001  itojun pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).
 1.28 10-Feb-2001  itojun to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.27 08-Feb-2001  itojun sync with kame better. cosmetic/stat changes only.
 1.26 24-Jan-2001  itojun - record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.25 19-Oct-2000  itojun memcpy -> bcopy, for sync with kame tree
 1.24 07-Jul-2000  itojun sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.
 1.23 29-May-2000  itojun branches: 1.23.2;
disallow bind(2) with IPv4 mapped address for now. port number check is
insufficient at this moment, so two sockets can bind(2) to and listen on the
same port number.

for real fix, we need to check inpcb table with in6pcb. we can't
find inpcb chain from particular in6pcb chain (like finding tcbtable from tcb6)
luckily RFC2553 does not talk about bind(2) behavior for IPv4 mapped.
IPv4 mapped brings in too much complexity...
 1.22 01-Mar-2000  itojun branches: 1.22.2;
introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.21 28-Feb-2000  itojun make ICMPv6 redirect actually flush route cache in udp6/raw6 socket.
 1.20 26-Feb-2000  itojun implement rip6_ctlinput, to cope with routing changes correctly.
(IMHO we need rip_ctlinput as well)
 1.19 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.18 02-Feb-2000  thorpej PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
 1.17 01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.16 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.15 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.14 22-Dec-1999  itojun drop IPv6 packets with v4 mapped address on src/dst. they are illegal
and may be used to fool IPv6 implementations (by using ::ffff:127.0.0.1 as
source you may be able to pretend the packet is from local node)
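
The check described in 1.14 can be sketched in userland C as follows. This is
only an illustration, not the code from the commit: reject_v4mapped() and
reject_v4mapped_str() are hypothetical helper names, and the sketch assumes the
standard IN6_IS_ADDR_V4MAPPED() macro and inet_pton(3).

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if either address is IPv4-mapped
 * (::ffff:a.b.c.d), meaning the packet should be dropped; 0 otherwise.
 */
static int
reject_v4mapped(const struct in6_addr *src, const struct in6_addr *dst)
{
	assert(src != NULL && dst != NULL);
	return IN6_IS_ADDR_V4MAPPED(src) || IN6_IS_ADDR_V4MAPPED(dst);
}

/*
 * Convenience wrapper for experiments: parse textual addresses first.
 * Returns -1 if either string is not a valid IPv6 address.
 */
static int
reject_v4mapped_str(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return -1;
	return reject_v4mapped(&s, &d);
}
```

With this sketch, a source of ::ffff:127.0.0.1 is rejected while an ordinary
global address pair is accepted.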
 1.13 15-Dec-1999  itojun do not overwrite traffic class field when we write IPv6 version field.
 1.12 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.11 13-Sep-1999  itojun branches: 1.11.2; 1.11.8;
- Call in{,6}_pcbdetach if ipsec initialization fails during PRU_ATTACH.
This situation happens on severe memory shortage. We may need more
improvements here and there.
- Grab IEEE802 address from IFT_ETHER card, even if the card is
inserted after bootup time. Is there any other card that can be
inserted afterwards? pcmcia fddi card? :-P
- RFC2373 u bit handling suggests that we SHOULD NOT copy interface id from
ethernet card to pseudo interface, when ethernet card has IEEE802/EUI64
with u bit != 0 (this means that IEEE802/EUI64 is not universally unique).
Do not use such address as, for example, interface id for gif interface.
(I have such an ethernet card myself)
This may change interface id for your gif interface. be careful upgrading
rc files.

(sync with recent KAME)
 1.10 05-Aug-1999  itojun import recent KAME fixes.
- initialize hoplimit for raw6 socket properly.
- respect SO_TIMESTAMP on udp6.
- more sanity checks.
 1.9 31-Jul-1999  itojun sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.8 30-Jul-1999  itojun remove reference to in6_systm.h (file itself will be removed afterwards)
 1.7 19-Jul-1999  itojun fix IPV6_CHECKSUM socket option (length computation was wrong).
 1.6 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.5 06-Jul-1999  itojun checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour
 1.4 04-Jul-1999  itojun s/splnet/splsoftnet/ in IPv6/IPsec part.
hope I made no mistake (the kernel works fine but I need a regress test)

Suggested by: thorpej
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file raw_ip6.c was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file raw_ip6.c was added on branch chs-ubc2 on 1999-07-01 23:48:30 +0000
 1.11.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.11.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.11.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.11.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.22.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.23.2.3 09-May-2001  he Pull up revision 1.32 (via patch, requested by itojun):
Correct faith prefix determination.
 1.23.2.2 06-Apr-2001  he Pull up revision 1.26 (via patch, requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.23.2.1 26-Feb-2001  he Pull up revision 1.30 (requested by itojun):
Make sure to validate packet against ipsec policy.
 1.31.2.13 17-Sep-2002  nathanw Catch up to -current.
 1.31.2.12 01-Aug-2002  nathanw Catch up to -current.
 1.31.2.11 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.31.2.10 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.31.2.9 20-Jun-2002  nathanw Catch up to -current.
 1.31.2.8 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.31.2.7 08-Jan-2002  nathanw Catch up to -current.
 1.31.2.6 14-Nov-2001  nathanw Catch up to -current.
 1.31.2.5 22-Oct-2001  nathanw Catch up to -current.
 1.31.2.4 24-Aug-2001  nathanw Catch up with -current.
 1.31.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.31.2.2 13-Mar-2001  nathanw Be more careful not to dereference curproc when there might not be
a process context.
 1.31.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.33.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.33.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.33.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.33.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.33.2.1 03-Aug-2001  lukem update to -current
 1.37.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.42.6.1 14-Jun-2004  jmc Pullup rev 1.65 (requested by itojun in ticket #1709)

Implement IPV6_USE_MIN_MTU sockopt.
 1.42.4.2 29-Aug-2002  gehenna catch up with -current.
 1.42.4.1 20-Jun-2002  gehenna catch up with -current.
 1.54.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.54.2.5 01-Apr-2005  skrll Sync with HEAD.
 1.54.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.54.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.54.2.2 03-Aug-2004  skrll Sync with HEAD
 1.54.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.63.2.2 11-Sep-2004  he Pull up revisions 1.66-1.68 (requested by yamt in ticket #836):
Ensure that the mbuf is writable before writing a checksum
into it.
 1.63.2.1 14-Jun-2004  tron Pull up revision 1.65 (requested by itojun in ticket #468):
implement IPV6_USE_MIN_MTU sockopt. needed by bind9 + EDNS0 + big receive buffer.
 1.68.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.68.4.1 29-Apr-2005  kent sync with -current
 1.72.2.7 07-Dec-2007  yamt sync with head
 1.72.2.6 15-Nov-2007  yamt sync with head.
 1.72.2.5 27-Oct-2007  yamt sync with head.
 1.72.2.4 03-Sep-2007  yamt sync with head.
 1.72.2.3 26-Feb-2007  yamt sync with head.
 1.72.2.2 30-Dec-2006  yamt sync with head.
 1.72.2.1 21-Jun-2006  yamt sync with head.
 1.74.2.1 01-Feb-2006  yamt sync with head.
 1.75.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.75.8.3 11-May-2006  elad sync with head
 1.75.8.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.75.8.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.75.6.2 11-Aug-2006  yamt sync with head
 1.75.6.1 24-May-2006  yamt sync with head.
 1.75.4.1 01-Jun-2006  kardel Sync with head.
 1.75.2.4 09-Sep-2006  rpaulo sync with head
 1.75.2.3 23-Feb-2006  rpaulo Remove in6pcb references from rip6_output() and rip6_usrreq().
 1.75.2.2 14-Feb-2006  rpaulo Replace in6pcb with inpcb.
 1.75.2.1 07-Feb-2006  rpaulo remove in6_pcb.h and include in_pcb.h.
 1.78.10.1 04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.78.8.1 24-May-2007  pavel Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the right default policy, depending on the address family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us choose the right default policy (based on the address family
requested).
While here, fix an error message

Use a dynamic array instead of a static array to decompress. It lets us
decompress any data, whatever the ratio of decompressed data to compressed
data is.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.78.6.1 10-Dec-2006  yamt sync with head.
 1.78.4.1 12-Jan-2007  ad Sync with head.
 1.81.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.81.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.84.4.1 11-Jul-2007  mjf Sync with head.
 1.84.2.3 09-Oct-2007  ad Sync with head.
 1.84.2.2 20-Aug-2007  ad Sync with HEAD.
 1.84.2.1 08-Jun-2007  ad Sync with head.
 1.85.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.86.8.2 19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.86.8.1 19-Jul-2007  dyoung file raw_ip6.c was added on branch matt-mips64 on 2007-07-19 20:48:59 +0000
 1.86.6.3 09-Jan-2008  matt sync with HEAD
 1.86.6.2 08-Nov-2007  matt sync with -HEAD
 1.86.6.1 06-Nov-2007  matt sync with HEAD
 1.86.4.4 03-Dec-2007  joerg Sync with HEAD.
 1.86.4.3 11-Nov-2007  joerg Sync with HEAD.
 1.86.4.2 04-Nov-2007  jmcneill Sync with HEAD.
 1.86.4.1 02-Oct-2007  joerg Sync with HEAD.
 1.87.4.1 13-Nov-2007  bouyer Sync with HEAD
 1.88.2.2 08-Dec-2007  mjf Sync with HEAD.
 1.88.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.91.14.3 17-Jan-2009  mjf Sync with HEAD.
 1.91.14.2 28-Sep-2008  mjf Sync with HEAD.
 1.91.14.1 02-Jun-2008  mjf Sync with HEAD.
 1.91.10.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.96.2.1 18-May-2008  yamt sync with head.
 1.98.2.5 11-Aug-2010  yamt sync with head.
 1.98.2.4 11-Mar-2010  yamt sync with head
 1.98.2.3 16-May-2009  yamt sync with head
 1.98.2.2 04-May-2009  yamt sync with head.
 1.98.2.1 16-May-2008  yamt sync with head.
 1.99.6.1 19-Oct-2008  haad Sync with HEAD.
 1.99.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.100.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.100.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.102.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.105.4.2 31-May-2011  rmind sync with head
 1.105.4.1 05-Mar-2011  rmind sync with head
 1.105.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.107.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.108.8.2 05-Apr-2012  mrg sync to latest -current.
 1.108.8.1 18-Feb-2012  mrg merge to -current.
 1.108.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.108.4.1 17-Apr-2012  yamt sync with head
 1.109.8.2 01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1541):

sys/netinet6/raw_ip6.c: revision 1.161

Fix use-after-free, the first m_copyback_cow may have freed the mbuf, so
it is wrong to read ip6->ip6_nxt.
 1.109.8.1 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1523):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
sys/netinet6/ah_input.c: adjust other callers (patch)
sys/netinet6/esp_input.c: adjust other callers (patch)
sys/netinet6/ipcomp_input.c: adjust other callers (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
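
The problem with "mtod(m, char *) + len" can be illustrated with a simplified
model of a buffer chain. This is only a sketch under stated assumptions:
struct seg and chain_byte_at() are hypothetical stand-ins, not the kernel's
struct mbuf, ip6_get_prevhdr or IP6_EXTHDR_GET. It shows why an offset into the
chain has to be resolved by walking the segments rather than by adding it to
the first segment's data pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much simplified stand-in for one buffer in a chain. */
struct seg {
	struct seg *next;	/* next segment in the chain, or NULL */
	unsigned char *data;	/* start of this segment's bytes */
	size_t len;		/* number of valid bytes in this segment */
};

/*
 * Resolve an offset within the whole chain to a pointer to that byte.
 * Returns NULL if the offset lies beyond the end of the chain.
 * Computing "first->data + off" instead would run past the first
 * segment whenever off >= first->len.
 */
static unsigned char *
chain_byte_at(struct seg *s, size_t off)
{
	for (; s != NULL; s = s->next) {
		assert(s->len == 0 || s->data != NULL);
		if (off < s->len)
			return s->data + off;
		off -= s->len;
	}
	return NULL;
}
```

For a chain of two 4-byte segments, offset 5 resolves to the second byte of the
second segment, and offset 8 is reported as out of range.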
 1.109.6.2 01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1541):

sys/netinet6/raw_ip6.c: revision 1.161

Fix use-after-free, the first m_copyback_cow may have freed the mbuf, so
it is wrong to read ip6->ip6_nxt.
 1.109.6.1 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1523):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
sys/netinet6/ah_input.c: adjust other callers (patch)
sys/netinet6/esp_input.c: adjust other callers (patch)
sys/netinet6/ipcomp_input.c: adjust other callers (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.109.2.2 01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1541):

sys/netinet6/raw_ip6.c: revision 1.161

Fix use-after-free, the first m_copyback_cow may have freed the mbuf, so
it is wrong to read ip6->ip6_nxt.
 1.109.2.1 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1523):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
sys/netinet6/ah_input.c: adjust other callers (patch)
sys/netinet6/esp_input.c: adjust other callers (patch)
sys/netinet6/ipcomp_input.c: adjust other callers (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.110.2.3 03-Dec-2017  jdolecek update from HEAD
 1.110.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.110.2.1 23-Jun-2013  tls resync from head
 1.111.2.3 18-May-2014  rmind sync with head
 1.111.2.2 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.111.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.113.2.1 10-Aug-2014  tls Rebase.
 1.136.8.1 18-Jan-2017  skrll Sync with netbsd-5
 1.136.6.3 29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1676):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/tcp_usrreq.c 1.223 via patch
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.136.6.2 01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1591):

sys/netinet6/raw_ip6.c: revision 1.161

Fix use-after-free, the first m_copyback_cow may have freed the mbuf, so
it is wrong to read ip6->ip6_nxt.
 1.136.6.1 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1560):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.136.4.9 28-Aug-2017  skrll Sync with HEAD
 1.136.4.8 05-Feb-2017  skrll Sync with HEAD
 1.136.4.7 05-Dec-2016  skrll Sync with HEAD
 1.136.4.6 05-Oct-2016  skrll Sync with HEAD
 1.136.4.5 09-Jul-2016  skrll Sync with HEAD
 1.136.4.4 29-May-2016  skrll Sync with HEAD
 1.136.4.3 22-Sep-2015  skrll Sync with HEAD
 1.136.4.2 06-Jun-2015  skrll Sync with HEAD
 1.136.4.1 06-Apr-2015  skrll Sync with HEAD
 1.136.2.4 29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1676):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/tcp_usrreq.c 1.223 via patch
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.136.2.3 01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1591):

sys/netinet6/raw_ip6.c: revision 1.161

Fix use-after-free, the first m_copyback_cow may have freed the mbuf, so
it is wrong to read ip6->ip6_nxt.
 1.136.2.2 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1560):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases: if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
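
The reasoning in the log message above can be illustrated with a minimal userland sketch. It is not the kernel code: `struct toy_mbuf` and `toy_chain_byte()` are hypothetical names that only model a chain of buffers, and the sketch assumes nothing about the real mbuf layout. The point is that an offset into the chain has to be resolved by walking the chain, because adding it to the data pointer of the first buffer can land beyond that buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an mbuf: one contiguous buffer plus a link to the next. */
struct toy_mbuf {
	struct toy_mbuf *next;
	unsigned char *data;
	size_t len;
};

/*
 * Return a pointer to the byte at chain offset `off`, or NULL if the
 * chain is shorter than that. Unlike `first->data + off`, this never
 * steps past the end of the buffer that actually holds the byte.
 */
static unsigned char *
toy_chain_byte(struct toy_mbuf *m, size_t off)
{
	for (; m != NULL; m = m->next) {
		/* Every buffer in the toy chain must be backed by storage. */
		assert(m->data != NULL);
		if (off < m->len)
			return m->data + off;
		/* The byte lives further down the chain; skip this buffer. */
		off -= m->len;
	}
	return NULL;
}
```

With two 4-byte buffers chained together, offset 5 resolves to the second byte of the second buffer, whereas `first->data + 5` would point one byte past the end of the first.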
 1.136.2.1 28-Sep-2016  bouyer branches: 1.136.2.1.2;
Pull up following revision(s) (requested by roy in ticket #1243):
sys/netinet6/raw_ip6.c: revision 1.150 via patch
sys/netinet6/in6_pcb.c: revision 1.149 via patch
Allow explicit binding to detached addresses.
Fixes PR kern/51435.
 1.136.2.1.2.3 29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1676):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/tcp_usrreq.c 1.223 via patch
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.136.2.1.2.2 01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1591):

sys/netinet6/raw_ip6.c: revision 1.161

Fix use-after-free, the first m_copyback_cow may have freed the mbuf, so
it is wrong to read ip6->ip6_nxt.
 1.136.2.1.2.1 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1560):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases: if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.146.2.5 20-Mar-2017  pgoyette Sync with HEAD
 1.146.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.146.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.146.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.146.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.154.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.157.2.6 23-Mar-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #1808):

sys/netinet6/raw_ip6.c: revision 1.183 (via patch)
sys/netinet6/ip6_output.c: revision 1.233

in6: reject setting negative values but -1 via setsockopt(IPV6_CHECKSUM)
Same as OpenBSD.

in6: make sure a user-specified checksum field is within a packet
From OpenBSD
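
The two checks named in the entry above can be sketched as follows. This is an illustration under stated assumptions, not the NetBSD or OpenBSD code: `toy_cksum_opt_check()` and `toy_cksum_in_packet()` are hypothetical helpers, and the sketch assumes the checksum field is 2 bytes wide and that -1 is the value that disables kernel checksumming.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Validate an offset passed to setsockopt(IPV6_CHECKSUM).
 * -1 is accepted (assumed to mean "checksumming off"); every other
 * negative value is rejected.
 */
static int
toy_cksum_opt_check(int off)
{
	if (off < -1)
		return EINVAL;
	return 0;
}

/*
 * Check that the 2-byte checksum field starting at `off` lies entirely
 * inside a packet of `pktlen` bytes before anything is written there.
 */
static int
toy_cksum_in_packet(int off, size_t pktlen)
{
	/* Callers only get here with checksumming enabled. */
	assert(off >= 0);
	if (pktlen < 2 || (size_t)off > pktlen - 2)
		return EINVAL;
	return 0;
}
```

For an 8-byte payload, offset 6 is the last position where the field still fits; offset 7 would put its second byte outside the packet.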
 1.157.2.5 29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1175):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/sctp_usrreq.c 1.14
sys/netinet/tcp_usrreq.c 1.223
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/sctp6_usrreq.c 1.17
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.157.2.4 09-Apr-2018  bouyer Pull up following revision(s) (requested by roy in ticket #724):
tests/net/icmp/t_ping.c: revision 1.19
sys/netinet6/raw_ip6.c: revision 1.166
sys/netinet6/ip6_input.c: revision 1.195
sys/net/raw_usrreq.c: revision 1.59
sys/sys/socketvar.h: revision 1.151
sys/kern/uipc_socket2.c: revision 1.128
tests/lib/libc/sys/t_recvmmsg.c: revision 1.2
lib/libc/sys/recv.2: revision 1.38
sys/net/rtsock.c: revision 1.239
sys/netinet/udp_usrreq.c: revision 1.246
sys/netinet6/icmp6.c: revision 1.224
tests/net/icmp/t_ping.c: revision 1.20
sys/netipsec/keysock.c: revision 1.63
sys/netinet/raw_ip.c: revision 1.172
sys/kern/uipc_socket.c: revision 1.260
tests/net/icmp/t_ping.c: revision 1.22
sys/kern/uipc_socket.c: revision 1.261
tests/net/icmp/t_ping.c: revision 1.23
sys/netinet/ip_mroute.c: revision 1.155
sbin/route/route.c: revision 1.159
sys/netinet6/ip6_mroute.c: revision 1.123
sys/netatalk/ddp_input.c: revision 1.31
sys/netcan/can.c: revision 1.3
sys/kern/uipc_usrreq.c: revision 1.184
sys/netinet6/udp6_usrreq.c: revision 1.138
tests/net/icmp/t_ping.c: revision 1.18
socket: report receive buffer overflows
Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().
This allows userland to detect route(4) overflows so it can re-sync
with the current state.
socket: clear error even when peeking
The error has already been reported and it's pointless requiring another
recv(2) call just to clear it.
socket: remove now incorrect comment that so_error is only udp
As it can be affected by route(4) sockets which are raw.
rtsock: log dropped messages that we cannot report to userland
Handle ENOBUFS when receiving messages.
Don't send messages if the receiver has died.
Sprinkle more soroverflow().
Handle ENOBUFS in recv
Handle ENOBUFS in sendto
Note value received. Harden another sendto for ENOBUFS.
Handle the routing socket overflowing gracefully.
Allow a valid sendto .... duh
Handle errors better.
Fix test for checking we sent all the data we asked to.
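
The behaviour described for soroverflow() in the entry above can be modelled in a few lines. This is a userland sketch only: `struct toy_socket`, `toy_soroverflow()` and `toy_take_error()` are hypothetical names, the `wakeups` counter merely stands in for waking the receiver, and none of it reflects the real `struct socket` layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy socket holding only the state the log message talks about. */
struct toy_socket {
	unsigned long overflowed;	/* receive buffer overflow counter */
	int so_error;			/* pending error for the next recv */
	int wakeups;			/* stands in for waking the receiver */
};

/* Count the overflow, record ENOBUFS, and wake the receiving side. */
static void
toy_soroverflow(struct toy_socket *so)
{
	assert(so != NULL);
	so->overflowed++;
	so->so_error = ENOBUFS;
	so->wakeups++;
}

/*
 * Report the pending error once and clear it, modelling the
 * "clear error even when peeking" change.
 */
static int
toy_take_error(struct toy_socket *so)
{
	int error = so->so_error;

	so->so_error = 0;
	return error;
}
```

A reader that sees ENOBUFS from such a socket knows it missed messages and can re-sync its state, which is the route(4) use case the entry mentions.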
 1.157.2.3 30-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #666):

sys/netinet6/raw_ip6.c: revision 1.161

Fix use-after-free, the first m_copyback_cow may have freed the mbuf, so
it is wrong to read ip6->ip6_nxt.
 1.157.2.2 30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #527):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the easier cases: if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.157.2.1 08-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #350):
sys/netinet6/icmp6.c: revision 1.214
sys/netinet6/raw_ip6.c: revision 1.158
Fix usages of ipsec_used
If IPsec isn't used, we must go back to the normal path.
PR kern/52659
 1.165.2.4 21-May-2018  pgoyette Sync with HEAD
 1.165.2.3 02-May-2018  pgoyette Synch with HEAD
 1.165.2.2 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.165.2.1 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.172.2.1 10-Jun-2019  christos Sync with HEAD
 1.175.4.2 10-Mar-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1809):

sys/netinet6/raw_ip6.c: revision 1.184 (patch)
sys/netinet6/icmp6.c: revision 1.256 (patch)

Deliver timestamps also to raw sockets.
Fixes PR 57955
 1.175.4.1 23-Mar-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #1615):

sys/netinet6/raw_ip6.c: revision 1.183 (via patch)
sys/netinet6/ip6_output.c: revision 1.233

in6: reject setting negative values but -1 via setsockopt(IPV6_CHECKSUM)
Same as OpenBSD.

in6: make sure a user-specified checksum field is within a packet
From OpenBSD
 1.182.2.2 10-Mar-2024  martin Pull up following revision(s) (requested by riastradh in ticket #615):

sys/netinet6/raw_ip6.c: revision 1.184
sys/netinet6/icmp6.c: revision 1.256

Deliver timestamps also to raw sockets.
Fixes PR 57955
 1.182.2.1 23-Mar-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #125):

sys/netinet6/raw_ip6.c: revision 1.183
sys/netinet6/ip6_output.c: revision 1.233

in6: reject setting negative values but -1 via setsockopt(IPV6_CHECKSUM)
Same as OpenBSD.

in6: make sure a user-specified checksum field is within a packet
From OpenBSD
 1.184.2.1 02-Aug-2025  perseant Sync with HEAD
 1.5 22-Aug-2018  msaitoh - Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.4 15-Apr-2008  thorpej branches: 1.4.90; 1.4.92;
Make raw6 stats per-cpu.
 1.3 10-Dec-2005  elad branches: 1.3.70;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.2 28-Aug-2005  rpaulo Implement net.inet6.raw6.stats sysctl.

Reviewed by Elad Efrat.
 1.1 18-Oct-2001  itojun branches: 1.1.2; 1.1.6; 1.1.22; 1.1.38;
gather stats on raw ip6 socket. sync with kame
 1.1.38.1 21-Jun-2006  yamt sync with head.
 1.1.22.2 11-Dec-2005  christos Sync with head.
 1.1.22.1 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.1.6.1 18-Oct-2001  thorpej file raw_ip6.h was added on branch kqueue on 2002-01-10 20:03:31 +0000
 1.1.2.2 22-Oct-2001  nathanw Catch up to -current.
 1.1.2.1 18-Oct-2001  nathanw file raw_ip6.h was added on branch nathanw_sa on 2001-10-22 20:42:06 +0000
 1.3.70.1 02-Jun-2008  mjf Sync with HEAD.
 1.4.92.1 10-Jun-2019  christos Sync with HEAD
 1.4.90.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.25 01-Feb-2018  maxv Remove this code, RH0 must be dropped, according to RFC5095. FreeBSD and
OpenBSD already do the same. Also, style, and remove useless includes.
 1.24 01-Feb-2018  maxv Fix the ICMP error code. rh was obtained via IP6_EXTHDR_GET, and it is not
guaranteed to be in the same mbuf as ip6, so computing the difference
between the pointers may result in a wrong offset.

ip6 is now unused, so remove it.
 1.23 15-Apr-2008  thorpej branches: 1.23.84;
Make ip6 and icmp6 stats per-cpu.
 1.22 08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
 1.21 29-Oct-2007  dyoung branches: 1.21.12; 1.21.16;
The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.
 1.20 23-May-2007  christos branches: 1.20.6; 1.20.8; 1.20.12;
Ansify + add a few comments, from Karl Sjödahl
 1.19 17-May-2007  yamt remove net.inet6.ip6.rht0 sysctl.
it's too dangerous compared to its benefit.

strongly requested by itojun@. ok'ed by core@.
 1.18 22-Apr-2007  christos Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).

Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
 1.17 04-Mar-2007  christos branches: 1.17.2; 1.17.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.16 16-Nov-2006  christos branches: 1.16.2; 1.16.4;
__unused removal on arguments; approved by core.
 1.15 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.14 21-Jan-2006  rpaulo branches: 1.14.18; 1.14.20;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
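
The stricter scope boundary check described in the entry above can be sketched in userland C. This is an illustration, not the KAME or NetBSD implementation: `toy_scope_pair_ok()` is a hypothetical helper that covers only the loopback case from the example, using the standard `IN6_IS_ADDR_LOOPBACK()` macro.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Return 1 if a packet with this source and destination may be sent,
 * 0 if it must be rejected. ::1 is only meaningful within a single
 * node, so a loopback source is refused unless the destination is
 * loopback too.
 */
static int
toy_scope_pair_ok(const struct in6_addr *src, const struct in6_addr *dst)
{
	assert(src != NULL && dst != NULL);
	if (IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst))
		return 0;
	return 1;
}
```

With `src` parsed from "::1" and `dst` from a global address such as the documentation prefix address "2001:db8::1", the helper returns 0, which corresponds to the bind/sendto sequence the entry calls clearly wrong.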
 1.13 06-Jun-2003  itojun branches: 1.13.4; 1.13.8; 1.13.16; 1.13.18; 1.13.24; 1.13.28; 1.13.30; 1.13.32;
- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).
 1.12 14-May-2003  itojun always use PULLDOWN_TEST codepath.
 1.11 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.10 13-Nov-2001  lukem add RCSIDs
 1.9 16-Oct-2001  itojun more whitespace/comment sync with kame
 1.8 10-Feb-2001  itojun branches: 1.8.2; 1.8.4;
to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.7 20-Sep-2000  itojun repair cut-and-paste bug. from: francis dupont. sync with kame
 1.6 06-Feb-2000  itojun branches: 1.6.4;
fix include pathname for better rfc2292 compliance.
 1.5 31-Jan-2000  itojun be proactive about malicious packets on the wire. we fear that v4 mapped
addresses could be used as a tool to hose security filters (like bypassing
a "local host only" filter by using ::ffff:127.0.0.1).
 1.4 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.3 03-Jul-1999  thorpej branches: 1.3.2; 1.3.8;
RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file route6.c was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file route6.c was added on branch chs-ubc2 on 1999-07-01 23:48:30 +0000
 1.3.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.3.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.3.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.6.4.1 28-Sep-2000  itojun pullup 1.6 -> 1.7 (approved by releng-1-5)
>repair cut-and-paste bug. from: francis dupont. sync with kame
 1.8.4.2 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.8.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.8.2.3 17-Sep-2002  nathanw Catch up to -current.
 1.8.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.8.2.1 22-Oct-2001  nathanw Catch up to -current.
 1.13.32.1 26-Apr-2007  ghen Pull up following revision(s) (requested by christos in ticket #1766):
sys/netinet6/ip6_input.c: revision 1.102 via patch
sys/netinet6/route6.c: revision 1.18 via patch
sys/netinet6/ip6_var.h: revision 1.41 via patch
sys/netinet6/ip6_var.h: revision 1.42 via patch
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
fix typo.
 1.13.30.1 01-Feb-2006  yamt sync with head.
 1.13.28.1 26-Apr-2007  ghen Pull up following revision(s) (requested by christos in ticket #1766):
sys/netinet6/ip6_input.c: revision 1.102 via patch
sys/netinet6/route6.c: revision 1.18 via patch
sys/netinet6/ip6_var.h: revision 1.41 via patch
sys/netinet6/ip6_var.h: revision 1.42 via patch
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
fix typo.
 1.13.24.1 04-Jun-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11330):
sys/netinet6/ip6_input.c: revision 1.102 via patch
sys/netinet6/route6.c: revision 1.18 via patch
sys/netinet6/ip6_var.h: revisions 1.41-1.42 via patch
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
 1.13.18.4 15-Nov-2007  yamt sync with head.
 1.13.18.3 03-Sep-2007  yamt sync with head.
 1.13.18.2 30-Dec-2006  yamt sync with head.
 1.13.18.1 21-Jun-2006  yamt sync with head.
 1.13.16.1 26-Apr-2007  ghen Pull up following revision(s) (requested by christos in ticket #1766):
sys/netinet6/ip6_input.c: revision 1.102 via patch
sys/netinet6/route6.c: revision 1.18 via patch
sys/netinet6/ip6_var.h: revision 1.41 via patch
sys/netinet6/ip6_var.h: revision 1.42 via patch
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
fix typo.
 1.13.8.1 04-Jun-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11330):
sys/netinet6/ip6_input.c: revision 1.102 via patch
sys/netinet6/route6.c: revision 1.18 via patch
sys/netinet6/ip6_var.h: revisions 1.41-1.42 via patch
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
 1.13.4.1 04-Jun-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11330):
sys/netinet6/ip6_input.c: revision 1.102 via patch
sys/netinet6/route6.c: revision 1.18 via patch
sys/netinet6/ip6_var.h: revisions 1.41-1.42 via patch
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
 1.14.20.2 10-Dec-2006  yamt sync with head.
 1.14.20.1 22-Oct-2006  yamt sync with head
 1.14.18.1 18-Nov-2006  ad Sync with head.
 1.16.4.3 17-May-2007  yamt sync with head.
 1.16.4.2 07-May-2007  yamt sync with head.
 1.16.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.16.2.1 28-Apr-2007  bouyer Pull up following revision(s) (requested by christos in ticket #587):
sys/netinet6/ip6_input.c: revision 1.102
sys/netinet6/route6.c: revision 1.18
sys/netinet6/ip6_var.h: revision 1.41
sys/netinet6/ip6_var.h: revision 1.42
sbin/sysctl/sysctl.8: patch
Disable processing of routing header type 0 packets since they can be used
for DoS attacks. Provide a sysctl to re-enable them (net.inet6.ip6.rht0).
Information from:
http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf
fix typo.
 1.17.4.1 11-Jul-2007  mjf Sync with head.
 1.17.2.1 08-Jun-2007  ad Sync with head.
 1.20.12.1 13-Nov-2007  bouyer Sync with HEAD
 1.20.8.1 06-Nov-2007  matt sync with HEAD
 1.20.6.1 31-Oct-2007  joerg Sync with HEAD.
 1.21.16.1 02-Jun-2008  mjf Sync with HEAD.
 1.21.12.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.23.84.1 26-Feb-2018  snj Pull up following revision(s) (requested by maxv in ticket #569):
sys/netinet6/route6.c: 1.24-1.25
Fix the ICMP error code. rh was obtained via IP6_EXTHDR_GET, and it is not
guaranteed to be in the same mbuf as ip6, so computing the difference
between the pointers may result in a wrong offset.
ip6 is now unused, so remove it.
--
Remove this code, RH0 must be dropped, according to RFC5095. FreeBSD and
OpenBSD already do the same. Also, style, and remove useless includes.
 1.23 16-Jun-2020  maxv remove unused
 1.22 23-Sep-2019  kamil Remove __noubsan from in6_clearscope()

The alignment issues for x86 should be handled by
- src/sys/arch/amd64/include/types.h r. 1.62 and
- src/sys/arch/i386/include/types.h r. 1.90
 1.21 20-Sep-2019  kamil Decorate in6_clearscope() with __noubsan

sys/netinet6/scope6.c:480:6,
member access within misaligned address 0xffff9457bc441286 for type
'struct in6_addr' which requires 4 byte alignment

This issue is caused by accessing a non-__packed struct inside a __packed one.
This is an (always?) false positive reported by the sanitizer and there is no
clear non-invasive approach to handle this, without changing ABI of long
term existing code.

Reported-by: syzbot+b53a9bcf030288081e65@syzkaller.appspotmail.com
 1.20 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.19 01-Feb-2018  maxv branches: 1.19.2;
Style, no real functional change.
 1.18 17-Sep-2017  christos explain why in6_setscope fails...
 1.17 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.16 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.15 12-Aug-2016  christos branches: 1.15.2;
In rump (ifp)->if_afdata[AF_INET6] == NULL if we did not register netinet6
yet. Treat this like we don't have a scope, and make the sid tests consistent.
 1.14 15-Jun-2016  ozaki-r branches: 1.14.2;
Protect if_byindex by pserialize
 1.13 19-May-2016  ozaki-r Replace DIAGNOSTIC & panic with KASSERT
 1.12 26-Apr-2016  ozaki-r Sweep unnecessary route.h inclusions
 1.11 10-Dec-2014  christos printable version of the scope.
remove stray breaks.
 1.10 16-Nov-2014  joerg branches: 1.10.2;
Drop impossible check.
 1.9 17-May-2014  rmind branches: 1.9.2;
Replace open-coded access (and boundary checking) of ifindex2ifnet with
if_byindex() function.
 1.8 11-Sep-2009  dyoung branches: 1.8.22; 1.8.26; 1.8.36;
Make ifconfig(8) set and display preference numbers for IPv6
addresses. Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.

In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr. Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.

Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.

In support of the changes above,

1 Add a method to struct domain for "externalizing" a sockaddr, and
provide an implementation for IPv6. Expect more work in this area: it
may be more proper to say that the IPv6 implementation "internalizes"
a sockaddr. Add sockaddr_externalize().

2 Add a subroutine, sofamily(), that returns a struct socket's address
family or AF_UNSPEC.

3 Make a lot of IPv4-specific code generic, and move it from
sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
ifconfig(8).
 1.7 15-Mar-2009  cegger ansify function definitions
 1.6 11-Dec-2007  lukem branches: 1.6.12; 1.6.20; 1.6.26;
use __KERNEL_RCSID()
 1.5 24-Oct-2007  dyoung branches: 1.5.4; 1.5.6; 1.5.8;
Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.
 1.4 03-Sep-2006  christos branches: 1.4.26; 1.4.28; 1.4.32;
comment out impossible comparison.
 1.3 01-Sep-2006  dyoung Re-use macro IN6_IS_SCOPE_EMBEDDABLE().
 1.2 05-Mar-2006  rpaulo branches: 1.2.2; 1.2.12;
bzero -> memset
 1.1 21-Jan-2006  rpaulo branches: 1.1.2; 1.1.4; 1.1.6;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.1.6.1 22-Apr-2006  simonb Sync with head.
 1.1.4.2 01-Feb-2006  yamt sync with head.
 1.1.4.1 21-Jan-2006  yamt file scope6.c was added on branch yamt-uio_vmspace on 2006-02-01 14:52:42 +0000
 1.1.2.1 09-Sep-2006  rpaulo sync with head
 1.2.12.5 21-Jan-2008  yamt sync with head
 1.2.12.4 27-Oct-2007  yamt sync with head.
 1.2.12.3 30-Dec-2006  yamt sync with head.
 1.2.12.2 21-Jun-2006  yamt sync with head.
 1.2.12.1 05-Mar-2006  yamt file scope6.c was added on branch yamt-lazymbuf on 2006-06-21 15:11:09 +0000
 1.2.2.2 03-Sep-2006  yamt sync with head.
 1.2.2.1 05-Mar-2006  yamt file scope6.c was added on branch yamt-pdpolicy on 2006-09-03 15:25:42 +0000
 1.4.32.1 13-Nov-2007  bouyer Sync with HEAD
 1.4.28.2 09-Jan-2008  matt sync with HEAD
 1.4.28.1 06-Nov-2007  matt sync with HEAD
 1.4.26.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.5.8.1 13-Dec-2007  bouyer Sync with HEAD
 1.5.6.1 11-Dec-2007  yamt sync with head.
 1.5.4.1 26-Dec-2007  ad Sync with head.
 1.6.26.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.6.20.1 28-Apr-2009  skrll Sync with HEAD.
 1.6.12.2 16-Sep-2009  yamt sync with head
 1.6.12.1 04-May-2009  yamt sync with head.
 1.8.36.1 10-Aug-2014  tls Rebase.
 1.8.26.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.8.22.2 03-Dec-2017  jdolecek update from HEAD
 1.8.22.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.9.2.1 15-May-2015  snj Pull up following revision(s) (requested by joerg in ticket #770):
sys/netinet6/scope6.c: revision 1.10
Drop impossible check.
 1.10.2.5 05-Feb-2017  skrll Sync with HEAD
 1.10.2.4 05-Oct-2016  skrll Sync with HEAD
 1.10.2.3 09-Jul-2016  skrll Sync with HEAD
 1.10.2.2 29-May-2016  skrll Sync with HEAD
 1.10.2.1 06-Apr-2015  skrll Sync with HEAD
 1.14.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.15.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.19.2.1 02-May-2018  pgoyette Synch with HEAD
 1.6 11-May-2024  andvar s/embbeded/embedded/.
 1.5 16-Jun-2020  maxv remove unused
 1.4 09-Aug-2017  christos PR/52472: Edgar Fuss: Document handling of scoped IPv6 addresses by embedding
ASCII art from:
IPv6 Core Protocols Implementation
By Qing Li, Tatuya Jinmei, Keiichi Shima
Page 56, Figure 2.12
 1.3 10-Dec-2014  christos printable version of the scope.
remove stray breaks.
 1.2 24-Oct-2007  dyoung branches: 1.2.64; 1.2.84;
Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.
 1.1 21-Jan-2006  rpaulo branches: 1.1.4; 1.1.18; 1.1.46; 1.1.48; 1.1.52;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
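
The stricter scope boundary check described in this entry can be illustrated with a small userland sketch. It is not the kernel code from this commit: scope_ok() and check_pair() are hypothetical helpers, and only the ::1 case from the example above is modelled.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not a kernel function): reject a packet whose
 * source is the loopback address ::1 while its destination is not,
 * because ::1 is only meaningful within a single node.
 */
static bool
scope_ok(const struct in6_addr *src, const struct in6_addr *dst)
{
	if (IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst))
		return false;
	return true;
}

/* Parse two textual addresses and apply the check; false on parse error. */
static bool
check_pair(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return false;
	return scope_ok(&s, &d);
}
```

With this sketch, check_pair("::1", "2001:db8::1") is rejected, which corresponds to the bind("::1") followed by sendto(some_global_IPv6_addr) case in the log message.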
 1.1.52.1 13-Nov-2007  bouyer Sync with HEAD
 1.1.48.1 06-Nov-2007  matt sync with HEAD
 1.1.46.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.1.18.3 27-Oct-2007  yamt sync with head.
 1.1.18.2 21-Jun-2006  yamt sync with head.
 1.1.18.1 21-Jan-2006  yamt file scope6_var.h was added on branch yamt-lazymbuf on 2006-06-21 15:11:09 +0000
 1.1.4.2 01-Feb-2006  yamt sync with head.
 1.1.4.1 21-Jan-2006  yamt file scope6_var.h was added on branch yamt-uio_vmspace on 2006-02-01 14:52:42 +0000
 1.2.84.2 28-Aug-2017  skrll Sync with HEAD
 1.2.84.1 06-Apr-2015  skrll Sync with HEAD
 1.2.64.1 03-Dec-2017  jdolecek update from HEAD
 1.26 06-Jul-2024  andvar Fix various typos in comments:
s/defininitions/definitions/
s/ininitialise/initialise/
s/collasped/collapsed/
s/optionaly/optionally/
 1.25 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
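
Why the caller-side NULL check is redundant can be seen in a minimal userland sketch of a chain-freeing routine. This is an illustration only, not the m_freem(9) source: struct buf_sketch, buf_prepend() and buf_freem() are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct mbuf; only the chain link is modelled. */
struct buf_sketch {
	struct buf_sketch *next;
};

/* Prepend a newly allocated buffer; returns the old chain if malloc fails. */
static struct buf_sketch *
buf_prepend(struct buf_sketch *chain)
{
	struct buf_sketch *b = malloc(sizeof(*b));

	if (b == NULL)
		return chain;
	b->next = chain;
	return b;
}

/*
 * Free every buffer in the chain and return how many were freed.
 * The loop condition already handles a NULL argument, so callers do
 * not need "if (b != NULL)" before calling.
 */
static int
buf_freem(struct buf_sketch *b)
{
	int n = 0;

	while (b != NULL) {
		struct buf_sketch *next = b->next;

		free(b);
		b = next;
		n++;
	}
	return n;
}
```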
 1.24 04-Nov-2022  ozaki-r branches: 1.24.8;
inpcb: rename functions to in6pcb_*
 1.23 28-Oct-2022  ozaki-r Adjust dccp and sctp for struct inpcb separation
 1.22 27-Apr-2020  rjs Call IPv4 handler for accept().
 1.21 27-Apr-2020  rjs Do sctp_connectx() handling using ioctl() for IPv6 as well.
 1.20 25-Jun-2019  rjs Split out the prototypes for add/delete address into a separate header file.
 1.19 25-Feb-2019  maxv RIP6, CAN, SCTP and SCTP6 lack a length check in their _send() functions.
Fix RIP6 and CAN, add a big XXX in the SCTP ones.

Found by KASAN, triggered by SyzKaller.

Reported-by: syzbot+0b9692ae0f49f93b7dc7@syzkaller.appspotmail.com
 1.18 24-Feb-2019  maxv RIP, RIP6, DDP, SCTP and SCTP6 lack a length check in their _connect()
functions. Fix the first three, and add a big XXX in the SCTP ones.

Found by KASAN, triggered by SyzKaller.

Reported-by: syzbot+9eaf98dad6ca738c250d@syzkaller.appspotmail.com
 1.17 28-Jan-2019  martin Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.16 01-May-2018  maxv branches: 1.16.2;
Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.15 26-Feb-2018  maxv branches: 1.15.2;
Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@
 1.14 17-Oct-2017  rjs branches: 1.14.2;
Make SCTP work when IPSEC is also defined.
 1.13 20-Apr-2017  ozaki-r branches: 1.13.4;
Fix build of kernel with SCTP
 1.12 20-Apr-2017  ozaki-r Remove unnecessary NULL checks for inp_socket and in6p_socket

They cannot be NULL except for programming errors.
 1.11 13-Dec-2016  ozaki-r branches: 1.11.2;
Remove unnecessary inclusions of nd6.h
 1.10 06-Dec-2016  knakahara remove unnecessary extern declaration.

inetsw has been declared since r1.1, however sctp6_usrreq.c can be built
without the declaration. It must be removed.
 1.9 18-Nov-2016  knakahara fix: "ifconfig destroy" can stall when "ifconfig" is run in parallel.
This problem occurs only if NET_MPSAFE is on.

ifconfig destroy side:
kernel entry point is ifioctl => if_clone_destroy.
pr_purgeif() acquires softnet_lock, and then ifa_remove() calls
pserialize_perform() holding softnet_lock.
ifconfig side:
kernel entry point is socreate.
pr_attach()(udp_attach_wrapper()) calls sosetlock(). In this call path,
sosetlock() tries to acquire softnet_lock.
These can cause a deadlock.
 1.8 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.7 15-Jul-2016  ozaki-r Use sin6tosa and sin6tocsa macros

No functional change.
 1.6 07-Jul-2016  ozaki-r branches: 1.6.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.5 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
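
The idea of storing an interface index rather than a pointer can be sketched in userland as follows. This is a simplified illustration under stated assumptions: iftable, if_lookup(), struct ifnet_sketch and struct pkt_sketch are hypothetical, and the psref(9)/pserialize(9) protection that the real m_{get,put}_rcvif APIs provide is deliberately left out.

```c
#include <stddef.h>

#define MAXIF_SKETCH 8

/* Hypothetical stand-in for struct ifnet. */
struct ifnet_sketch {
	unsigned index;
	const char *name;
};

/* Hypothetical interface collection, indexed by interface index. */
static struct ifnet_sketch *iftable[MAXIF_SKETCH];

/* Look an interface up by index; NULL if it is out of range or destroyed. */
static struct ifnet_sketch *
if_lookup(unsigned idx)
{
	return idx < MAXIF_SKETCH ? iftable[idx] : NULL;
}

/* The packet records only the index, never a pointer to the interface. */
struct pkt_sketch {
	unsigned rcvif_index;
};

/* Resolve the receiving interface at use time; NULL if it no longer exists. */
static const char *
pkt_rcvif_name(const struct pkt_sketch *p)
{
	struct ifnet_sketch *ifp = if_lookup(p->rcvif_index);

	return ifp != NULL ? ifp->name : NULL;
}
```

Because the lookup happens at use time, a packet that outlives its interface resolves to NULL instead of dereferencing a dangling pointer. In the kernel the looked-up object must additionally be kept alive while it is in use, which is what psref and pserialize are for.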
 1.4 25-Apr-2016  rjs Fix build when IPSEC enabled.
 1.3 21-Jan-2016  riastradh Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.
 1.2 21-Jan-2016  riastradh Give proper prototype to ip_output.
 1.1 13-Oct-2015  rjs branches: 1.1.2;
Add core networking support for SCTP.
 1.1.2.8 28-Aug-2017  skrll Sync with HEAD
 1.1.2.7 05-Feb-2017  skrll Sync with HEAD
 1.1.2.6 05-Dec-2016  skrll Sync with HEAD
 1.1.2.5 05-Oct-2016  skrll Sync with HEAD
 1.1.2.4 09-Jul-2016  skrll Sync with HEAD
 1.1.2.3 29-May-2016  skrll Sync with HEAD
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp6_usrreq.c was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.6.2.4 26-Apr-2017  pgoyette Sync with HEAD
 1.6.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.6.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.6.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.11.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.13.4.1 29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1175):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/sctp_usrreq.c 1.14
sys/netinet/tcp_usrreq.c 1.223
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/sctp6_usrreq.c 1.17
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.14.2.2 03-Dec-2017  jdolecek update from HEAD
 1.14.2.1 17-Oct-2017  jdolecek file sctp6_usrreq.c was added on branch tls-maxphys on 2017-12-03 11:39:05 +0000
 1.15.2.1 02-May-2018  pgoyette Synch with HEAD
 1.16.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.16.2.1 10-Jun-2019  christos Sync with HEAD
 1.24.8.1 02-Aug-2025  perseant Sync with HEAD
 1.1 13-Oct-2015  rjs branches: 1.1.2; 1.1.18;
Add core networking support for SCTP.
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 13-Oct-2015  jdolecek file sctp6_var.h was added on branch tls-maxphys on 2017-12-03 11:39:05 +0000
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp6_var.h was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.7 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.6 18-Dec-2001  itojun branches: 1.6.16; 1.6.32;
reduce white space/cosmetic diffs w/kame.
 1.5 10-Feb-2001  itojun branches: 1.5.2; 1.5.4;
to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.4 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take a
bunch of diffs on upgrade...
 1.3 03-Jul-1999  thorpej branches: 1.3.2;
RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file udp6.h was initially added on branch kame.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file udp6.h was added on branch chs-ubc2 on 1999-07-01 23:48:30 +0000
 1.3.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.3.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.5.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.5.2.1 08-Jan-2002  nathanw Catch up to -current.
 1.6.32.1 21-Jun-2006  yamt sync with head.
 1.6.16.1 11-Dec-2005  christos Sync with head.
 1.57 08-Feb-2018  maxv Move udp6_output() into udp6_usrreq.c, and remove udp6_output.c. This is
more consistent with IPv4, and there is no good reason for keeping a
separate file only for one function. FreeBSD did the same.
 1.56 08-Feb-2018  maxv Style, no functional change.
 1.55 03-Mar-2017  ozaki-r branches: 1.55.6;
Pass inpcb/in6pcb instead of socket to ip_output/ip6_output

- Passing a socket to Layer 3 is layer violation and even unnecessary
- The change makes the code of callers and IPsec a bit simpler
 1.54 31-Oct-2016  ozaki-r branches: 1.54.2;
Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.
 1.53 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.52 21-Jun-2016  ozaki-r branches: 1.52.2;
Make sure the ifp returned from in6_select* functions is psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.
 1.51 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.50 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.49 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from buf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.48 27-Apr-2015  ozaki-r Introduce in6_selecthlim_rt to consolidate an idiom for rt->rt_ifp

It consolidates a scattered routine:
(rt = rtcache_validate(&in6p->in6p_route)) != NULL ? rt->rt_ifp : NULL
 1.47 05-Dec-2014  seanb - Fix comment which was no longer accurate after previous change to move
from in_pcbconnect -> in6_pcbsetport.
 1.46 14-Nov-2014  maxv branches: 1.46.2;
Do not uselessly include <sys/malloc.h>.
 1.45 11-Oct-2014  christos Make IPV4 mapped addresses able to do IPV4 multicast. Fixes needed:

- allow binding to mapped v4 multicast addresses
- define v4moptions, allow setting it via ioctl, pass it to ip_output,
free it when killing the pcb.

Ideally we would allow the IPV6 multicast setsockopts to work on mapped addresses
too, but this is a lot more work and linux does not do it either.
 1.44 06-Jan-2013  christos branches: 1.44.2; 1.44.12;
PR/47408: Anthony Mallet: sendto(2) issue with IPv6 UDP datagrams
- don't connect when the local port is 0, just set the local port number.
- remove redundant assignment
XXX: pullup-6
 1.43 24-Sep-2011  christos branches: 1.43.2; 1.43.8; 1.43.12; 1.43.14;
Add inet6 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.42 31-Aug-2011  plunky NULL does not need a cast
 1.41 15-Jul-2010  dyoung Under some circumstances, udp6_output() would call ip6_clearpktopts()
with an uninitialized struct ip6_pktopts on the stack, opt.
ip6_clearpktopts(&opt, ...) could dereference dangling pointers,
leading to memory corruption or a crash. Now, udp6_output() calls
ip6_clearpktopts(&opt, ...) only if opt was initialized. Thanks to
Clement LECIGNE for reporting this bug.

Fix a potential memory leak: it is udp6_output()'s responsibility
to free its mbuf arguments on error. In the unlikely event that
sa6_embedscope() failed, udp6_output() would not free its mbuf
arguments.

I will ask for this to be pulled up to -4, -5, and -5-0.
 1.40 08-Jul-2010  dyoung Sprinkle 'const' to prevent udp6_output() from reassigning all but one
of its arguments.
 1.39 06-May-2009  elad branches: 1.39.2; 1.39.4;
Remove some usage of "priv" and "privileged" variables and instead pass
around credentials. Also push down kauth(9) calls closer to where the
operation is done.

Mailing list reference:

http://mail-index.netbsd.org/tech-net/2009/04/30/msg001270.html
 1.38 30-Apr-2009  elad - Make in6_pcbbind_{addr,port}() static

- Properly authorize port binding in in_pcbsetport() and in6_pcbsetport()

- Pass struct sockaddr_in6 to in6_pcbsetport() instead of just the address,
so that we have a more complete context

- Adjust udp6_output() to craft a sockaddr_in6 as it calls in6_pcbsetport()

- Fix an issue in in_pcbbind() where we used the "dom_sa_any" pointer and
not a copy of it, pointed out by bouyer@, thanks!

Mailing list reference:

http://mail-index.netbsd.org/tech-net/2009/04/29/msg001259.html
 1.37 24-Oct-2008  dyoung branches: 1.37.4; 1.37.8; 1.37.10; 1.37.12;
Use sockaddr_in_init(). Wrap lines. No functional change intended.
 1.36 13-May-2008  dyoung branches: 1.36.4;
Change bzero() to memset(), non-overlapping bcopy() to memcpy().
Remove unnecessary casts to struct route *.
 1.35 15-Apr-2008  thorpej branches: 1.35.2; 1.35.4; 1.35.6;
Make udp6 stats per-cpu.
 1.34 12-Apr-2008  thorpej Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
 1.33 06-Apr-2008  xtraeme Make this build again after thorpej's changes to udpstat.
 1.32 14-Jan-2008  dyoung branches: 1.32.6;
Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in6_losing().
 1.31 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.30 23-May-2007  christos branches: 1.30.8; 1.30.14; 1.30.16; 1.30.20;
Ansify + add a few comments, from Karl Sjödahl
 1.29 04-Mar-2007  christos branches: 1.29.2; 1.29.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.28 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.27 04-Jan-2007  elad branches: 1.27.2;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.26 23-Jul-2006  ad branches: 1.26.4; 1.26.8; 1.26.14;
Use the LWP cached credentials where sane.
 1.25 14-May-2006  elad integrate kauth.
 1.24 05-May-2006  rpaulo Add support for RFC 3542 Adv. Socket API for IPv6 (which obsoletes 2292).
* RFC 3542 isn't binary compatible with RFC 2292.
* RFC 2292 support is on by default but can be disabled.
* update ping6, telnet and traceroute6 to the new API.

From the KAME project (www.kame.net).
Reviewed by core.
 1.23 21-Jan-2006  rpaulo branches: 1.23.2; 1.23.4; 1.23.6; 1.23.8; 1.23.10;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.22 11-Dec-2005  christos branches: 1.22.2;
merge ktrace-lwp.
 1.21 10-Aug-2005  yamt ipv6 tx checksum offloading. reviewed by Jason Thorpe.
 1.20 22-Apr-2005  yamt branches: 1.20.2;
disable loopback checksum omission for udp6.

i forgot to commit this with:
http://mail-index.NetBSD.org/source-changes/2005/04/18/0023.html
 1.19 15-Dec-2004  thorpej branches: 1.19.2; 1.19.8;
Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
 1.18 11-Jun-2004  itojun implement IPV6_USE_MIN_MTU sockopt. needed by bind9 + EDNS0 + big receive buffer.
 1.17 05-Sep-2003  itojun branches: 1.17.2;
u_short -> u_int16_t. sync w/ kame.
don't set ip6_plen where unneeded (i.e. before calling ip6_output)
 1.16 22-Aug-2003  itojun correct missing inclusion of opt_ipsec.h
 1.15 22-Aug-2003  itojun remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.
 1.14 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.13 22-Aug-2003  jonathan Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.
 1.12 15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.11 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.10 11-Sep-2002  itojun branches: 1.10.6;
KNF - return is not a function. sync w/kame.
 1.9 26-Aug-2002  itojun pass proc * to in6_pcbsetport. PR 18073
 1.8 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.7 08-Jun-2002  itojun whitespace cleanup
 1.6 18-Dec-2001  itojun branches: 1.6.8; 1.6.10;
reduce white space/cosmetic diffs w/kame.
 1.5 13-Nov-2001  lukem add RCSIDs
 1.4 24-Oct-2001  itojun more whitespace sync with kame
 1.3 18-Oct-2001  itojun branches: 1.3.2;
reduce diffs with kame (mostly cosmetic).
move IPV6_CHECKSUM processing to sys/netinet6/raw_ip6.c.
constify a couple of places.
 1.2 15-Oct-2001  itojun implement IPV6_V6ONLY socket option from draft-ietf-ipngwg-rfc2553bis-03.txt.
IPV6_BINDV6ONLY (netbsd only) is deprecated, but still works just like before.
 1.1 08-Feb-2001  itojun branches: 1.1.2; 1.1.4; 1.1.6;
move udp6_output() to separate file. (sync better with kame)
 1.1.6.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.1.6.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.1.6.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.1.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.1.4.6 17-Sep-2002  nathanw Catch up to -current.
 1.1.4.5 27-Aug-2002  nathanw Catch up to -current.
 1.1.4.4 20-Jun-2002  nathanw Catch up to -current.
 1.1.4.3 08-Jan-2002  nathanw Catch up to -current.
 1.1.4.2 14-Nov-2001  nathanw Catch up to -current.
 1.1.4.1 22-Oct-2001  nathanw Catch up to -current.
 1.1.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.1.2.1 08-Feb-2001  bouyer file udp6_output.c was added on branch thorpej_scsipi on 2001-02-11 19:17:30 +0000
 1.3.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.6.10.2 14-Jun-2004  jmc Pullup rev 1.18 (requested by itojun in ticket #1709)

Implement IPV6_USE_MIN_MTU sockopt.
 1.6.10.1 27-Aug-2002  lukem Pull up revision 1.9 (requested by itojun in ticket #731):
pass proc * to in6_pcbsetport. PR 18073
 1.6.8.2 29-Aug-2002  gehenna catch up with -current.
 1.6.8.1 20-Jun-2002  gehenna catch up with -current.
 1.10.6.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.10.6.4 18-Dec-2004  skrll Sync with HEAD.
 1.10.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.10.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.10.6.1 03-Aug-2004  skrll Sync with HEAD
 1.17.2.1 14-Jun-2004  tron Pull up revision 1.18 (requested by itojun in ticket #468):
implement IPV6_USE_MIN_MTU sockopt. needed by bind9 + EDNS0 + big receive buffer.
 1.19.8.1 06-May-2005  tron Pull up revision 1.20 (requested by yamt in ticket #251):
disable loopback checksum omission for udp6.
i forgot to commit this with:
http://mail-index.NetBSD.org/source-changes/2005/04/18/0023.html
 1.19.2.1 29-Apr-2005  kent sync with -current
 1.20.2.5 21-Jan-2008  yamt sync with head
 1.20.2.4 03-Sep-2007  yamt sync with head.
 1.20.2.3 26-Feb-2007  yamt sync with head.
 1.20.2.2 30-Dec-2006  yamt sync with head.
 1.20.2.1 21-Jun-2006  yamt sync with head.
 1.22.2.1 01-Feb-2006  yamt sync with head.
 1.23.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.23.8.3 11-May-2006  elad sync with head
 1.23.8.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.23.8.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.23.6.2 11-Aug-2006  yamt sync with head
 1.23.6.1 24-May-2006  yamt sync with head.
 1.23.4.1 01-Jun-2006  kardel Sync with head.
 1.23.2.3 09-Sep-2006  rpaulo sync with head
 1.23.2.2 23-Feb-2006  rpaulo Remove the remaining references to in6pcb in udp6_output().
 1.23.2.1 07-Feb-2006  rpaulo remove in6_pcb.h and include in_pcb.h.
 1.26.14.1 16-Jul-2010  riz Pull up following revision(s) (requested by dyoung in ticket #1397):
sys/netinet6/udp6_output.c: revision 1.41
Under some circumstances, udp6_output() would call ip6_clearpktopts()
with an uninitialized struct ip6_pktopts on the stack, opt.
ip6_clearpktopts(&opt, ...) could dereference dangling pointers,
leading to memory corruption or a crash. Now, udp6_output() calls
ip6_clearpktopts(&opt, ...) only if opt was initialized. Thanks to
Clement LECIGNE for reporting this bug.
Fix a potential memory leak: it is udp6_output()'s responsibility
to free its mbuf arguments on error. In the unlikely event that
sa6_embedscope() failed, udp6_output() would not free its mbuf
arguments.
I will ask for this to be pulled up to -4, -5, and -5-0.
 1.26.8.1 16-Jul-2010  riz Pull up following revision(s) (requested by dyoung in ticket #1397):
sys/netinet6/udp6_output.c: revision 1.41
Under some circumstances, udp6_output() would call ip6_clearpktopts()
with an uninitialized struct ip6_pktopts on the stack, opt.
ip6_clearpktopts(&opt, ...) could dereference dangling pointers,
leading to memory corruption or a crash. Now, udp6_output() calls
ip6_clearpktopts(&opt, ...) only if opt was initialized. Thanks to
Clement LECIGNE for reporting this bug.
Fix a potential memory leak: it is udp6_output()'s responsibility
to free its mbuf arguments on error. In the unlikely event that
sa6_embedscope() failed, udp6_output() would not free its mbuf
arguments.
I will ask for this to be pulled up to -4, -5, and -5-0.
 1.26.4.1 12-Jan-2007  ad Sync with head.
 1.27.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.27.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.29.4.1 11-Jul-2007  mjf Sync with head.
 1.29.2.1 08-Jun-2007  ad Sync with head.
 1.30.20.2 19-Jan-2008  bouyer Sync with HEAD
 1.30.20.1 02-Jan-2008  bouyer Sync with HEAD
 1.30.16.1 26-Dec-2007  ad Sync with head.
 1.30.14.1 18-Feb-2008  mjf Sync with HEAD.
 1.30.8.2 23-Mar-2008  matt sync with HEAD
 1.30.8.1 09-Jan-2008  matt sync with HEAD
 1.32.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.32.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.35.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.35.4.4 11-Aug-2010  yamt sync with head.
 1.35.4.3 16-May-2009  yamt sync with head
 1.35.4.2 04-May-2009  yamt sync with head.
 1.35.4.1 16-May-2008  yamt sync with head.
 1.35.2.1 18-May-2008  yamt sync with head.
 1.36.4.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.37.12.1 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
 1.37.10.1 16-Jul-2010  riz Pull up following revision(s) (requested by dyoung in ticket #1428):
sys/netinet6/udp6_output.c: revision 1.41
Under some circumstances, udp6_output() would call ip6_clearpktopts()
with an uninitialized struct ip6_pktopts on the stack, opt.
ip6_clearpktopts(&opt, ...) could dereference dangling pointers,
leading to memory corruption or a crash. Now, udp6_output() calls
ip6_clearpktopts(&opt, ...) only if opt was initialized. Thanks to
Clement LECIGNE for reporting this bug.
Fix a potential memory leak: it is udp6_output()'s responsibility
to free its mbuf arguments on error. In the unlikely event that
sa6_embedscope() failed, udp6_output() would not free its mbuf
arguments.
I will ask for this to be pulled up to -4, -5, and -5-0.
 1.37.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.37.4.1 16-Jul-2010  riz Pull up following revision(s) (requested by dyoung in ticket #1428):
sys/netinet6/udp6_output.c: revision 1.41
Under some circumstances, udp6_output() would call ip6_clearpktopts()
with an uninitialized struct ip6_pktopts on the stack, opt.
ip6_clearpktopts(&opt, ...) could dereference dangling pointers,
leading to memory corruption or a crash. Now, udp6_output() calls
ip6_clearpktopts(&opt, ...) only if opt was initialized. Thanks to
Clement LECIGNE for reporting this bug.
Fix a potential memory leak: it is udp6_output()'s responsibility
to free its mbuf arguments on error. In the unlikely event that
sa6_embedscope() failed, udp6_output() would not free its mbuf
arguments.
I will ask for this to be pulled up to -4, -5, and -5-0.
 1.39.4.1 05-Mar-2011  rmind sync with head
 1.39.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.43.14.1 31-Mar-2013  riz Pull up following revision(s) (requested by christos in ticket #853):
sys/netinet6/udp6_output.c: revision 1.44
PR/47408: Anthony Mallet: sendto(2) issue with IPv6 UDP datagrams
- don't connect when the local port is 0, just set the local port number.
- remove redundant assignment
XXX: pullup-6
 1.43.12.2 03-Dec-2017  jdolecek update from HEAD
 1.43.12.1 25-Feb-2013  tls resync with head
 1.43.8.1 31-Mar-2013  riz Pull up following revision(s) (requested by christos in ticket #853):
sys/netinet6/udp6_output.c: revision 1.44
PR/47408: Anthony Mallet: sendto(2) issue with IPv6 UDP datagrams
- don't connect when the local port is 0, just set the local port number.
- remove redundant assignment
XXX: pullup-6
 1.43.2.1 23-Jan-2013  yamt sync with head
 1.44.12.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.44.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.46.2.7 28-Aug-2017  skrll Sync with HEAD
 1.46.2.6 05-Dec-2016  skrll Sync with HEAD
 1.46.2.5 05-Oct-2016  skrll Sync with HEAD
 1.46.2.4 09-Jul-2016  skrll Sync with HEAD
 1.46.2.3 22-Sep-2015  skrll Sync with HEAD
 1.46.2.2 06-Jun-2015  skrll Sync with HEAD
 1.46.2.1 06-Apr-2015  skrll Sync with HEAD
 1.52.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.52.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.52.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.54.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.55.6.2 06-Nov-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #1081):

sys/netinet6/udp6_usrreq.c: revision 1.143
(applied to udp6_output.c, due to refactoring in -current)

Restore the length check of a sockaddr passed from userland at udp6_output

A sockaddr with invalid length could be passed to the network stack resulting in
a kernel panic like this:

panic: sockaddr_copy: source too long, 28 < 128 bytes
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 0xffffffff80216c35 cs 0x8 rflags 0x246
cr2 0x7f7ff7ef3000 ilevel 0x4 rsp 0xffff80003308b690
curlwp 0xfffffe803e11ca40 pid 48.1 lowest kstack 0xffff8000330852c0
Stopped in pid 48.1 (a.out) at netbsd:breakpoint+0x5: leave
db{1}> bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x140
panic() at netbsd:panic+0x3c
sockaddr_copy() at netbsd:sockaddr_copy+0x95
rtcache_setdst() at netbsd:rtcache_setdst+0x73
rtcache_lookup2() at netbsd:rtcache_lookup2+0x56
in6_selectroute() at netbsd:in6_selectroute+0x184
in6_selectsrc() at netbsd:in6_selectsrc+0x119
udp6_output() at netbsd:udp6_output+0x25e
udp6_send_wrapper() at netbsd:udp6_send_wrapper+0x8a
sosend() at netbsd:sosend+0x7bf
do_sys_sendmsg_so() at netbsd:do_sys_sendmsg_so+0x28e
do_sys_sendmsg() at netbsd:do_sys_sendmsg+0x89
sys_sendto() at netbsd:sys_sendto+0x5c
syscall() at netbsd:syscall+0x1ed
--- syscall (number 133) ---
7f7ff790173a:

Reported by Paul Ripke
 1.55.6.1 04-Nov-2018  martin Pull up following revision(s) (requested by mlelstv in ticket #1079):

sys/netinet6/udp6_usrreq.c: revision 1.142
applied to udp6_output.c (refactored in HEAD)

Fix error path in ip6 source address selection.

in6_selectsrc previously returned a pointer to an ipv6 address,
the pointer was NULL in case of an error and is checked later
instead of the also returned error code. When in6_selectsrc was
changed to store the address into a buffer, the error code
was still ignored, but the buffer pointer was never set to NULL.

As a result send() to an ipv6 address on a system that isn't
configured for ipv6 no longer returns the expected EADDRNOTAVAIL
but fails later in ip6_output with EOPNOTSUPP when trying to
send from an unspecified address. The wrong error code caused
BIND to log the unexpected errors.
 1.3 28-Apr-2008  martin branches: 1.3.4;
Remove clause 3 and 4 from TNF licenses
 1.2 23-Apr-2008  thorpej branches: 1.2.2;
Use <net/net_stats.h> / netstat_sysctl().
 1.1 15-Apr-2008  thorpej branches: 1.1.2;
Make udp6 stats per-cpu.
 1.1.2.1 18-May-2008  yamt sync with head.
 1.2.2.1 16-May-2008  yamt sync with head.
 1.3.4.2 02-Jun-2008  mjf Sync with HEAD.
 1.3.4.1 28-Apr-2008  mjf file udp6_private.h was added on branch mjf-devfs2 on 2008-06-02 13:24:28 +0000
 1.156 08-Oct-2024  riastradh udp(4): Clarify udp4/6_espinudp and inp_overudp_cb return.

Cleanup to detect problems like this earlier:

PR kern/58688: userland panic of kernel via wg(4)
 1.155 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) has safely accepted a NULL argument since at least 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.154 04-Nov-2022  ozaki-r branches: 1.154.8;
inpcb: rename functions to in6pcb_*
 1.153 04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.152 28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
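The layout described above, a common structure embedded at the start of per-family structures and reached through accessor macros, might look roughly like this. All names and fields here are hypothetical; the real struct inpcb carries far more state.

```c
#include <assert.h>
#include <stdint.h>

/* Common part shared by both address families (hypothetical fields). */
struct pcb_sketch {
	int	af;		/* 4 or 6 in this sketch */
	uint16_t lport;
};

/* IPv4 PCB: common part first, then the small IPv4-only data. */
struct pcb4_sketch {
	struct pcb_sketch base;
	uint32_t laddr;
};

/* IPv6 PCB: common part first, then the larger IPv6-only data. */
struct pcb6_sketch {
	struct pcb_sketch base;
	uint8_t	laddr[16];
};

/*
 * Accessors hide the downcast from users holding a common pointer.
 * The cast is valid because `base` is the first member.
 */
#define pcb4_laddr(p)	(((struct pcb4_sketch *)(p))->laddr)
#define pcb6_laddr(p)	(((struct pcb6_sketch *)(p))->laddr)
```

Callers keep passing the common pointer type around, and only the IPv6 variant pays for the 16-byte address.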
 1.151 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.150 19-Feb-2021  christos - Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
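The difference between a size-based and an alignment-based check can be sketched as follows. The macro names are made up for this illustration and are not the kernel macros; SKETCH_UNALIGNED_OK is a hypothetical build flag standing in for "the CPU supports unaligned accesses".

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* A non-primitive type whose size (16) exceeds its alignment. */
struct hdr_sketch {
	uint32_t a, b, c, d;
};

/* Size-based check: demands that p be a multiple of sizeof(t). */
#define ALIGNED_BY_SIZE(p, t) \
	((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)

/* Alignment-based check: demands only the ABI alignment of t. */
#define ALIGNED_BY_ALIGNOF(p, t) \
	((((uintptr_t)(p)) & (alignof(t) - 1)) == 0)

/* Variant that always succeeds where unaligned access is permitted. */
#ifdef SKETCH_UNALIGNED_OK
#define ACCESSIBLE_SKETCH(p, t)	1
#else
#define ACCESSIBLE_SKETCH(p, t)	ALIGNED_BY_ALIGNOF(p, t)
#endif

/* A 16-byte aligned buffer to probe at different offsets. */
static alignas(16) unsigned char sketch_buf[32];
```

On ABIs where uint32_t is 4-byte aligned, sketch_buf + 4 is a perfectly good address for struct hdr_sketch, yet the size-based check rejects it. The size-based mask is also only meaningful when sizeof(t) happens to be a power of two.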
 1.149 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.
 1.148 20-Aug-2020  riastradh branches: 1.148.2;
[ozaki-r] Changes to the kernel core for wireguard
 1.147 25-Feb-2019  maxv Fix the order in udp6_attach: soreserve should be called before
in6_pcballoc, otherwise if it fails there is still a PCB attached, and
we hit a KASSERT in socreate. In !DIAGNOSTIC this would have caused a
memory leak.

By the way I find the splsoftnet highly suspicious, in6_pcballoc already
does that.

Triggered by SyzKaller.

Reported-by: syzbot+7bace612ca3cc3e124f8@syzkaller.appspotmail.com
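The ordering argument can be illustrated with a small userland sketch. The types and functions below are hypothetical stand-ins for the socket, soreserve and in6_pcballoc; only the ordering idea is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct so_sketch {
	void	*pcb;		/* NULL while no control block is attached */
	bool	reserved;
};

/* Hypothetical stand-in for buffer reservation, which may fail. */
static int
reserve_sketch(struct so_sketch *so, bool fail)
{
	if (fail)
		return ENOBUFS;
	so->reserved = true;
	return 0;
}

/* Hypothetical stand-in for allocating and attaching a control block. */
static int
pcballoc_sketch(struct so_sketch *so)
{
	so->pcb = malloc(32);
	return so->pcb == NULL ? ENOMEM : 0;
}

/* Run the fallible step that attaches nothing before the one that does. */
static int
attach_sketch(struct so_sketch *so, bool reserve_fails)
{
	int error;

	if ((error = reserve_sketch(so, reserve_fails)) != 0)
		return error;	/* nothing attached, nothing to undo */
	return pcballoc_sketch(so);
}
```

With this order a failed reservation returns with so->pcb still NULL, which is the state the caller expects after an unsuccessful attach.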
 1.146 28-Jan-2019  martin Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.145 27-Dec-2018  maxv Remove unused arguments.
 1.144 22-Nov-2018  knakahara Support IPv6 NAT-T. Implemented by hsuenaga@IIJ and ohishi@IIJ.

Add ATF later.
 1.143 06-Nov-2018  ozaki-r Restore the length check of a sockaddr passed from userland at udp6_output

A sockaddr with invalid length could be passed to the network stack resulting in
a kernel panic like this:

panic: sockaddr_copy: source too long, 28 < 128 bytes
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 0xffffffff80216c35 cs 0x8 rflags 0x246 cr2 0x7f7ff7ef3000 ilevel 0x4 rsp 0xffff80003308b690
curlwp 0xfffffe803e11ca40 pid 48.1 lowest kstack 0xffff8000330852c0
Stopped in pid 48.1 (a.out) at netbsd:breakpoint+0x5: leave
db{1}> bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x140
panic() at netbsd:panic+0x3c
sockaddr_copy() at netbsd:sockaddr_copy+0x95
rtcache_setdst() at netbsd:rtcache_setdst+0x73
rtcache_lookup2() at netbsd:rtcache_lookup2+0x56
in6_selectroute() at netbsd:in6_selectroute+0x184
in6_selectsrc() at netbsd:in6_selectsrc+0x119
udp6_output() at netbsd:udp6_output+0x25e
udp6_send_wrapper() at netbsd:udp6_send_wrapper+0x8a
sosend() at netbsd:sosend+0x7bf
do_sys_sendmsg_so() at netbsd:do_sys_sendmsg_so+0x28e
do_sys_sendmsg() at netbsd:do_sys_sendmsg+0x89
sys_sendto() at netbsd:sys_sendto+0x5c
syscall() at netbsd:syscall+0x1ed
--- syscall (number 133) ---
7f7ff790173a:

Reported by Paul Ripke
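A length check of the kind this commit restores might look like the sketch below. The structure is a hypothetical 28-byte address loosely modelled on sockaddr_in6, and AF_SKETCH6 is an arbitrary constant; neither is the kernel definition.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical fixed-size address, 28 bytes in total. */
struct addr6_sketch {
	uint8_t	len;		/* self-declared length, set by the caller */
	uint8_t	family;
	uint8_t	data[26];
};

#define AF_SKETCH6	24	/* arbitrary family number for this sketch */

/* Validate the caller-supplied address before anything copies it. */
static int
check_addr_sketch(const struct addr6_sketch *sa)
{
	if (sa->len != sizeof(*sa))
		return EINVAL;		/* e.g. a claimed length of 128 */
	if (sa->family != AF_SKETCH6)
		return EAFNOSUPPORT;
	return 0;
}
```

Rejecting the address at the entry point turns the panic deep inside the route cache into an ordinary error return to userland.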
 1.142 04-Nov-2018  mlelstv Fix error path in ip6 source address selection.

in6_selectsrc previously returned a pointer to an ipv6 address,
the pointer was NULL in case of an error and is checked later
instead of the also returned error code. When in6_selectsrc was
changed to store the address into a buffer, the error code
was still ignored, but the buffer pointer was never set to NULL.

As a result send() to an ipv6 address on a system that isn't
configured for ipv6 no longer returns the expected EADDRNOTAVAIL
but fails later in ip6_output with EOPNOTSUPP when trying to
send from an unspecified address. The wrong error code caused
BIND to log the unexpected errors.
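The error path described above comes down to testing the returned code rather than the output buffer. A minimal sketch, with hypothetical function names and a boolean standing in for "the system has a usable source address":

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

struct addr_sketch {
	unsigned char b[16];
};

/* New-style selector: fills *out and returns 0, or returns an errno. */
static int
selectsrc_sketch(bool have_addr, struct addr_sketch *out)
{
	if (!have_addr)
		return EADDRNOTAVAIL;
	memset(out, 0, sizeof(*out));
	out->b[15] = 1;
	return 0;
}

static int
send_sketch(bool have_addr)
{
	struct addr_sketch src;
	int error;

	error = selectsrc_sketch(have_addr, &src);
	if (error != 0)
		return error;	/* &src is never NULL, so test the code */
	/* ... src would be used to build the packet here ... */
	return 0;
}
```

Because the buffer lives on the caller's stack, a NULL test on its address can never fire; only the return value tells the caller that selection failed.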
 1.141 28-Apr-2018  maxv branches: 1.141.2;
Remove unused ipsec_var.h includes.
 1.140 18-Apr-2018  maxv Remove misleading comments.
 1.139 12-Apr-2018  maxv Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.
 1.138 19-Mar-2018  roy socket: report receive buffer overflows

Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().

This allows userland to detect route(4) overflows so it can re-sync
with the current state.
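The helper described in this commit has three effects, which a userland sketch can show directly. The structure and its fields are hypothetical; the boolean merely stands in for waking a sleeping reader.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct rcvsock_sketch {
	unsigned long overflows;	/* dropped-input counter */
	int	error;			/* pending error for the next read */
	bool	woken;			/* stands in for a reader wakeup */
};

/* One helper replaces open-coded counter increments at each drop site. */
static void
overflow_sketch(struct rcvsock_sketch *so)
{
	so->overflows++;
	so->error = ENOBUFS;
	so->woken = true;
}
```

A reader that sees ENOBUFS knows it missed messages and can re-read the full state instead of trusting its incremental view.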
 1.137 28-Feb-2018  maxv branches: 1.137.2;
Remove unused ipsec_private.h includes.
 1.136 28-Feb-2018  maxv Remove duplicate IPSEC_STATINC(IPSEC_STAT_IN_POLVIO), ipsec_in_reject
already increases it. IPSEC6_STATINC is now unused, so remove it too.
 1.135 26-Feb-2018  maxv Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@
 1.134 08-Feb-2018  maxv Remove the IN6_IS_ADDR_V4MAPPED checks in the protocol functions. They
are useless, because the IPv6 entry point (ip6_input) already performs
them.

The checks were first added in the protocol functions:

Wed Dec 22 04:03:02 1999 UTC (18 years, 1 month ago) by itojun

"drop IPv6 packets with v4 mapped address on src/dst. they are illegal
and may be used to fool IPv6 implementations (by using ::ffff:127.0.0.1 as
source you may be able to pretend the packet is from local node)"

Shortly afterwards they were also added in the IPv6 entry point, but
were not removed from the protocol functions:

Mon Jan 31 10:33:22 2000 UTC (18 years ago) by itojun

"be proactive about malicious packet on the wire. we fear that v4 mapped
address to be used as a tool to hose security filters (like bypassing
"local host only" filter by using ::ffff:127.0.0.1)."

OpenBSD did the same a few months ago. FreeBSD has never had these checks.
 1.133 08-Feb-2018  maxv pr_send can be given a NULL lwp. It looks like the

control != NULL && lwp == NULL

condition is never supposed to happen, but add a panic for safety.
 1.132 08-Feb-2018  maxv Move udp6_output() into udp6_usrreq.c, and remove udp6_output.c. This is
more consistent with IPv4, and there is no good reason for keeping a
separate file only for one function. FreeBSD did the same.
 1.131 08-Feb-2018  maxv Style, no functional change.
 1.130 06-Jul-2017  christos Merge the two copies of SO_TIMESTAMP/SO_OTIMESTAMP processing into a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.
 1.129 20-Apr-2017  ozaki-r branches: 1.129.4;
Remove unnecessary NULL checks for inp_socket and in6p_socket

They cannot be NULL except for programming errors.
 1.128 20-Apr-2017  ozaki-r Simplify logic of udp4_sendup and udp6_sendup

They are always passed a socket with the same protocol family
as its own: AF_INET for udp4_sendup and AF_INET6 for udp6_sendup.
 1.127 24-Jan-2017  ozaki-r Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work
 1.126 18-Nov-2016  knakahara branches: 1.126.2;
fix: "ifconfig destroy" can stall when "ifconfig" is run in parallel.
This problem occurs only if NET_MPSAFE on.

ifconfig destroy side:
kernel entry point is ifioctl => if_clone_destroy.
pr_purgeif() acquires softnet_lock, and then ifa_remove() calls
pserialize_perform() holding softnet_lock.
ifconfig side:
kernel entry point is socreate.
pr_attach()(udp_attach_wrapper()) calls sosetlock(). In this call path,
sosetlock() tries to acquire softnet_lock.
Together these can cause a deadlock.
 1.125 15-Nov-2016  mlelstv Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.
 1.124 15-Jul-2016  ozaki-r Use sin6tosa and sin6tocsa macros

No functional change.
 1.123 10-Jun-2016  ozaki-r branches: 1.123.2;
Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in an mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
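The core idea, storing an index in the packet and resolving it at the point of use, can be sketched as below. The table, types and function are hypothetical, and the sketch deliberately leaves out the psref(9)/pserialize(9) protection that keeps the looked-up object alive in the real code.

```c
#include <assert.h>
#include <stddef.h>

#define NIF_SKETCH	4

struct ifnet_sketch {
	const char *name;
};

/* Index-to-interface table; a slot is NULL once the interface is gone. */
static struct ifnet_sketch *if_table_sketch[NIF_SKETCH];

struct pkt_sketch {
	unsigned int rcvif_index;	/* an index, not a pointer */
};

/* Resolve the index at the point of use; the caller must handle NULL. */
static struct ifnet_sketch *
pkt_get_rcvif_sketch(const struct pkt_sketch *m)
{
	if (m->rcvif_index >= NIF_SKETCH)
		return NULL;
	return if_table_sketch[m->rcvif_index];
}
```

A packet that outlives its interface then yields NULL on lookup instead of a dangling pointer.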
 1.122 26-Apr-2016  ozaki-r Sweep unnecessary route.h inclusions
 1.121 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.120 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.119 26-Apr-2015  rtr remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@
 1.118 24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.117 03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.116 30-Mar-2015  ozaki-r Tidy up opt_ipsec.h inclusions
 1.115 09-Aug-2014  rtr branches: 1.115.2; 1.115.4; 1.115.6; 1.115.10;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@
 1.114 08-Aug-2014  rtr split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()
 1.113 05-Aug-2014  rtr split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind
 1.112 05-Aug-2014  rtr revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
 1.111 31-Jul-2014  rtr split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind
 1.110 30-Jul-2014  rtr split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind
 1.109 24-Jul-2014  rtr split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48
 1.108 23-Jul-2014  rtr split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind
 1.107 09-Jul-2014  rtr * split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind
 1.106 09-Jul-2014  rtr * split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47
 1.105 07-Jul-2014  rtr * sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.
 1.104 07-Jul-2014  rtr backout change that made pr_stat return EOPNOTSUPP for protocols that
were not filling in struct stat.

decision made after further discussion with rmind and investigation of
how other operating systems behave. soo_stat() is doing just enough to
be able to call what gets returned valid and thus justifies a return of
success.

additional review will be done to determine if the pr_stat functions
that were already returning EOPNOTSUPP can be considered successful with
what soo_stat() is doing.
 1.103 07-Jul-2014  rtr * have pr_stat return EOPNOTSUPP consistently for all protocols that do
not fill in struct stat instead of returning success.

* in pr_stat remove all checks for non-NULL so->so_pcb except where the
pcb is actually used (i.e. cases where we don't return EOPNOTSUPP).

proposed on tech-net@
 1.102 06-Jul-2014  rtr * split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind
 1.101 01-Jul-2014  rtr fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@
 1.100 23-Jun-2014  rtr where appropriate rename xxx_ioctl() struct mbuf * parameters from
`control' to `ifp' after split from xxx_usrreq().

sys_socket.c
fix wrapping of arguments to be consistent with other function calls
in the file after replacing pr_usrreq() call with pr_ioctl() which
required one less argument.

link_proto.c
fix indentation of parameters in link_ioctl() prototype to be
consistent with the rest of the file.

discussed with rmind@
 1.99 22-Jun-2014  rtr * split PRU_CONTROL functionality out of xxx_usrreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_usrreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_usrreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL in xxx_usrreq() function comments.
* fix various comment references for xxx_usrreq() that mentioned
PRU_CONTROL as xxx_usrreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@
 1.98 30-May-2014  christos Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if ipsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.97 22-May-2014  rmind Move udp6_input(), udp6_sendup(), udp6_realinput() and udp6_input_checksum()
from udp_usrreq.c to udp6_usrreq.c where they belong. No functional change.
 1.96 20-May-2014  rmind Adjust PR_WRAP_USRREQS() to include the attach/detach functions.
We still need the kernel-lock for some corner cases.
 1.95 19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.94 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.93 25-Feb-2014  pooka branches: 1.93.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.92 02-Jan-2014  pooka Allow kernels compiled with INET+INET6 to be booted as IPv4-only or IPv6-only.
 1.91 22-Jun-2012  christos branches: 1.91.2; 1.91.4;
PR/46602: Move the rfc6056 port randomization to the IP layer.
 1.90 24-Sep-2011  christos branches: 1.90.2;
Add inet6 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.89 03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hash table comes from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.88 16-Sep-2009  pooka branches: 1.88.4; 1.88.6;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.87 18-Mar-2009  cegger bzero -> memset
 1.86 04-May-2008  thorpej branches: 1.86.8; 1.86.14;
Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577
 1.85 28-Apr-2008  yamt udp6_init: fix a comment.
 1.84 24-Apr-2008  ad branches: 1.84.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.83 23-Apr-2008  thorpej Use <net/net_stats.h> / netstat_sysctl().
 1.82 15-Apr-2008  thorpej branches: 1.82.2;
Make udp6 stats per-cpu.
 1.81 27-Feb-2008  matt Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.
 1.80 14-Nov-2007  dyoung branches: 1.80.10; 1.80.14;
KNF. Remove superfluous parentheses. In the switch-statement,
consolidate all of the 'error = EOPNOTSUPP;' cases. No functional
change intended.
 1.79 06-Nov-2007  dyoung Take a clue from udp_usrreq(): block IPL_SOFTNET in udp6_usrreq(),
both while we purge an interface, and while we call udp6_output().

XXX udp6_usrreq() needs more attention.
 1.78 01-Nov-2007  dyoung branches: 1.78.2;
De-__P().
 1.77 04-Mar-2007  christos branches: 1.77.14; 1.77.16; 1.77.20;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.76 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.75 23-Jul-2006  ad branches: 1.75.10;
Use the LWP cached credentials where sane.
 1.74 05-May-2006  rpaulo Add support for RFC 3542 Adv. Socket API for IPv6 (which obsoletes 2292).
* RFC 3542 isn't binary compatible with RFC 2292.
* RFC 2292 support is on by default but can be disabled.
* update ping6, telnet and traceroute6 to the new API.

From the KAME project (www.kame.net).
Reviewed by core.
 1.73 21-Jan-2006  rpaulo branches: 1.73.2; 1.73.4; 1.73.6; 1.73.8; 1.73.10;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.72 11-Dec-2005  christos branches: 1.72.2;
merge ktrace-lwp.
 1.71 15-Nov-2005  dsl Pass the current process structure to in_pcbconnect() so that it can
pass it to in_pcbbind() so that it can allocate a low numbered port
if setsockopt() has been used to set IP_PORTRANGE to IP_PORTRANGE_LOW.
While there, fail in_pcbconnect() if the in_pcbbind() fails - rather
than sending the request out from a port of zero.
This has been largely broken since the socket option was added in 1998.
 1.70 28-Aug-2005  rpaulo branches: 1.70.6;
Implement net.inet6.udp6.stats.

Reviewed by Elad Efrat.
 1.69 10-Aug-2005  yamt move {tcp,udp}_do_loopback_cksum back to tcp/udp
so that they can be referenced by ipv6.
 1.68 29-May-2005  christos branches: 1.68.2;
- avoid shadowed variables
- sprinkle const.
 1.67 11-Mar-2005  atatat Revert the change that made kern.file2 and net.*.*.pcblist into nodes
instead of structs. It had other deleterious side-effects that are
rather nasty. Another solution must be found.
 1.66 10-Mar-2005  atatat Change types of kern.file2 and net.*.*.pcblist to NODE
 1.65 09-Mar-2005  atatat Add the following nodes to the sysctl tree:

net.local.stream.pcblist
net.local.dgram.pcblist
net.inet.tcp.pcblist
net.inet.udp.pcblist
net.inet.raw.pcblist
net.inet6.tcp6.pcblist
net.inet6.udp6.pcblist
net.inet6.raw6.pcblist

which allow retrieval of the pcbs in use for those protocols. The
struct involved is 32/64 bit clean and incorporates parts of struct
inpcb, struct unpcb, a bit of struct tcpcb, and two socket addresses.
 1.64 15-Dec-2004  thorpej branches: 1.64.2; 1.64.4;
Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
 1.63 25-May-2004  atatat Sysctl descriptions under net subtree (net.key not done)
 1.62 24-Mar-2004  atatat branches: 1.62.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.61 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.60 25-Oct-2003  christos fix uninitialized variables
 1.59 06-Sep-2003  itojun clarify flowlabel handling
 1.58 04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.57 22-Aug-2003  itojun no need for opt_ipsec.h any longer
 1.56 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.55 29-Jun-2003  fvdl branches: 1.55.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.54 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.53 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.52 12-May-2002  matt Eliminate commons.
 1.51 13-Nov-2001  lukem add RCSIDs
 1.50 29-Oct-2001  simonb Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.
 1.49 24-Oct-2001  itojun more whitespace sync with kame
 1.48 24-Oct-2001  itojun remove unused codepath (unifdef -UUDP6)
 1.47 18-Oct-2001  itojun branches: 1.47.2;
reduce diffs with kame (mostly cosmetic).
move IPV6_CHECKSUM processing to sys/netinet6/raw_ip6.c.
constify a couple of places.
 1.46 15-Oct-2001  itojun implement IPV6_V6ONLY socket option from draft-ietf-ipngwg-rfc2553bis-03.txt.
IPV6_BINDV6ONLY (netbsd only) is deprecated, but still works just like before.
 1.45 25-Jul-2001  itojun allocate ipsec policy buffer attached to pcb in in*_pcballoc, before
giving anyone access to the pcb (do not reveal an inconsistent one).
sync with kame
 1.44 03-Jul-2001  itojun branches: 1.44.2;
call in{,6}_pcbpurgeif0() before in{,6}_purgeif().
 1.43 27-May-2001  itojun remove debug printfs, which can be too noisy. sync with kame.
 1.42 24-May-2001  itojun call icmp6_mtudisc_update(foo, 0) even if ICMPv6 messages are very short.
let icmp6 layer decide whether we take PMTUD routes or not.
 1.41 08-May-2001  itojun correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)
 1.40 11-Feb-2001  itojun branches: 1.40.2;
pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).
 1.39 10-Feb-2001  itojun to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.38 08-Feb-2001  itojun move udp6_output() to separate file. (sync better with kame)
 1.37 24-Jan-2001  itojun - record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.36 09-Dec-2000  itojun update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
 1.35 06-Nov-2000  itojun fix IPv4 TTL selection with AF_INET6 API. sync with kame. From: jdc
 1.34 19-Oct-2000  itojun validate ICMPv6 too big message.
XXX too restrictive given frequent uses of sendto(2)
 1.33 13-Oct-2000  itojun validate mbuf chain length on *_ctlinput. remote node may be able to
transmit a truncated icmp6 packet and panic the system. sync with kame.
 1.32 07-Jul-2000  itojun sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.
 1.31 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.30 08-Jun-2000  itojun branches: 1.30.2;
make sure not to overwrite sockaddr on PRU_SEND/PRU_CONNECT to
link-local address. From: frank
 1.29 05-Jun-2000  itojun pass struct proc * down to udp6_output and in6_pcbbind.
 1.28 22-May-2000  itojun branches: 1.28.2;
make net.inet6.udp6.* sysctl name meet with IPv4 counterpart.
XXX do we need to keep symbols mistakenly added (like UDP6CTL_SENDMAX)
for backward compatibility? I believe not.
 1.27 17-Apr-2000  itojun fix endian issue in filling in interface id.
better sync with kame (nuke space at EOL).
 1.26 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.25 28-Feb-2000  itojun make ICMPv6 redirect actually flush route cache in udp6/raw6 socket.
 1.24 25-Feb-2000  itojun remove extra NULL check
typo in PULLDOWN_TEST case
note: the fixes do not affect normal configuration.
(sync with kame)
 1.23 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.22 06-Feb-2000  itojun don't chase mbuf pointer when it is NULL.
 1.21 02-Feb-2000  thorpej PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
 1.20 01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.19 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.18 31-Jan-2000  itojun destination port == 0 is illegal based on RFC768.
(NetBSD PR: 9137 - I thought I committed this already but I hadn't)
 1.17 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.16 22-Dec-1999  itojun drop IPv6 packets with v4 mapped address on src/dst. they are illegal
and may be used to fool IPv6 implementations (by using ::ffff:127.0.0.1 as
source you may be able to pretend the packet is from local node)
 1.15 15-Dec-1999  itojun do not overwrite traffic class field when we write IPv6 version field.
 1.14 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.13 13-Sep-1999  itojun branches: 1.13.2; 1.13.8;
- Call in{,6}_pcbdetach if ipsec initialization fails during PRU_ATTACH.
This situation happens on severe memory shortage. We may need more
improvements here and there.
- Grab IEEE802 address from IFT_ETHER card, even if the card is
inserted after bootup time. Is there any other card that can be
inserted afterwards? pcmcia fddi card? :-P
- RFC2373 u bit handling suggests that we SHOULD NOT copy interface id from
ethernet card to pseudo interface, when ethernet card has IEEE802/EUI64
with u bit != 0 (this means that IEEE802/EUI64 is not universally unique).
Do not use such address as, for example, interface id for gif interface.
(I have such an ethernet card myself)
This may change interface id for your gif interface. be careful upgrading
rc files.

(sync with recent KAME)
 1.12 25-Aug-1999  itojun ctlinput handling must look at ip6_src, not ip6_dst.
(this makes path mtu handling wrong)
 1.11 13-Aug-1999  itojun fix multicast demux.

From: Matthias Drochner <drochner@zel459.zel.kfa-juelich.de>
 1.10 09-Aug-1999  itojun return with doing nothing from xx_ctlinput(), when sa->sa_family
is not the expected one.

I see PRC_REDIRECT_HOST with sa->sa_family == AF_UNIX coming to
{tcp,udp}_ctlinput() when I use dhclient, and I feel like adding
more sanity checks, without logging - if we log it it is too noisy.
 1.9 09-Aug-1999  itojun log() needs "\n" at the end.
 1.8 05-Aug-1999  itojun import recent KAME fixes.
- initialize hoplimit for raw6 socket properly.
- respect SO_TIMESTAMP on udp6.
- more sanity checks.
 1.7 31-Jul-1999  itojun sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.6 30-Jul-1999  itojun remove reference to in6_systm.h (file itself will be removed afterwards)
 1.5 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.4 04-Jul-1999  itojun s/splnet/splsoftnet/ in IPv6/IPsec part.
hope I made no mistake (the kernel works fine but I need a regress test)

Suggested by: thorpej
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file udp6_usrreq.c was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file udp6_usrreq.c was added on branch chs-ubc2 on 1999-07-01 23:48:30 +0000
 1.13.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.13.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.13.2.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.13.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.13.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.28.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.30.2.4 09-May-2001  he Pull up revision 1.41 (via patch, requested by itojun):
Correct faith prefix determination.
 1.30.2.3 06-Apr-2001  he Pull up revision 1.37 (requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.30.2.2 10-Nov-2000  tv Pullup 1.35 [itojun]:
fix IPv4 TTL selection with AF_INET6 API. sync with kame. From: jdc
 1.30.2.1 17-Oct-2000  tv Pullup 1.33 [itojun]:
validate mbuf chain length on *_ctlinput. remote node may be able to
transmit a truncated icmp6 packet and panic the system. sync with kame.
 1.40.2.6 17-Sep-2002  nathanw Catch up to -current.
 1.40.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.40.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.40.2.3 22-Oct-2001  nathanw Catch up to -current.
 1.40.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.40.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.44.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.44.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.44.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.44.2.1 03-Aug-2001  lukem update to -current
 1.47.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.55.2.9 11-Dec-2005  christos Sync with head.
 1.55.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.55.2.7 01-Apr-2005  skrll Sync with HEAD.
 1.55.2.6 18-Dec-2004  skrll Sync with HEAD.
 1.55.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.55.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.55.2.3 12-Aug-2004  skrll Sync with HEAD.
 1.55.2.2 03-Aug-2004  skrll Sync with HEAD
 1.55.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.62.2.1 28-May-2004  tron Pull up revision 1.63 (requested by atatat in ticket #391):
Sysctl descriptions under net subtree (net.key not done)
 1.64.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.64.2.1 29-Apr-2005  kent sync with -current
 1.68.2.6 17-Mar-2008  yamt sync with head.
 1.68.2.5 15-Nov-2007  yamt sync with head.
 1.68.2.4 03-Sep-2007  yamt sync with head.
 1.68.2.3 26-Feb-2007  yamt sync with head.
 1.68.2.2 30-Dec-2006  yamt sync with head.
 1.68.2.1 21-Jun-2006  yamt sync with head.
 1.70.6.1 22-Nov-2005  yamt sync with head.
 1.72.2.1 01-Feb-2006  yamt sync with head.
 1.73.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.73.8.1 11-May-2006  elad sync with head
 1.73.6.2 11-Aug-2006  yamt sync with head
 1.73.6.1 24-May-2006  yamt sync with head.
 1.73.4.1 01-Jun-2006  kardel Sync with head.
 1.73.2.3 09-Sep-2006  rpaulo sync with head
 1.73.2.2 14-Feb-2006  rpaulo in6pcb -> inpcb.
 1.73.2.1 07-Feb-2006  rpaulo remove in6_pcb.h and include in_pcb.h.
 1.75.10.2 12-Mar-2007  rmind Sync with HEAD.
 1.75.10.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.77.20.2 18-Nov-2007  bouyer Sync with HEAD
 1.77.20.1 13-Nov-2007  bouyer Sync with HEAD
 1.77.16.4 23-Mar-2008  matt sync with HEAD
 1.77.16.3 09-Jan-2008  matt sync with HEAD
 1.77.16.2 08-Nov-2007  matt sync with -HEAD
 1.77.16.1 06-Nov-2007  matt sync with HEAD
 1.77.14.3 21-Nov-2007  joerg Sync with HEAD.
 1.77.14.2 11-Nov-2007  joerg Sync with HEAD.
 1.77.14.1 04-Nov-2007  jmcneill Sync with HEAD.
 1.78.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.80.14.2 02-Jun-2008  mjf Sync with HEAD.
 1.80.14.1 03-Apr-2008  mjf Sync with HEAD.
 1.80.10.1 24-Mar-2008  keiichi sync with head.
 1.82.2.1 18-May-2008  yamt sync with head.
 1.84.2.3 11-Mar-2010  yamt sync with head
 1.84.2.2 04-May-2009  yamt sync with head.
 1.84.2.1 16-May-2008  yamt sync with head.
 1.86.14.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.86.8.1 28-Apr-2009  skrll Sync with HEAD.
 1.88.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.88.4.1 31-May-2011  rmind sync with head
 1.90.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.90.2.1 30-Oct-2012  yamt sync with head
 1.91.4.3 18-May-2014  rmind sync with head
 1.91.4.2 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.91.4.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.91.2.2 03-Dec-2017  jdolecek update from HEAD
 1.91.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.93.2.1 10-Aug-2014  tls Rebase.
 1.115.10.1 29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1676):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/tcp_usrreq.c 1.223 via patch
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.115.6.1 29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1676):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/tcp_usrreq.c 1.223 via patch
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.115.4.9 28-Aug-2017  skrll Sync with HEAD
 1.115.4.8 05-Feb-2017  skrll Sync with HEAD
 1.115.4.7 05-Dec-2016  skrll Sync with HEAD
 1.115.4.6 05-Oct-2016  skrll Sync with HEAD
 1.115.4.5 09-Jul-2016  skrll Sync with HEAD
 1.115.4.4 29-May-2016  skrll Sync with HEAD
 1.115.4.3 22-Sep-2015  skrll Sync with HEAD
 1.115.4.2 06-Jun-2015  skrll Sync with HEAD
 1.115.4.1 06-Apr-2015  skrll Sync with HEAD
 1.115.2.1 29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1676):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/tcp_usrreq.c 1.223 via patch
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.123.2.4 26-Apr-2017  pgoyette Sync with HEAD
 1.123.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.123.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.123.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.126.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.129.4.3 15-Jul-2019  martin Pull up following revision(s) (requested by maxv in ticket #1288):

sys/netinet6/udp6_usrreq.c: revision 1.147

Fix the order in udp6_attach: soreserve should be called before
in6_pcballoc, otherwise if it fails there is still a PCB attached, and
we hit a KASSERT in socreate. In !DIAGNOSTIC this would have caused a
memory leak.

By the way I find the splsoftnet highly suspicious, in6_pcballoc already
does that.
 1.129.4.2 29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1175):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/sctp_usrreq.c 1.14
sys/netinet/tcp_usrreq.c 1.223
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/sctp6_usrreq.c 1.17
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.129.4.1 09-Apr-2018  bouyer Pull up following revision(s) (requested by roy in ticket #724):
tests/net/icmp/t_ping.c: revision 1.19
sys/netinet6/raw_ip6.c: revision 1.166
sys/netinet6/ip6_input.c: revision 1.195
sys/net/raw_usrreq.c: revision 1.59
sys/sys/socketvar.h: revision 1.151
sys/kern/uipc_socket2.c: revision 1.128
tests/lib/libc/sys/t_recvmmsg.c: revision 1.2
lib/libc/sys/recv.2: revision 1.38
sys/net/rtsock.c: revision 1.239
sys/netinet/udp_usrreq.c: revision 1.246
sys/netinet6/icmp6.c: revision 1.224
tests/net/icmp/t_ping.c: revision 1.20
sys/netipsec/keysock.c: revision 1.63
sys/netinet/raw_ip.c: revision 1.172
sys/kern/uipc_socket.c: revision 1.260
tests/net/icmp/t_ping.c: revision 1.22
sys/kern/uipc_socket.c: revision 1.261
tests/net/icmp/t_ping.c: revision 1.23
sys/netinet/ip_mroute.c: revision 1.155
sbin/route/route.c: revision 1.159
sys/netinet6/ip6_mroute.c: revision 1.123
sys/netatalk/ddp_input.c: revision 1.31
sys/netcan/can.c: revision 1.3
sys/kern/uipc_usrreq.c: revision 1.184
sys/netinet6/udp6_usrreq.c: revision 1.138
tests/net/icmp/t_ping.c: revision 1.18
socket: report receive buffer overflows
Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().
This allows userland to detect route(4) overflows so it can re-sync
with the current state.
socket: clear error even when peeking
The error has already been reported and it's pointless requiring another
recv(2) call just to clear it.
socket: remove now incorrect comment that so_error is only udp
As it can be affected by route(4) sockets which are raw.
rtsock: log dropped messages that we cannot report to userland
Handle ENOBUFS when receiving messages.
Don't send messages if the receiver has died.
Sprinkle more soroverflow().
Handle ENOBUFS in recv
Handle ENOBUFS in sendto
Note value received. Harden another sendto for ENOBUFS.
Handle the routing socket overflowing gracefully.
Allow a valid sendto .... duh
Handle errors better.
Fix test for checking we sent all the data we asked to.
 1.137.2.6 18-Jan-2019  pgoyette Synch with HEAD
 1.137.2.5 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.137.2.4 02-May-2018  pgoyette Synch with HEAD
 1.137.2.3 22-Apr-2018  pgoyette Sync with HEAD
 1.137.2.2 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.137.2.1 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.141.2.1 10-Jun-2019  christos Sync with HEAD
 1.148.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.154.8.1 02-Aug-2025  perseant Sync with HEAD
 1.31 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.30 22-Nov-2018  knakahara Support IPv6 NAT-T. Implemented by hsuenaga@IIJ and ohishi@IIJ.

Add ATF later.
 1.29 22-Aug-2018  msaitoh - Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.28 02-May-2015  rtr branches: 1.28.16; 1.28.18;
make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from buf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.27 22-May-2014  rmind branches: 1.27.4;
Move udp6_input(), udp6_sendup(), udp6_realinput() and udp6_input_checksum()
from udp_usrreq.c to udp6_usrreq.c where they belong. No functional change.
 1.26 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.25 22-Jun-2012  christos branches: 1.25.2; 1.25.4; 1.25.12;
PR/46602: Move the rfc6056 port randomization to the IP layer.
 1.24 24-Sep-2011  christos branches: 1.24.2;
Add inet6 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.23 24-Apr-2008  ad Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.22 15-Apr-2008  thorpej branches: 1.22.2;
Make udp6 stats per-cpu.
 1.21 17-Feb-2007  dyoung branches: 1.21.38;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.20 23-Jul-2006  ad branches: 1.20.10;
Use the LWP cached credentials where sane.
 1.19 11-Dec-2005  christos branches: 1.19.4; 1.19.8;
merge ktrace-lwp.
 1.18 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.17 28-Aug-2005  rpaulo Implement net.inet6.udp6.stats.

Reviewed by Elad Efrat.
 1.16 15-Dec-2004  thorpej branches: 1.16.10;
Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
 1.15 04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.14 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.13 29-Jun-2003  fvdl branches: 1.13.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.12 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.11 12-May-2002  matt Eliminate commons.
 1.10 15-Oct-2001  itojun implement IPV6_V6ONLY socket option from draft-ietf-ipngwg-rfc2553bis-03.txt.
IPV6_BINDV6ONLY (netbsd only) is deprecated, but still works just like before.
 1.9 05-Jun-2000  itojun branches: 1.9.4; 1.9.6;
pass struct proc * down to udp6_output and in6_pcbbind.
 1.8 22-May-2000  itojun branches: 1.8.2;
make net.inet6.udp6.* sysctl name meet with IPv4 counterpart.
XXX do we need to keep symbols mistakenly added (like UDP6CTL_SENDMAX)
for backward compatibility? I believe not.
 1.7 28-Feb-2000  itojun remove some of cross-BSD portability #ifdef.
remove xxCTL_VARS, which is BSDI specific.
 1.6 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take a
bunch of diffs on upgrade...
 1.5 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.4 19-Nov-1999  bouyer Update protocol and interface stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
 1.3 03-Jul-1999  thorpej branches: 1.3.2; 1.3.8;
RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there, so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file udp6_var.h was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file udp6_var.h was added on branch chs-ubc2 on 1999-07-01 23:48:30 +0000
 1.3.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.3.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.8.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.9.6.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.9.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.9.4.2 20-Jun-2002  nathanw Catch up to -current.
 1.9.4.1 22-Oct-2001  nathanw Catch up to -current.
 1.13.2.7 11-Dec-2005  christos Sync with head.
 1.13.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.13.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.13.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.13.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.13.2.2 03-Aug-2004  skrll Sync with HEAD
 1.13.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.16.10.3 26-Feb-2007  yamt sync with head.
 1.16.10.2 30-Dec-2006  yamt sync with head.
 1.16.10.1 21-Jun-2006  yamt sync with head.
 1.19.8.1 11-Aug-2006  yamt sync with head
 1.19.4.2 09-Sep-2006  rpaulo sync with head
 1.19.4.1 07-Feb-2006  rpaulo in6pcb -> inpcb.
 1.20.10.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.21.38.1 02-Jun-2008  mjf Sync with HEAD.
 1.22.2.1 18-May-2008  yamt sync with head.
 1.24.2.1 30-Oct-2012  yamt sync with head
 1.25.12.1 10-Aug-2014  tls Rebase.
 1.25.4.2 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.25.4.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.25.2.2 03-Dec-2017  jdolecek update from HEAD
 1.25.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.27.4.1 06-Jun-2015  skrll Sync with HEAD
 1.28.18.1 10-Jun-2019  christos Sync with HEAD
 1.28.16.2 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.28.16.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
